Tagging: Descriptors are called tags, and the automatic assignment of these descriptors to given tokens is called tagging.

POS Tagging: The process of assigning a part of speech to a given word is called part-of-speech tagging, commonly referred to as POS tagging. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions, and their sub-categories.

POS Tagger: A part-of-speech tagger (POS tagger) is a piece of software that reads text and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token). It draws on several kinds of information, such as dictionaries, lexicons, and rules. Several sources are needed because a dictionary may list more than one category for a particular word; that is, a word may belong to more than one category.
For example, "run" is both a noun and a verb; to resolve this ambiguity, taggers use probabilistic information.

There are mainly two types of taggers:

Rule-based: uses hand-written rules to resolve tag ambiguity.
Stochastic: either HMM-based, choosing the tag sequence that maximizes the product of word likelihood and tag sequence probability, or cue-based, using decision trees or maximum entropy models to combine probabilistic features.

Tagset: The tagger chooses the relevant tags to attach to the words from a set of tags called the tagset. Every tagger is given a standard tagset.
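
As a rough illustration of the HMM-based approach, and of a tagger drawing its tags from a fixed tagset, the sketch below picks the tag sequence that maximizes the product of word likelihoods and tag-transition probabilities using the Viterbi algorithm. The three-tag tagset, the tiny vocabulary, and every probability in the tables are made-up values chosen only to show how the ambiguous word "run" is resolved; the names `viterbi` and `tag` are likewise hypothetical, and a real tagger would estimate these tables from a tagged corpus.

```python
# Toy tagset and hand-picked probabilities (illustrative assumptions, not corpus estimates).
TAGSET = ["DET", "N", "V"]

# P(first tag in the sentence)
START_P = {"DET": 0.6, "N": 0.3, "V": 0.1}

# P(next tag | previous tag): the tag sequence probability
TRANS_P = {
    "DET": {"DET": 0.05, "N": 0.90, "V": 0.05},
    "N":   {"DET": 0.10, "N": 0.30, "V": 0.60},
    "V":   {"DET": 0.50, "N": 0.40, "V": 0.10},
}

# P(word | tag): the word likelihood; "run" is listed under both N and V
EMIT_P = {
    "DET": {"the": 1.0},
    "N":   {"dogs": 0.6, "run": 0.4},
    "V":   {"run": 0.9, "dogs": 0.1},
}


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the tag sequence with the highest joint probability for `words`."""
    if not words:
        return []
    # best[i][t] = (score of the best sequence for words[:i + 1] ending in tag t,
    #               the previous tag on that sequence)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            # Choose the previous tag that gives the best score when followed by t.
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][t]) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score * emit_p[t].get(word, 0.0), prev)
        best.append(column)
    # Backtrace from the best final tag to recover the whole sequence.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


def tag(words):
    """Tag a list of words with the toy model defined above."""
    return viterbi(words, TAGSET, START_P, TRANS_P, EMIT_P)


print(tag(["the", "run"]))   # ['DET', 'N']
print(tag(["dogs", "run"]))  # ['N', 'V']
```

With these numbers, "run" is tagged as a noun after a determiner and as a verb after a noun, because the transition probabilities outweigh the word likelihood in the first case. Plain probabilities are multiplied here for readability; on longer sentences one would normally sum log-probabilities to avoid underflow.
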
The tagset may be coarse, such as N (Noun), V (Verb), ADJ (Adjective), ADV (Adverb), PREP (Preposition), CONJ (Conjunction), or fine-grained, such as NNOM (Noun-Nominative), NSOC (Noun-Sociative), VFIN (Verb Finite), VNFIN (Verb Nonfinite), and so on. Most taggers use only a fine-grained tagset.

Examples of English (Treebank) tags are shown below.