Difference between revisions of "Part of Speech Tagging"
From Cohen Courses
PastStudents (talk | contribs)
(17 intermediate revisions by the same user not shown) | |||
== Summary == | == Summary == | ||
− | Part of Speech Tagging (or POS Tagging for short) is a | + | Part of Speech Tagging (or POS Tagging for short) is a [[category::problem]] in the field of computational linguistics which looks at marking each word in a text corpus with the associated word categories known as parts of speech (such as noun, verb, or adjective), based on a word's definition and context of usage. |
+ | |||
+ | POS tagging can be useful as a preprocessing step in tasks like Parsing, and is also useful in tasks like Word Sense Disambiguation and Speech Synthesis. | ||
== Common Approaches == | == Common Approaches == | ||
Some common approaches to POS Tagging include the following: | Some common approaches to POS Tagging include the following: | ||
+ | * '''Hidden Markov Models''' based approaches, sometimes referred to as stochastic algorithms in older literature | ||
+ | * '''Transformation-based learning''' - Brill Tagger | ||
+ | * '''Dynamic Programming''' - Viterbi-like algorithms by DeRose & Church, mentioned for historical reasons | ||
+ | * YFCL | ||
− | * | + | Sources of information/evidence often used by POS taggers: |
+ | * The distribution of tags for the word in isolation: P(t|w) | ||
+ | * "Syntagmatic information" - some POS sequences are much more common than others due to syntactic constraints of the language | ||
− | == | + | == Example Systems == |
+ | * [http://www.markwatson.com/opensource/ FastTag] - open source implementation of Brill Tagger | ||
+ | * [http://nlp.stanford.edu/software/tagger.shtml Stanford Log-linear Part-of-Speech Tagger] | ||
+ | * [http://opennlp.sourceforge.net/ OpenNLP Tagger] - based on maximum entropy | ||
+ | * [http://crftagger.sourceforge.net/ CRF Tagger] - based on conditional random fields | ||
+ | * [http://alias-i.com/lingpipe/ LingPipe] - tool kit that contains models for POS tagging | ||
− | + | == References / Links == | |
+ | * Webpage with links to many different POS tagger systems, from Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - [http://www-nlp.stanford.edu/links/statnlp.html#Taggers] | ||
+ | * Wikipedia article on Part of Speech Tagging - [http://en.wikipedia.org/wiki/Part-of-speech_tagging] | ||
+ | * CMU Algorithms for NLP notes on POS Tagging - [http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/Class-notes/POS-tagging.pdf] | ||
− | == | + | == Relevant Papers == |
− | |||
− | + | {{#ask: [[AddressesProblem::POS Tagging]] | |
− | + | | ?UsesMethod | |
+ | | ?UsesDataset | ||
+ | }} |
Latest revision as of 02:32, 23 November 2010
Summary
Part of Speech Tagging (or POS Tagging for short) is a problem in computational linguistics: marking each word in a text corpus with its word category, known as its part of speech (such as noun, verb, or adjective), based on the word's definition and its context of usage.
POS tagging is useful as a preprocessing step for tasks such as parsing, and it also helps in tasks such as word sense disambiguation and speech synthesis.
Common Approaches
Some common approaches to POS Tagging include the following:
- Hidden Markov Model (HMM) based approaches, sometimes referred to as stochastic algorithms in older literature
- Transformation-based learning, as used in the Brill Tagger
- Dynamic Programming - Viterbi-like algorithms by DeRose & Church, mentioned for historical reasons
- YFCL
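As an illustration of the HMM and dynamic-programming approaches listed above, the following is a minimal sketch of Viterbi decoding for a bigram HMM tagger. The tag set, the probability tables (START_P, TRANS_P, EMIT_P), and the flooring of unseen events are all made-up assumptions for the example; they do not come from any of the systems mentioned on this page.

```python
import math

# Toy model: every number below is a hypothetical value chosen for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.4, "barks": 0.1, "walk": 0.2},
    "VERB": {"barks": 0.5, "walk": 0.3},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    if not words:
        return []

    def logp(p):
        # Floor unseen events so that log() is always defined.
        return math.log(max(p, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at position i,
    #               previous tag on that path)
    best = [{
        t: (logp(start_p.get(t, 0.0)) + logp(emit_p[t].get(words[0], 0.0)), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the best predecessor tag for t at position i.
            score, prev = max(
                (best[i - 1][p][0] + logp(trans_p[p].get(t, 0.0)), p)
                for p in tags
            )
            column[t] = (score + logp(emit_p[t].get(words[i], 0.0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

In this toy model "barks" can be emitted by either NOUN or VERB, and the transition probabilities out of NOUN are what make the VERB reading win.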
Sources of information/evidence often used by POS taggers:
- The distribution of tags for the word in isolation: P(t|w)
- "Syntagmatic information" - some POS sequences are much more common than others due to syntactic constraints of the language
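Both sources of evidence can be estimated by simple counting over a tagged corpus. The sketch below assumes a tiny hand-made corpus (TAGGED) and a hypothetical start-of-sentence symbol "<s>"; it uses unsmoothed relative frequencies, which a real tagger would normally replace with smoothed estimates.

```python
from collections import Counter, defaultdict

# Hypothetical four-sentence tagged corpus, for illustration only.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("walk", "NOUN"), ("helps", "VERB")],
    [("dogs", "NOUN"), ("walk", "VERB")],
    [("we", "PRON"), ("walk", "VERB")],
]


def normalize(counts):
    """Turn nested counts into conditional relative frequencies."""
    return {
        key: {item: c / sum(ctr.values()) for item, c in ctr.items()}
        for key, ctr in counts.items()
    }


def tag_given_word(sentences):
    """Estimate P(t|w): the tag distribution of each word in isolation."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return normalize(counts)


def tag_given_previous_tag(sentences, start="<s>"):
    """Estimate P(t_i | t_{i-1}): how likely each tag is after the previous one."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        prev = start
        for _, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
    return normalize(counts)


p_tw = tag_given_word(TAGGED)
p_tt = tag_given_previous_tag(TAGGED)
print(p_tw["walk"])  # "walk" is ambiguous between NOUN and VERB in this corpus
print(p_tt["DET"])   # tags observed after a determiner
```

Here P(t|w) alone prefers VERB for "walk", while the tag-sequence counts show that a determiner is followed by a noun in every example, which is the kind of syntagmatic evidence that can override the word-level preference.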
Example Systems
- FastTag - open source implementation of Brill Tagger
- Stanford Log-linear Part-of-Speech Tagger
- OpenNLP Tagger - based on maximum entropy
- CRF Tagger - based on conditional random fields
- LingPipe - tool kit that contains models for POS tagging
References / Links
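- Webpage with links to many different POS tagger systems, from Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - http://www-nlp.stanford.edu/links/statnlp.html#Taggers
- Wikipedia article on Part of Speech Tagging - http://en.wikipedia.org/wiki/Part-of-speech_tagging
- CMU Algorithms for NLP notes on POS Tagging - http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/Class-notes/POS-tagging.pdf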