Part of Speech Tagging

From Cohen Courses
 
== Summary ==

Part of Speech Tagging (or POS tagging for short) is a [[category::problem]] in computational linguistics: marking each word in a text corpus with its word category, known as its part of speech (such as noun, verb, or adjective), based on the word's definition and its context of usage.

POS tagging is useful as a preprocessing step for tasks like Parsing, and also for tasks like Word Sense Disambiguation and Speech Synthesis.
  
 
== Common Approaches ==

Some common approaches to POS tagging include the following:

* '''Hidden Markov Models''' based approaches, sometimes referred to as stochastic algorithms in older literature
* '''Transformation-based learning''' - Brill Tagger
* '''Dynamic Programming''' - Viterbi-like algorithms by DeRose & Church, mentioned for historical reasons
* YFCL
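
To make the HMM and dynamic programming entries above more concrete, here is a minimal sketch of Viterbi decoding for an HMM tagger. The tag set, the probability tables, and the <code>floor</code> value for unseen events are hypothetical toy values chosen for illustration; a real tagger would estimate them from a tagged corpus and use proper smoothing.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under an HMM.

    start_p[t]    : P(t at sentence start)
    trans_p[s][t] : P(t | previous tag s)
    emit_p[t][w]  : P(w | t)
    Unseen events get the tiny probability `floor` (an assumption of this
    sketch, standing in for real smoothing).
    """
    def lp(table, key):
        # Log-probability lookup with a floor for missing or zero entries.
        return math.log(table.get(key, floor) or floor)

    if not words:
        return []
    # best[i][t] = log-probability of the best tag path ending in tag t at position i
    best = [{t: lp(start_p, t) + lp(emit_p[t], words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((s, best[i - 1][s] + lp(trans_p[s], t)) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t], words[i])
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model, hand-set for illustration only.
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.01, "barks": 0.3},
}

print(viterbi(["the", "dog", "barks"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy tables, "barks" could be a noun or a verb in isolation, and the transition probabilities are what make the decoder prefer the DET, NOUN, VERB reading.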
  
Sources of information/evidence often used by POS taggers:

* The distribution of tags for a word in isolation: P(t|w)
* "Syntagmatic information" - some POS sequences are much more common than others due to syntactic constraints of the language
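
The two sources of evidence listed above can be sketched as simple counts over a tagged corpus. The corpus below is a tiny hand-made example (hypothetical data), and the function names are illustrative rather than taken from any particular tagger.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical data, for illustration only).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("leaks", "VERB")],
    [("dogs", "NOUN"), ("can", "VERB"), ("bark", "VERB")],
    [("she", "PRON"), ("can", "VERB"), ("swim", "VERB")],
]

def tag_given_word(corpus):
    """Estimate P(t|w) by relative frequency of each tag for each word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values()) for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }

def tag_bigram_counts(corpus):
    """Count adjacent tag pairs (previous tag, tag); "<s>" marks sentence start."""
    counts = Counter()
    for sentence in corpus:
        prev = "<s>"
        for _, tag in sentence:
            counts[(prev, tag)] += 1
            prev = tag
    return counts

def most_frequent_tag(word, p_tag_word, default="NOUN"):
    """Baseline tagger: pick the most likely tag for the word in isolation."""
    dist = p_tag_word.get(word)
    return max(dist, key=dist.get) if dist else default

P_TAG_WORD = tag_given_word(CORPUS)
BIGRAMS = tag_bigram_counts(CORPUS)
```

In this toy corpus the isolated-word baseline always tags "can" as VERB, even after "the". The bigram counts record that DET followed by VERB never occurs while DET followed by NOUN does, which is the kind of syntagmatic evidence a sequence model can use to correct the baseline.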
 
 
 
  
 
== Example Systems ==

* [http://www.markwatson.com/opensource/ FastTag] - open source implementation of the Brill Tagger
* [http://nlp.stanford.edu/software/tagger.shtml Stanford Log-linear Part-of-Speech Tagger]
* [http://opennlp.sourceforge.net/ OpenNLP Tagger] - based on maximum entropy
* [http://crftagger.sourceforge.net/ CRF Tagger] - based on conditional random fields
* [http://alias-i.com/lingpipe/ LingPipe] - toolkit that contains models for POS tagging
  
 
== References / Links ==

* Webpage with links to many different POS tagger systems, from Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources - [http://www-nlp.stanford.edu/links/statnlp.html#Taggers]
* Wikipedia article on Part of Speech Tagging - [http://en.wikipedia.org/wiki/Part-of-speech_tagging]
* CMU Algorithms for NLP notes on POS Tagging - [http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/Class-notes/POS-tagging.pdf]

== Relevant Papers ==

{{#ask: [[AddressesProblem::POS Tagging]]
| ?UsesMethod
| ?UsesDataset
}}

Latest revision as of 02:32, 23 November 2010
