Ratnaparkhi EMNLP 1996

Citation

Ratnaparkhi, A. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-96), 1996.

Online Version

http://acl.ldc.upenn.edu/W/W96/W96-0213.pdf

Summary

This paper introduced a method that applies maximum entropy models to the task of part-of-speech tagging. The maximum entropy idea is explored further in later papers using stricter Markov model frameworks, but here it chiefly allowed the incorporation of very expressive features and flexible use of context as input to the model. Key ideas included:

  • Maximum entropy avoids imposing distributional assumptions, other than constraining each feature's expected value under the model to match its average over the training sample.
  • Rich, overlapping features (i.e., they do not need to be independent). Features are conjunctions of the candidate tag and properties of the word history.
  • Employs context to improve POS tagging, including using the previous tag(s) as input features.
  • Uses beam search for decoding, rather than the exact Viterbi-style dynamic programming typical of HMMs (see the decoding sketch below).
  • Employs a tag dictionary to filter out known-incorrect tags for specific common words; this is more for speed than accuracy.
  • Specialized word features that target frequently mis-tagged words.
  • Unseen words at test time can be modeled like rare words during training, using orthographic features such as prefixes, suffixes, capitalization, hyphens, and digits (see the feature sketch below).
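
A minimal sketch of the two central pieces, the contextual feature templates and the conditional distribution p(t|h), is given below in Python. The template names, the simplification of applying orthographic features to every word (the paper applies them only to rare words), and the untrained defaultdict weights are illustrative assumptions; in the paper the weights are estimated with Generalized Iterative Scaling (GIS).

import math
from collections import defaultdict

def features(history, tag):
    """Binary features over the history h = (words, position, previous
    two tags) and a candidate tag t, loosely following the paper's templates."""
    words, i, t1, t2 = history
    w = words[i]
    feats = [
        ("word", w, tag),
        ("prev_tag", t1, tag),
        ("prev_two_tags", t2, t1, tag),
    ]
    # Rare and unseen words back off to orthographic cues:
    # prefixes/suffixes up to length 4, digits, hyphens, capitalization.
    feats += [("prefix", w[:k], tag) for k in range(1, 5) if len(w) > k]
    feats += [("suffix", w[-k:], tag) for k in range(1, 5) if len(w) > k]
    if any(c.isdigit() for c in w):
        feats.append(("has_digit", tag))
    if "-" in w:
        feats.append(("has_hyphen", tag))
    if w[:1].isupper():
        feats.append(("has_upper", tag))
    return feats

def prob(history, tag, weights, tagset):
    """p(t|h) = exp(sum_i lambda_i * f_i(h, t)) / Z(h), normalized over the tagset."""
    def score(t):
        return math.exp(sum(weights[f] for f in features(history, t)))
    return score(tag) / sum(score(t) for t in tagset)

weights = defaultdict(float)  # lambda_i; trained with GIS in the paper

With all weights at zero, prob returns the uniform distribution over the tagset; training adjusts the weights until each feature's expected value matches its sample average, which is exactly the maximum entropy constraint in the first bullet above.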

The maxent POS tagger performed comparably to other state-of-the-art taggers. However, compared to HMMs, it allowed for more diverse features. Compared to decision trees, it did not require word classes to avoid data fragmentation. Compared to rule-based systems, it outputs probabilities that can be used by downstream programs in the NLP pipeline.
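
The beam-search decoder mentioned in the list above can be sketched on top of that model. This is a simplified version building on the prob helper from the previous sketch; the default beam width and the "<s>" boundary tags are assumptions, and the paper additionally restricts each word's candidate tags with its tag dictionary.

import math

def beam_search(words, weights, tagset, beam_width=5):
    """Keep only the top-N partial tag sequences at each position,
    instead of the exhaustive dynamic program an HMM tagger would run."""
    beam = [((), 0.0)]  # each hypothesis: (tags so far, log probability)
    for i in range(len(words)):
        candidates = []
        for tags, logp in beam:
            t1 = tags[-1] if len(tags) >= 1 else "<s>"
            t2 = tags[-2] if len(tags) >= 2 else "<s>"
            history = (words, i, t1, t2)
            for t in tagset:
                p = prob(history, t, weights, tagset)
                candidates.append((tags + (t,), logp + math.log(p)))
        # Prune to the N highest-scoring hypotheses.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return list(beam[0][0])

For example, beam_search("The cat sat".split(), weights, {"DT", "NN", "VBD"}) returns one tag per word; with untrained weights the choice is arbitrary, but the control flow matches the left-to-right pruned search the paper describes.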

Related Papers

  • McCallum et al ICML 2000 apply maxent models similarly, but their Markov model allows for more flexible transition structures, helping to avoid data sparsity.
  • Brants ANLP 2000 argues that HMMs are superior to maxent models for POS tagging.