== Maximum Entropy Markov Models for Information Extraction and Segmentation ==
 
 
 
== Citation ==

McCallum, A. and Freitag, D. and Pereira, F. 2000. Maximum Entropy Markov Models for Information Extraction and Segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning. 591--598.

== Online version ==

An online version of this paper is available [1].

== Summary ==
 
This paper introduces Maximum Entropy Markov Models (MEMMs), a sequential classification model that improves over Hidden Markov Models (HMMs) in the following ways:
  
* They allow the addition of arbitrary features of the observations. These features can be defined at any level of the observations, and they can overlap (see the feature sketch after this list).

* They are optimized according to the conditional likelihood of the state sequence given the observation sequence, rather than the joint likelihood used by HMMs. Consequently, they achieve greater accuracy on many NLP sequential labeling tasks.
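
To make the feature point concrete, here is a minimal sketch of overlapping binary features of a single observation. The feature names and the extract_features helper are purely illustrative assumptions, not taken from the paper; the point is only that several features may fire on the same observation, which a generative HMM emission model does not accommodate naturally.

<pre>
# A minimal sketch of overlapping binary features of one observation.
# The feature names and this helper are illustrative only, not from the paper.

def extract_features(word):
    """Return the set of binary features that fire for a single observation."""
    features = set()
    features.add("word=" + word.lower())                 # identity feature
    if word[0].isupper():
        features.add("is_capitalized")                   # orthographic feature
    if word.isdigit():
        features.add("is_number")
    if len(word) >= 3:
        features.add("suffix3=" + word[-3:].lower())     # overlaps with the identity feature
    return features

print(extract_features("Washington"))
# {'word=washington', 'is_capitalized', 'suffix3=ton'} -- three features fire at once
</pre>
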
The key points of the Maximum Entropy Markov Models introduced by the paper are:

* Instead of observations being conditioned on states, the dependency is reversed: the current state is conditioned on the observation (and on the previous state). Equivalently, the observations can be viewed as associated with state transitions rather than with the states themselves.

* Each such transition distribution is modeled by a Maxent classifier (one exponential model per source state).

* Inference can be done efficiently with a modified Viterbi algorithm, just as in HMMs (see the decoding sketch below).

* Training is performed using Generalized Iterative Scaling (see the training sketch below).
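
The decoding step can be illustrated with a short sketch. Below, a Viterbi-style dynamic program is run over per-state transition scores; the log_prob(prev_state, state, obs) function, the state names, and the start-state convention are assumptions of this sketch, standing in for trained Maxent classifiers P(s | s', o) rather than reproducing the paper's implementation.

<pre>
# A minimal sketch of MEMM decoding with a modified Viterbi algorithm.
# log_prob(prev_state, state, obs) stands in for the trained per-state
# maxent classifiers log P(state | prev_state, obs); the toy model and
# the start-state convention below are assumptions of this sketch.

import math

def viterbi(observations, states, log_prob, start_state="<START>"):
    """Return the most likely state sequence under an MEMM."""
    # delta[s] = best log-probability of any state sequence ending in state s
    delta = {s: log_prob(start_state, s, observations[0]) for s in states}
    backpointers = []

    for obs in observations[1:]:
        new_delta, back = {}, {}
        for s in states:
            # best previous state from which to reach s on this observation
            prev = max(states, key=lambda sp: delta[sp] + log_prob(sp, s, obs))
            new_delta[s] = delta[prev] + log_prob(prev, s, obs)
            back[s] = prev
        delta, backpointers = new_delta, backpointers + [back]

    # follow the back-pointers from the best final state
    best = max(states, key=lambda s: delta[s])
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return list(reversed(path))

# Toy usage with a hypothetical two-state model: capitalized words look like names.
states = ["PER", "O"]
def toy_log_prob(prev, s, obs):
    return math.log(0.9) if (s == "PER") == obs[0].isupper() else math.log(0.1)

print(viterbi(["Arthur", "walked", "home"], states, toy_log_prob))  # ['PER', 'O', 'O']
</pre>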
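
Training each classifier with Generalized Iterative Scaling can likewise be sketched. The code below implements the standard GIS update, lambda_i += (1/C) * log(empirical count / expected count), for a single conditional Maxent classifier over binary features; the data layout, the slack feature used to keep the total feature count constant, and the iteration count are assumptions of this sketch, not details taken from the paper.

<pre>
# A minimal sketch of Generalized Iterative Scaling (GIS) for one conditional
# maxent classifier P(label | features).  In an MEMM, one such classifier is
# trained per source state over its outgoing transitions.  The data layout,
# slack feature, and iteration count here are illustrative assumptions.

import math
from collections import defaultdict

def train_gis(data, labels, iterations=100):
    """data: list of (observation_feature_set, label) pairs with binary features."""
    lam = defaultdict(float)      # weights, indexed by (observation feature, label)
    # GIS requires a constant total feature count C per example; pad with a slack feature.
    C = max(len(fs) for fs, _ in data) + 1

    def probs(fs):
        scores = {y: sum(lam[(f, y)] for f in fs) + lam[("__slack__", y)] * (C - len(fs))
                  for y in labels}
        m = max(scores.values())
        exp_scores = {y: math.exp(v - m) for y, v in scores.items()}
        z = sum(exp_scores.values())
        return {y: v / z for y, v in exp_scores.items()}

    # empirical feature counts observed in the training data
    empirical = defaultdict(float)
    for fs, y in data:
        for f in fs:
            empirical[(f, y)] += 1.0
        empirical[("__slack__", y)] += C - len(fs)

    for _ in range(iterations):
        # expected feature counts under the current model
        expected = defaultdict(float)
        for fs, _ in data:
            p = probs(fs)
            for y in labels:
                for f in fs:
                    expected[(f, y)] += p[y]
                expected[("__slack__", y)] += p[y] * (C - len(fs))
        # GIS update: lambda_i += (1/C) * log(empirical_i / expected_i)
        for key, e in empirical.items():
            if expected[key] > 0:
                lam[key] += math.log(e / expected[key]) / C
    return dict(lam)

# Toy usage: the classifier learns to associate capitalization with the label "PER".
data = [({"is_capitalized"}, "PER"), ({"word=the"}, "O"), ({"word=walked"}, "O")]
weights = train_gis(data, labels=["PER", "O"])
</pre>
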
MEMMs were considered state-of-the-art for certain labeling tasks, such as NER, until the introduction of Conditional Random Fields (CRFs).
 
== Related papers ==
 
The original CRF paper ([[RelatedPaper::Lafferty et al ICML 2001]]), published just one year after this one, introduces an even more expressive model, and CRFs are now considered state-of-the-art for sequential labeling tasks. For background on exponential models and maximum entropy, the [[RelatedPaper::Berger et al CL 1996]] paper that introduced the maximum entropy approach to NLP is suggested.
