Lafferty 2001 Conditional Random Fields

Citation

John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML.

Online version

An online version of this paper is available [1].

Summary

This paper introduces Conditional Random Fields (CRFs) as a sequential classification model. They improve over HMMs and MEMMs in the following ways:

  • Unlike HMMs, they allow arbitrary features of the observations to be added. These features can be defined at any level of the observation sequence and may overlap.
  • Unlike HMMs, they are trained to maximize the conditional likelihood of the state sequence given the observation sequence rather than their joint likelihood. Consequently, they achieve greater accuracy on many NLP sequence labeling tasks.
  • Unlike MEMMs, they are normalized globally over the whole observation sequence instead of locally. This lets them avoid the label bias problem of MEMMs, where the transitions leaving a state compete only against each other rather than against all other transitions in the model (see the contrast sketched just after this list).
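
To make the contrast concrete, the two models can be written as follows for a state sequence y = (y_1, ..., y_T) and observation sequence x. The notation (feature functions f_k with weights λ_k) follows the usual linear-chain presentation; the exact symbols are chosen here for illustration rather than copied verbatim from the paper.

    CRF, globally normalized over whole state sequences:
        p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \Big)
        Z(x) = \sum_{y'} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \Big)

    MEMM, locally normalized (one maximum-entropy classifier per transition):
        p(y \mid x) = \prod_{t} p(y_t \mid y_{t-1}, x)

Because Z(x) sums over all possible state sequences, the scores leaving a given state never have to sum to 1 on their own, which is what removes the label bias problem.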

The key points of Conditional Random Fields, as introduced by the paper, are:

  • There are no directed arcs from observations to states or vice versa; rather, the model is a type of undirected Markov random field.
  • The probability mass (flow) leaving a state need not be normalized to sum to 1.
  • The graphical structure of the model mirrors that of HMMs, but the optimization criterion is conditional likelihood, as in MEMMs.
  • Unlike MEMMs, inference over the whole sequence is required during training (to compute the normalizer and expected feature counts). This makes training substantially slower than for MEMMs.
  • At test time, however, decoding can be done efficiently with a modified Viterbi algorithm, just as in HMMs and MEMMs; a minimal sketch of this step follows the list.
  • Training is performed using Improved Iterative Scaling.
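
As an illustration of the decoding step mentioned above, here is a minimal NumPy sketch of Viterbi decoding for a linear-chain CRF. The split of the scores into a per-position "emission" table and a transition matrix, and all of the names, are assumptions made for this sketch, not notation from the paper.

    import numpy as np

    def viterbi_decode(emissions, transitions):
        # emissions:   (T, S) array of per-position state scores (log-potentials),
        #              already folding in any observation features
        # transitions: (S, S) array; transitions[i, j] scores moving from state i to state j
        T, S = emissions.shape
        score = emissions[0].copy()               # best score of a path ending in each state at t = 0
        backpointers = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            # candidate[i, j]: best path ending in i at t-1, then taking i -> j and scoring position t
            candidate = score[:, None] + transitions + emissions[t][None, :]
            backpointers[t] = candidate.argmax(axis=0)
            score = candidate.max(axis=0)
        best_last = int(score.argmax())           # best final state
        path = [best_last]
        for t in range(T - 1, 0, -1):             # walk the backpointers to recover the sequence
            path.append(int(backpointers[t, path[-1]]))
        path.reverse()
        return path, float(score.max())

In this setup, emissions[t, s] would hold the summed weights of all features that fire for state s at position t of the given observation sequence, and transitions would hold the weights of the pure transition features; since only relative scores matter to the argmax, no normalization is needed at decoding time.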

CRFs are considered state of the art for sequential labeling tasks in NLP. Subsequent research has introduced faster ways of training CRFs, such as L-BFGS and stochastic gradient descent; all of these trainers work from the gradient of the conditional log-likelihood sketched below.
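
Writing F_k(x, y) = \sum_t f_k(y_{t-1}, y_t, x, t) for the total count of feature k on a sequence, the gradient with respect to the weight λ_k over training pairs (x_i, y_i) takes the standard "observed minus expected" form (any regularization term is omitted here):

    \frac{\partial L}{\partial \lambda_k} = \sum_i \Big( F_k(x_i, y_i) - \mathbb{E}_{p_\lambda(y \mid x_i)}\big[ F_k(x_i, y) \big] \Big)

The expectation under the current model is what requires forward-backward style inference at every training step, which is why training is slower than for MEMMs; Improved Iterative Scaling, L-BFGS, and stochastic gradient descent all rely on these same observed and expected feature counts, differing only in how they turn them into weight updates.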

Related papers

The paper Sha 2003 shallow parsing with conditional random fields uses a restricted form of CRFs called linear-chain CRFs, in which the states form a chain, to perform NP (noun phrase) chunking; a small example of the task is given below.
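
As an illustration of how NP chunking becomes a sequence labeling problem that a linear-chain CRF can handle, one common encoding (an assumption here, not a detail taken from either paper) marks each token as beginning an NP chunk (B-NP), continuing one (I-NP), or falling outside any chunk (O):

    The/B-NP   little/I-NP   dog/I-NP   barked/O   at/O   the/B-NP   mailman/I-NP

The CRF then predicts this tag sequence jointly for the whole sentence.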