Lafferty 2001 Conditional Random Fields

From Cohen Courses

Citation

John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML.

Online version

An online version of this paper is available [1].

Summary

This paper introduces Conditional Random Fields (CRFs) as a sequential classification model. CRFs improve over HMMs and MEMMs in the following ways:

  • Unlike HMMs, they allow the addition of arbitrary features of the observations. These features can be defined at any level of the observations and can overlap.
  • Unlike HMMs, they are optimized according to the conditional likelihood of the state sequence given the observation sequence, rather than the joint likelihood. Consequently, they are able to achieve greater accuracy on many NLP sequential labeling tasks.
  • Unlike MEMMs, they are normalized globally over the input observation sequence instead of locally. This lets them avoid the label bias problem of MEMMs, where the transitions leaving a state compete only against each other rather than against all other transitions in the model (see the sketch after this list).
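To make the global normalization concrete, the paper defines the conditional probability of a state sequence y given an observation sequence x roughly as below, where the f_k are feature functions over adjacent states and the observations, and the λ_k are their learned weights:

  p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \Big)
  Z(x) = \sum_{y'} \exp\Big( \sum_{t} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \Big)

The partition function Z(x) sums over all possible state sequences, so the normalization is global. An MEMM instead normalizes each per-state transition distribution separately, which is what gives rise to the label bias problem.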

The key points of Conditional Random Fields, as introduced by the paper, are:

  • There are no directed arcs from observations to states or vice versa; rather, the model is a type of undirected Markov Random Field.
  • The probability mass leaving a state need not be normalized to sum to 1.
  • The graphical structure of the model mirrors that of an HMM, but the optimization criterion is the conditional likelihood, as in MEMMs.
  • Unlike MEMMs, inference is required during training: the gradient of the conditional likelihood involves expected feature counts under the model, which are computed with the forward-backward algorithm. This makes training substantially slower than for MEMMs.
  • For testing, however, inference can be done efficiently with a modified Viterbi algorithm, just as in HMMs and MEMMs (see the sketch after this list).
  • Training is performed using Improved Iterative Scaling.
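As a rough illustration of the test-time decoding mentioned above, the following sketch runs Viterbi over a linear-chain CRF whose feature scores have already been collapsed into per-position emission scores and label-to-label transition scores. The array names and toy sizes are assumptions made for the example, not taken from the paper.

import numpy as np

def viterbi(emission_scores, transition_scores):
    """Return the highest-scoring label sequence (list of label indices).

    emission_scores:   (T, K) array; score of label k at position t
                       (sum of weighted state features).
    transition_scores: (K, K) array; score of moving from label j to label k
                       (sum of weighted transition features).
    """
    T, K = emission_scores.shape
    delta = np.empty((T, K))            # best score of any path ending in label k at t
    backptr = np.zeros((T, K), dtype=int)

    delta[0] = emission_scores[0]
    for t in range(1, T):
        # scores[j, k] = best path ending in j at t-1, then transition j -> k, then emit at t
        scores = delta[t - 1][:, None] + transition_scores + emission_scores[t]
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0)

    # Follow back-pointers from the best final label.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy usage: 4 positions, 3 labels, random feature scores.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(viterbi(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))))

Because the potentials are unnormalized scores rather than probabilities, the recursion adds log-space scores instead of multiplying probabilities; otherwise it is the same dynamic program used for HMM decoding.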

CRFs are considered state of the art for sequential labeling tasks in NLP. Subsequent research has introduced faster ways of training CRFs, such as L-BFGS and stochastic gradient descent.

Related papers

The Sha 2003 shallow parsing with conditional random fields paper uses linear-chain CRFs, a simpler version of CRFs that models the states as a chain, to perform NP (noun phrase) chunking.