Lafferty 2001 Conditional Random Fields
Citation
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML.
Online version
An online version of this paper is available [1].
Summary
This paper introduces Conditional Random Fields (CRFs) as a sequential classification model. They improve over HMMs and MEMMs in the following ways:

- Unlike HMMs, they allow the addition of arbitrary features of the observations. These features can be defined at any granularity of the observation sequence and may overlap.
- They are optimized according to the conditional likelihood of the state sequence given the observation sequence, rather than the joint likelihood that HMMs maximize. Consequently, they achieve greater accuracy on many NLP sequence labeling tasks.
- Unlike MEMMs, they are normalized globally over the whole observation sequence rather than locally at each state. This lets them avoid the label bias problem of MEMMs, where the transitions leaving a state compete only against each other rather than against all other transitions in the model (see the formula after this list).
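To make the global normalization concrete, a linear-chain CRF defines the conditional probability of a label sequence y given an observation sequence x as a single, globally normalized exponential model. The notation below is the standard textbook form rather than a quotation from the paper:

\[
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, x, t) \Big)
\]

Because Z(x) sums over every possible label sequence, each transition competes against all others in the model, which is precisely what removes the per-state normalization responsible for label bias in MEMMs.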
The key points of Conditional Random Fields, as introduced by the paper, are:

- There are no directed arcs from observations to states or vice versa; rather, the model is an undirected Markov random field.
- The probability mass leaving a state need not be normalized to sum to 1; normalization is applied once, globally, over entire label sequences.
- The chain structure of the model mirrors that of an HMM, but the optimization criterion is the conditional likelihood, as in MEMMs.
- Unlike in MEMMs, inference over the whole sequence is required during training (to compute feature expectations under the model), which makes training substantially slower than for MEMMs.
- At test time, however, inference can be done efficiently with a modified Viterbi algorithm, just as in HMMs and MEMMs (a sketch of both routines follows this list).
- Training is performed using Improved Iterative Scaling.
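Below is a minimal sketch, not code from the paper, of the two inference routines just mentioned for a linear-chain CRF: the forward recursion that computes the log partition function log Z(x) needed at every training iteration, and Viterbi decoding used at test time. The score tensor and its indexing convention are assumptions made purely for illustration.

```python
import numpy as np

# scores[t, i, j] = total weighted feature score for assigning state j at
# position t when position t-1 has state i (for t = 0, only j matters and
# row i = 0 is used by convention). Shapes and names are illustrative.

def forward_log_partition(scores):
    """log Z(x): log-sum over all label sequences (needed during training)."""
    T, S, _ = scores.shape
    alpha = scores[0, 0, :]                        # log-scores of length-1 prefixes
    for t in range(1, T):
        # log-sum-exp over the previous state for each current state
        alpha = np.logaddexp.reduce(alpha[:, None] + scores[t], axis=0)
    return np.logaddexp.reduce(alpha)

def viterbi_decode(scores):
    """Most probable state sequence (used at test time)."""
    T, S, _ = scores.shape
    delta = scores[0, 0, :]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + scores[t]          # S x S: previous -> current state
        backptr[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return list(reversed(path))

# Toy usage: 4 positions, 3 states, random scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 3, 3))
print(forward_log_partition(scores))
print(viterbi_decode(scores))
```

The forward recursion and Viterbi decoding have the same structure and cost; the difference is that training must run the summing version (plus a corresponding backward pass for expectations) over every sequence at every iteration, which is why CRF training is slower than MEMM training.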
CRFs are considered state of the art for sequence labeling tasks in NLP. Subsequent research has introduced faster ways of training CRFs, such as L-BFGS and stochastic gradient descent; one such option is sketched below.
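As one concrete, post-paper illustration of these faster training options, the third-party sklearn-crfsuite package (a Python wrapper around CRFsuite) trains a linear-chain CRF with L-BFGS. This is an assumption about tooling, not something from the paper, and the tiny dataset below is invented for illustration.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

# Each sentence is a list of per-token feature dicts; labels are per-token strings.
X_train = [
    [{"word.lower": "john", "is_capitalized": True},
     {"word.lower": "lives", "is_capitalized": False},
     {"word.lower": "in", "is_capitalized": False},
     {"word.lower": "pittsburgh", "is_capitalized": True}],
]
y_train = [["B-PER", "O", "O", "B-LOC"]]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # L-BFGS optimization of the conditional likelihood
    c1=0.1, c2=0.1,      # L1 / L2 regularization strengths
    max_iterations=100,
)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```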
Related papers
The paper Sha 2003 shallow parsing with conditional random fields uses linear-chain CRFs, a restricted form in which the states form a chain, to perform NP (noun phrase) chunking.