Conditional Random Fields
This is a method discussed in Information Extraction 10-707 in Fall 2010.
Conditional random field (CRF) is a type of discriminative probabilistic model most often used for the labeling or parsing of sequential data, such as natural language text or biological sequences. With a foundation from Maximum Entropy model and Hidden Markov model, it outperforms them in particular on the tasks of Shallow Parsing, Named Entity Recognition and Visual Object Recognition etc.
Much like a Markov random field, a CRF is an undirected graphical model in which each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables. Assume input sequence X represents a sequence of observations and Y represents a hidden (or unknown) state variable that needs to be inferred given the observations. In a CRF, the distribution of each discrete random variable Y in the graph is conditioned on an input sequence X.
In principle, the layout of the graph of random variables Y can be arbitrary; most often, however, a layout of linear chain CRF is used, this layout admits efficient algorithms for model training, learning the conditional distributions between the Y and feature functions from some corpus of training data, inference, determining the probability of a given label sequence Y given X, and decoding, determining the most likely label sequence Y given X.
Linear-chain Conditional Random Fields
Linear-chain CRFs define conditional probability distributions p(Y|X) of label sequences given input sequences. The label and input sequences are assumed to have the same length.
A CRF on (X, Y) is specified by a local feature vector and a weight vector, the local features are defined as follows:
And the global feature vector is thus an aggregate of the local features:
The conditional probability distribution defined by CRF is then defined as
In sequential labeling task, we would like to find the most probable label sequence for input sequence, thus we use the following formula
The decoding process can be done with the Viterbi algorithm.