Difference between revisions of "Roth and Yih, ICML 2005. Integer Linear Programming Inference for Conditional Random Fields"

Revision as of 19:05, 30 October 2011

Citation

Dan Roth and Wen-tau Yih. 2005. Integer Linear Programming Inference for Conditional Random Fields. In Proceedings of the 22^nd International Conference on Machine learning, ICML'05, New York, NY, USA.

Online Version

Online version

Summary

This paper presents an alternative approach to inference in conditional random fields using integer linear programming (ILP). The standard Viterbi algorithm based on dynamic programming, in general, cannot efficiently incorporate non-local and non-sequential constraints over the output sequence. The authors propose an ILP-based method to inference procedure. and extend CRF models to naturally and efficiently support general constraint structures. For sequential constraints, this procedure reduces to simple linear programming as the inference process.

Method

The efficiency of the CRF approach heavily depends on its Markov property – given the observation, the label of a token is assumed to depend only on the labels of its adjacent tokens. It is difficult to explicitly model and exploit more general constraints such as long distance dependencies keeping in mind the Markovian assumption. This is a problem not only for training CRFs but also for incorporating additional constraints during inference. Viterbi algorithm can handle some of these constraints, but cannot handle more generic constraints. This method is able to take into account these constraints more efficiently.

Conditional Random Field

A standard CRF model is represented by:

$P_{\lambda }(y|x)={\frac {\exp(\lambda \cdot F(y,x)}{Z_{\lambda }(x)}}$

where $F(y,x)$ is the global feature vector, $\lambda$ is weight vector, and $Z_{\lambda }(x)=\sum (\lambda \cdot F(y,x))$ is a normalization factor.

A CRF is trained by maximizing the conditional log-likelihood of a given training set $T=\{(x_{k},y_{k})\}_{k=1}^{N}$ :

$L_{\lambda }=\sum _{k}\log(p_{\lambda }(y_{k}|x_{k}))=\sum _{k}[\lambda \cdot F(y_{k},x_{k})-\log Z_{\lambda }(x_{k})]$

Inference Using ILP

Inference in CRFs is usually performed by the Viterbi algorithm which takes into account the shortest path from start to the end. The weights on the edges of the path determine the global path of the cost. This weight or log score is obtained from a linear combination of feature functions and the weight vector.

This approach is represented in ILP as follows:

Given a directed graph $G=(V,E)$ , two distinct nodes $s,t\in V$ , and a non-negative cost $c_{uv}$ of each edge $(u,v)\in E$ , a minimum cost path from $s$ to $t$ is desired. For each edge $(u,v)\in E$ , an indicator variable $x_{uv}$ is introduced. If $(u,v)$ is in the minimum cost (shortest) path, then $x_{uv}$ is set to 1; otherwise, it is 0. The cost function is, $P(u,v)\in Ec_{uv}\cdot x_{uv}$ .

The following figure depicts ILP-based representation of CRFs subject to a set of constraints:

@@ Line 34: / Line 34: @@
 === Inference Using ILP ===
-Inference in CRFs is usually performed by the Viterbi algorithm which takes into account the shortest path from start to the end. The weights on the edges of the path determine the global path of the cost. This weight or ''score'' is obtained from a linear combination of feature functions and the weight vector.
+Inference in CRFs is usually performed by the Viterbi algorithm which takes into account the shortest path from start to the end. The weights on the edges of the path determine the global path of the cost. This weight or ''log score'' is obtained from a linear combination of feature functions and the weight vector.
 This approach is represented in ILP as follows:
-Given a directed graph <math> G = (V,E) </math>, two distinct nodes <math> s, t \in V </math>, and a non-negative cost <math> c_{uv} </math> of each edge <math> (u,v) \in E </math>, a minimum ''cost'' path from <math> s </math> to <math> t </math> is desired.
+Given a directed graph <math> G = (V,E) </math>, two distinct nodes <math> s, t \in V </math>, and a non-negative cost <math> c_{uv} </math> of each edge <math> (u,v) \in E </math>, a minimum ''cost'' path from <math> s </math> to <math> t </math> is desired. For each edge <math> (u,v) \in E </math>, an indicator variable <math> x_{uv} </math> is introduced. If <math> (u,v) </math> is in the minimum cost (shortest) path, then <math> x_{uv} </math> is set to 1; otherwise, it is 0. The cost function is, <math> P(u,v) \in E c_{uv} \cdot x_{uv} </math> .
+The following figure depicts ILP-based representation of CRFs subject to a set of constraints:
+[[File:CRF-ILP.jpg]]
 == Experiments and Results ==
 == Related Papers ==

Difference between revisions of "Roth and Yih, ICML 2005. Integer Linear Programming Inference for Conditional Random Fields"

Revision as of 19:05, 30 October 2011

Contents

Citation

Online Version

Summary

Method

Conditional Random Field

Inference Using ILP

Experiments and Results

Related Papers

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools