Sutton McCallum ICML 2007: Piecewise pseudolikelihood for efficient CRF training

Citation

Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. By Charles Sutton, Andrew McCallum. In ICML, 2007.

Online version

http://www.machinelearning.org/proceedings/icml2007/papers/549.pdf

Summary

Discriminative training of graphical models is expensive when the variables have large cardinality. Pseudolikelihood reduces the cost of inference, but generally at some cost in accuracy. Piecewise training is more accurate, but it remains expensive when the cardinality is large. The authors therefore maximize the pseudolikelihood of the piecewise model. If $m$ represents the maximum number of assignments to a single variable $y_{s}$ and $K$ represents the size of the largest

Definition of Piecewise Pseudolikelihood

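For a single instance $({\vec {x}},{\vec {y}})$, the piecewise pseudolikelihood sums a local conditional log-likelihood over every factor $a$ and every variable $s$ in that factor: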

${\mathcal {L}}_{\mbox{PWPL}}(\Lambda ,{\vec {x}},{\vec {y}})=\sum _{a}\sum _{s\in a}\ln p_{\mbox{LCL}}(y_{s}|{\vec {y}}_{a-s},{\vec {x}},\lambda _{a})$

where

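$p_{\mbox{LCL}}(y_{s}|{\vec {y}}_{a-s},{\vec {x}},\lambda _{a})={\frac {\Psi _{a}(y_{s},{\vec {y}}_{a-s},{\vec {x}},\lambda _{a})}{Z({\vec {y}}_{a-s},{\vec {x}},\lambda _{a})}}$

Here $\Psi _{a}$ is the potential of factor $a$ with parameters $\lambda _{a}$, and the normalizer $Z({\vec {y}}_{a-s},{\vec {x}},\lambda _{a})$ sums $\Psi _{a}$ over all values of $y_{s}$ while the other variables of the factor, ${\vec {y}}_{a-s}$, are held at their observed values.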
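
To make the local conditional likelihood concrete, here is a minimal Python sketch for a single factor. It is an illustration under stated assumptions, not the authors' implementation: the factor is assumed to be a plain dictionary from label tuples to log-potentials, and the names `log_p_lcl`, `pwpl_factor`, `log_psi`, and `domains` are hypothetical.

```python
import math

def log_p_lcl(log_psi, y_a, s, domain):
    """Return ln p_LCL(y_s | y_{a-s}) for one factor a.

    log_psi: dict mapping a tuple of labels (one per variable in the
             factor) to ln Psi_a; an assumed, illustrative representation.
    y_a:     observed labels of the factor's variables.
    s:       position inside the factor of the variable being predicted.
    domain:  labels that y_s may take.
    """
    scores = []
    for label in domain:
        y_alt = list(y_a)
        y_alt[s] = label          # vary only y_s, keep y_{a-s} fixed
        scores.append(log_psi[tuple(y_alt)])
    top = max(scores)             # log-sum-exp for numerical stability
    log_z = top + math.log(sum(math.exp(v - top) for v in scores))
    return log_psi[tuple(y_a)] - log_z

def pwpl_factor(log_psi, y_a, domains):
    """Sum the local conditional log-likelihoods over every s in the factor."""
    return sum(log_p_lcl(log_psi, y_a, s, domains[s])
               for s in range(len(y_a)))

# Toy pairwise factor over binary labels that prefers equal labels.
log_psi = {(0, 0): math.log(2.0), (0, 1): math.log(1.0),
           (1, 0): math.log(1.0), (1, 1): math.log(2.0)}
p = math.exp(log_p_lcl(log_psi, (0, 0), 0, [0, 1]))   # 2 / (2 + 1)
total = pwpl_factor(log_psi, (0, 0), [[0, 1], [0, 1]])
```

Each call to `log_p_lcl` loops over the values of one variable only, so the work per factor grows with the number of labels times the number of variables in the factor, rather than with the number of joint assignments to the factor.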

Therefore the objective function to maximize over the training instances $i$ is

$O=\sum _{i}{\mathcal {L}}_{\mbox{PWPL}}(\Lambda ,{\vec {x}}^{(i)},{\vec {y}}^{(i)})-\sum _{a}{\frac {\lambda _{a}^{2}}{2\sigma ^{2}}}$

where the second term is the standard Gaussian prior, with variance $\sigma ^{2}$, that prevents overfitting.
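
As a small illustration of how the two terms combine, the sketch below adds up per-instance piecewise pseudolikelihood values and subtracts the Gaussian penalty. The function name `pwpl_objective` and the flat list of weights are assumptions made for this example; the per-instance values are taken as given rather than computed from a model.

```python
def pwpl_objective(pwpl_per_instance, weights, sigma):
    """Regularized objective O = sum_i L_PWPL(i) - sum_a lambda_a^2 / (2 sigma^2).

    pwpl_per_instance: list of L_PWPL values, one per training instance.
    weights:           flat list of all parameters lambda (assumed layout).
    sigma:             standard deviation of the Gaussian prior.
    """
    penalty = sum(w * w for w in weights) / (2.0 * sigma ** 2)
    return sum(pwpl_per_instance) - penalty

# Two instances and two weights, with a unit-variance prior.
value = pwpl_objective([-1.0, -2.0], [1.0, -1.0], sigma=1.0)
# value is -3.0 (data term) minus 1.0 (penalty), that is -4.0
```

A smaller `sigma` makes the penalty larger and pulls the weights more strongly toward zero.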

Experimental results

Synthetic sequences generated by a second-order HMM.

POS tagging on the Penn Treebank data set.

Related Papers

Pseudolikelihood was proposed by Besag (1975) and has been applied in NLP by Toutanova et al. (2003) and others.
