Sutton McCallum ICML 2007: Piecewise pseudolikelihood for efficient CRF training

From Cohen Courses
 
== Online version ==

This [[Category::Paper]] is available [http://www.machinelearning.org/proceedings/icml2007/papers/549.pdf here].
 
== Summary ==

Discriminative training of graphical models is expensive if the cardinality of the variables is large. Generally [[RelatedPaper::Besag 1975 | pseudo-likelihood]] reduces the cost of inference but compromises on accuracy, while [[RelatedPaper::Sutton and McCallum 2005 | piecewise training]] is accurate but similarly expensive. The authors address this by maximizing the pseudo-likelihood of the piecewise model, which keeps training cost low while retaining much of the accuracy of piecewise training.
== Definition of Piecewise Pseudolikelihood ==

For a single instance <math>\vec{x}, \vec{y}</math>,

<math>\mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}, \vec{y}) = \sum_a \sum_{s \in a} \ln p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a)</math>

where <math>p_{\mbox{LCL}}</math> is the local conditional distribution obtained by normalizing the single factor <math>\Psi_a</math> over <math>y_s</math> alone:

<math>p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a) = \frac{\Psi_a(y_s, \vec{y}_{a - s}, \vec{x})}{\sum_{y'_s} \Psi_a(y'_s, \vec{y}_{a - s}, \vec{x})}</math>

Therefore the optimization function over the training set is

<math>\mathcal{O}(\Lambda) = \sum_i \mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}^{(i)}, \vec{y}^{(i)}) - \sum_k \frac{\lambda_k^2}{2\sigma^2}</math>

where the second term is the standard Gaussian prior to prevent overfitting.
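As a concrete illustration, the objective above can be sketched in a few lines of Python for a linear-chain model with pairwise factors only. The factor parameterization, function names, and toy data below are illustrative assumptions, not the authors' code: each pairwise factor <math>a = (t, t+1)</math> gets score <math>\Psi_a(y_t, y_{t+1}) = \exp(\lambda[y_t][y_{t+1}])</math>, and PWPL sums, over every factor and every variable in it, the log of that single factor normalized over just that variable.

```python
# Toy sketch of piecewise pseudolikelihood (PWPL); names and data are
# illustrative assumptions, not the authors' implementation.
import math

def log_local_conditional(lam, y_s, y_other, axis):
    """log p_LCL(y_s | y_other) for one pairwise factor.

    axis=0: y_s is the left variable of the factor; axis=1: the right one.
    """
    K = len(lam)  # number of labels; lam is a K x K weight matrix
    if axis == 0:
        scores = [lam[k][y_other] for k in range(K)]
    else:
        scores = [lam[y_other][k] for k in range(K)]
    # normalize the single factor over y_s only (no global inference)
    log_z = math.log(sum(math.exp(s) for s in scores))
    return scores[y_s] - log_z

def pwpl(lam, y):
    """Piecewise pseudolikelihood of one label sequence y."""
    total = 0.0
    for t in range(len(y) - 1):  # one pairwise factor per adjacent pair
        total += log_local_conditional(lam, y[t], y[t + 1], axis=0)
        total += log_local_conditional(lam, y[t + 1], y[t], axis=1)
    return total

def objective(lam, data, sigma2=10.0):
    """PWPL over the training set minus the Gaussian prior term."""
    ll = sum(pwpl(lam, y) for y in data)
    prior = sum(w * w for row in lam for w in row) / (2.0 * sigma2)
    return ll - prior

lam = [[0.5, -0.2], [-0.2, 0.5]]  # 2 labels, pairwise weights (toy values)
data = [[0, 0, 1, 1], [1, 1, 0]]  # two toy label sequences
print(round(objective(lam, data), 4))
```

Note that each term touches a single factor, which is why PWPL avoids both global inference (unlike standard likelihood) and per-piece normalization over all assignments (unlike plain piecewise training).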
== Experimental results ==

* Sequences generated by a 2nd-order HMM.
* POS tagging on the Penn Treebank.
  
== Related Papers ==

Pseudolikelihood was proposed by Besag (1975) and has been applied in NLP by Toutanova et al. (2003) and others.
 

Revision as of 23:45, 2 November 2011

== Citation ==

Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton, Andrew McCallum. In ICML, 2007.


