Sutton McCallum ICML 2007: Piecewise pseudolikelihood for efficient CRF training

From Cohen Courses
 
== Online version ==

This [[Category::Paper]] is available [http://www.machinelearning.org/proceedings/icml2007/papers/549.pdf here].
 
== Summary ==

Discriminative training of graphical models is expensive if the cardinality of the variables is large. Generally [[RelatedPaper::Besag 1975 | pseudo-likelihood]] reduces the cost of inference but compromises on accuracy, while [[RelatedPaper::Sutton and McCallum 2005 | piecewise training]] is accurate but similarly expensive. The authors address this by maximizing the pseudo-likelihood of the piecewise model, which keeps training cost low while retaining much of the accuracy of piecewise training.
== Definition of Piecewise Pseudolikelihood ==

For a single instance <math>\vec{x}, \vec{y}</math>,

<math>\mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}, \vec{y}) = \sum_a \sum_{s \in a} \ln p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a)</math>

where <math>p_{\mbox{LCL}}</math> is the local conditional distribution obtained by normalizing the single factor <math>\Psi_a</math> over <math>y_s</math> alone:

<math>p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a) = \frac{\Psi_a(y_s, \vec{y}_{a - s}, \vec{x})}{\sum_{y'_s} \Psi_a(y'_s, \vec{y}_{a - s}, \vec{x})}</math>

Therefore the optimization function over the training set is

<math>\mathcal{O}(\Lambda) = \sum_i \mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}^{(i)}, \vec{y}^{(i)}) - \sum_k \frac{\lambda_k^2}{2\sigma^2}</math>

where the second term is the standard Gaussian prior to prevent overfitting.
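As a concrete illustration, the objective above can be sketched in a few lines of Python for a linear-chain model with pairwise factors only. The factor parameterization, function names, and toy data below are illustrative assumptions, not the authors' code: each pairwise factor <math>a = (t, t+1)</math> gets score <math>\Psi_a(y_t, y_{t+1}) = \exp(\lambda[y_t][y_{t+1}])</math>, and PWPL sums, over every factor and every variable in it, the log of that single factor normalized over just that variable.

```python
# Toy sketch of piecewise pseudolikelihood (PWPL); names and data are
# illustrative assumptions, not the authors' implementation.
import math

def log_local_conditional(lam, y_s, y_other, axis):
    """log p_LCL(y_s | y_other) for one pairwise factor.

    axis=0: y_s is the left variable of the factor; axis=1: the right one.
    """
    K = len(lam)  # number of labels; lam is a K x K weight matrix
    if axis == 0:
        scores = [lam[k][y_other] for k in range(K)]
    else:
        scores = [lam[y_other][k] for k in range(K)]
    # normalize the single factor over y_s only (no global inference)
    log_z = math.log(sum(math.exp(s) for s in scores))
    return scores[y_s] - log_z

def pwpl(lam, y):
    """Piecewise pseudolikelihood of one label sequence y."""
    total = 0.0
    for t in range(len(y) - 1):  # one pairwise factor per adjacent pair
        total += log_local_conditional(lam, y[t], y[t + 1], axis=0)
        total += log_local_conditional(lam, y[t + 1], y[t], axis=1)
    return total

def objective(lam, data, sigma2=10.0):
    """PWPL over the training set minus the Gaussian prior term."""
    ll = sum(pwpl(lam, y) for y in data)
    prior = sum(w * w for row in lam for w in row) / (2.0 * sigma2)
    return ll - prior

lam = [[0.5, -0.2], [-0.2, 0.5]]  # 2 labels, pairwise weights (toy values)
data = [[0, 0, 1, 1], [1, 1, 0]]  # two toy label sequences
print(round(objective(lam, data), 4))
```

Note that each term touches a single factor, which is why PWPL avoids both global inference (unlike standard likelihood) and per-piece normalization over all assignments (unlike plain piecewise training).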
== Experimental results ==

* Sequences generated by a 2nd-order HMM.
* POS tagging on the Penn Treebank.
  
== Related Papers ==

Pseudolikelihood was proposed by Besag (1975) and has been applied in NLP by Toutanova et al. (2003) and others.
 

Revision as of 23:45, 2 November 2011

== Citation ==

Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton, Andrew McCallum. In ICML, 2007.


