Sutton McCullum ICML 2007: Piecewise pseudolikelihood for efficient CRF training
From Cohen Courses
Revision as of 23:45, 2 November 2011
== Citation ==

Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. Charles Sutton and Andrew McCallum. In ICML 2007.
== Online version ==

This [[Category::Paper]] is available [http://www.machinelearning.org/proceedings/icml2007/papers/549.pdf here].
== Summary ==

Discriminative training of graphical models is expensive when the variables have large cardinality. [[RelatedPaper::Besag 1985 | Pseudo-likelihood]] reduces the cost of inference but compromises accuracy, while [[RelatedPaper::Sutton and McCullum 2005 | piecewise training]] is accurate but similarly expensive. The authors instead maximize the pseudo-likelihood of the piecewise model, combining the efficiency of pseudo-likelihood with the accuracy of piecewise training.
== Definition of Piecewise Pseudolikelihood ==

For a single instance <math>(\vec{x}, \vec{y})</math>,

<math>\mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}, \vec{y}) = \sum_a \sum_{s \in a} \ln p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a)</math>

where each local conditional is normalized within its single factor <math>a</math>:

<math>p_{\mbox{LCL}}(y_s | \vec{y}_{a - s}, \vec{x}, \lambda_a) = \frac{\Psi_a(y_s, \vec{y}_{a - s}, \vec{x})}{\sum_{y'_s} \Psi_a(y'_s, \vec{y}_{a - s}, \vec{x})}</math>

Therefore the optimization function is

<math>\mathcal{O}(\Lambda) = \sum_i \mathcal{L}_{\mbox{PWPL}} (\Lambda, \vec{x}^{(i)}, \vec{y}^{(i)}) - \sum_k \frac{\lambda_k^2}{2\sigma^2}</math>

where the second term is the standard Gaussian prior used to prevent overfitting.
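To make the definition concrete, here is a minimal Python sketch (the chain structure, labels, and log-potential tables are hypothetical illustration values, not from the paper) that computes <math>\mathcal{L}_{\mbox{PWPL}}</math> for a three-node binary chain with one pairwise factor per edge. The key point it illustrates: each variable in each factor is predicted conditioned on the other variable of that factor alone, so normalization runs over a single variable within a single piece, never over the whole model.

```python
import math

# Hypothetical toy model: chain y_0 - y_1 - y_2 with binary labels and one
# pairwise factor per edge; each factor has its own log-potential table.
labels = [0, 1]

# log Psi_a(y_s, y_t) for the two edge factors a = (0,1) and a = (1,2)
log_psi = {
    (0, 1): [[0.5, -0.2], [-0.3, 0.8]],
    (1, 2): [[0.1, 0.4], [0.0, -0.5]],
}

y = [0, 1, 1]  # an observed labeling of the chain


def pwpl(log_psi, y):
    """Piecewise pseudolikelihood: for every factor a and every variable s
    in a, add ln p_LCL(y_s | y_{a-s}), normalizing over y_s within that
    single factor only -- no global partition function is ever computed."""
    total = 0.0
    for (s, t), table in log_psi.items():
        # condition on y_t, normalize over the candidate values of y_s
        logz_s = math.log(sum(math.exp(table[v][y[t]]) for v in labels))
        total += table[y[s]][y[t]] - logz_s
        # condition on y_s, normalize over the candidate values of y_t
        logz_t = math.log(sum(math.exp(table[y[s]][v]) for v in labels))
        total += table[y[s]][y[t]] - logz_t
    return total


print(pwpl(log_psi, y))
```

Each summand is a log-probability, so the objective is always non-positive; maximizing it decomposes over factors, which is what makes training cheap.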
== Experimental results ==

* Sequences generated by a second-order HMM.
* POS tagging on the Penn Treebank.