Sutton McCallum ICML 2007: Piecewise pseudolikelihood for efficient CRF training
Citation
Piecewise Pseudolikelihood for Efficient Training of Conditional Random Fields. By Charles Sutton, Andrew McCallum. In ICML, 2007.
Online version
http://www.machinelearning.org/proceedings/icml2007/papers/549.pdf
Summary
Discriminative training of graphical models is expensive when the variables have large cardinality. Pseudolikelihood generally reduces the cost of inference but compromises accuracy; piecewise training, although accurate, is expensive in a similar way to standard training. The authors therefore maximize the pseudolikelihood of the piecewise model. If $m$ is the maximum number of assignments to a single variable and $a$ is the size of the largest factor, evaluating the objective for a factor becomes linear in $m$, rather than the $O(m^a)$ required by standard piecewise training.
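A back-of-the-envelope comparison makes the cost reduction concrete. The sizes below are illustrative choices, not numbers from the paper:

```python
# Number of summation terms needed to normalize one factor, assuming
# m labels per variable and a variables per factor (illustrative sizes).
m = 45  # e.g. roughly the size of the Penn Treebank POS tag set
a = 3   # a hypothetical three-variable factor

piecewise_terms = m ** a  # piecewise training: sum over all joint assignments
pwpl_terms = a * m        # PWPL: one m-way sum per local conditional

print(piecewise_terms, pwpl_terms)  # 91125 vs 135
```

Even for modest factor sizes, summing over joint assignments dwarfs the cost of the per-variable local normalizations.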
Definition of piecewise pseudolikelihood
For a single training instance $(\mathbf{x}, \mathbf{y})$, the piecewise pseudolikelihood is

$$\ell_{\mathrm{PWPL}}(\Lambda; \mathbf{x}, \mathbf{y}) = \sum_{a} \sum_{s \in a} \log p_{\mathrm{PW}}(y_s \mid \mathbf{y}_{a \setminus s}, \mathbf{x}),$$

where each local conditional is normalized within its own factor:

$$p_{\mathrm{PW}}(y_s \mid \mathbf{y}_{a \setminus s}, \mathbf{x}) = \frac{\Psi_a(y_s, \mathbf{y}_{a \setminus s}, \mathbf{x})}{\sum_{y'_s} \Psi_a(y'_s, \mathbf{y}_{a \setminus s}, \mathbf{x})}.$$

Therefore the optimization function over the training set is

$$\mathcal{O}(\Lambda) = \sum_{i} \ell_{\mathrm{PWPL}}(\Lambda; \mathbf{x}^{(i)}, \mathbf{y}^{(i)}) - \sum_{k} \frac{\lambda_k^2}{2\sigma^2},$$

where the second term is the standard Gaussian prior used to prevent overfitting.
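A minimal sketch of this objective for toy pairwise factors over discrete labels. The function names, the table-based parameterization, and the prior variance are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def pwpl_factor(log_psi, ys):
    """PWPL contribution of one pairwise factor.

    log_psi: (m, m) table of log-potentials for the factor.
    ys: (i, j) observed labels of the factor's two variables.
    """
    i, j = ys
    # log p(y_1 = i | y_2 = j): normalize column j over the first axis
    lp1 = log_psi[i, j] - np.logaddexp.reduce(log_psi[:, j])
    # log p(y_2 = j | y_1 = i): normalize row i over the second axis
    lp2 = log_psi[i, j] - np.logaddexp.reduce(log_psi[i, :])
    return lp1 + lp2

def pwpl_objective(log_psis, assignments, sigma2=10.0):
    """Sum of per-factor PWPL terms minus a Gaussian (L2) prior."""
    ll = sum(pwpl_factor(t, ys) for t, ys in zip(log_psis, assignments))
    prior = sum((t ** 2).sum() for t in log_psis) / (2 * sigma2)
    return ll - prior
```

Each conditional only ever sums over the labels of one variable within one factor, which is the source of the efficiency gain.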
Experimental results
- Synthetic sequences generated by a second-order HMM.
- POS tagging on the Penn Treebank.
Related Papers
Pseudolikelihood was proposed by Besag (1975) and has been applied in NLP by Toutanova et al. (2003) and others.