Jiao et al COLING 2006
Citation
Jiao, F., Wang, S., Lee, C.-H., Greiner, R., and Schuurmans, D. Semi-supervised conditional random fields for improved sequence segmentation and labeling. In Proceedings of the 21st International Conference on Computational Linguistics, 2006, pp. 209–216.
Online Version
http://acl.ldc.upenn.edu/P/P06/P06-1027.pdf
Summary
This paper presents a novel method for training a conditional random field (CRF) in a semi-supervised setting. HMMs and other generative models incorporate unlabeled data easily via EM, but cope poorly with non-independent, overlapping features; semi-supervised discriminative approaches, by contrast, were less well explored. By incorporating the extra unlabeled data, the new technique improves accuracy over a baseline CRF trained only on the labeled data. In tandem, the authors develop an efficient dynamic-programming algorithm for computing a covariance matrix of the features, which is needed to calculate the gradient and perform iterative ascent.
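To see why a covariance matrix appears, note that for an exponential-family model <math>p_\theta(y \mid x) \propto \exp\bigl(\theta^\top f(x,y)\bigr)</math> (generic notation, not the paper's exact symbols), the gradient of the negative-entropy term works out to

<math>
\nabla_\theta \sum_{y} p_\theta(y \mid x) \log p_\theta(y \mid x) \;=\; \mathrm{Cov}_{p_\theta(y \mid x)}\bigl[f(x,y)\bigr]\,\theta,
</math>

so every gradient evaluation requires the covariance of the feature vector under the model. For sequences, <math>y</math> ranges over exponentially many labelings, which is what makes an efficient dynamic-programming computation of this covariance essential.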
The key idea is to minimize the conditional entropy of the model's predictions on the unlabeled data, thereby maximizing the certainty of those labelings and reinforcing what is learned from the supervised labels. Equivalently, this can be read as maximizing a KL divergence, pushing the predictive distribution "farther" from maximal uncertainty and decreasing the overlap between competing labelings.
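Concretely, the entropy of the predictive distribution over labelings for an unlabeled input <math>x</math> can be written (again in generic notation) as

<math>
H\bigl(p_\theta(\cdot \mid x)\bigr) \;=\; -\sum_{y} p_\theta(y \mid x) \log p_\theta(y \mid x) \;=\; \log |\mathcal{Y}| \;-\; \mathrm{KL}\bigl(p_\theta(\cdot \mid x) \,\|\, U\bigr),
</math>

where <math>U</math> is the uniform distribution over the label space <math>\mathcal{Y}</math>; minimizing the entropy is therefore exactly maximizing the KL divergence from the uniform distribution, i.e., driving the model toward confident labelings.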
The optimization criterion is to maximize the sum of the conditional likelihood of the labeled examples and the negative conditional entropy of the unlabeled examples, plus a regularization term. The entropy term makes the objective non-concave, so only local optima can be guaranteed; still, one can attempt to improve on a fully supervised CRF by using its learned parameters as the starting point for L-BFGS.
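In symbols, with <math>N</math> labeled pairs <math>(x^{(i)}, y^{(i)})</math>, unlabeled inputs <math>x^{(N+1)}, \ldots, x^{(M)}</math>, an entropy weight <math>\gamma</math>, and a Gaussian prior (the notation here is ours, reconstructed from the description above), the objective reads

<math>
\max_\theta \; \sum_{i=1}^{N} \log p_\theta\bigl(y^{(i)} \mid x^{(i)}\bigr) \;+\; \gamma \sum_{j=N+1}^{M} \sum_{y} p_\theta\bigl(y \mid x^{(j)}\bigr) \log p_\theta\bigl(y \mid x^{(j)}\bigr) \;-\; \frac{\|\theta\|^{2}}{2\sigma^{2}}.
</math>

The sketch below illustrates the two-stage recipe on a toy problem. It is not the authors' implementation: it substitutes multiclass logistic regression for the CRF (so no sequence-level dynamic programming is required) and lets SciPy's L-BFGS approximate gradients numerically; all names and constants are illustrative.

<pre>
# Toy sketch of minimum-entropy semi-supervised training (NumPy/SciPy).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K, D = 3, 5                        # classes, feature dimension
X_lab = rng.normal(size=(30, D))   # labeled inputs
y_lab = rng.integers(0, K, size=30)
X_unl = rng.normal(size=(200, D))  # unlabeled inputs
gamma, sigma2 = 0.5, 10.0          # entropy weight, prior variance

def log_probs(theta, X):
    """Row-wise log p(y|x) under a linear multiclass model."""
    scores = X @ theta.reshape(K, D).T
    return scores - np.logaddexp.reduce(scores, axis=1, keepdims=True)

def neg_objective(theta, use_unlabeled):
    lp = log_probs(theta, X_lab)
    obj = lp[np.arange(len(y_lab)), y_lab].sum()   # labeled cond. likelihood
    if use_unlabeled:
        lp_u = log_probs(theta, X_unl)
        obj += gamma * (np.exp(lp_u) * lp_u).sum() # + gamma * (-entropy)
    obj -= theta @ theta / (2 * sigma2)            # Gaussian regularizer
    return -obj                                    # SciPy minimizes

# Stage 1: fully supervised training (concave, unique optimum).
sup = minimize(neg_objective, np.zeros(K * D), args=(False,), method="L-BFGS-B")
# Stage 2: non-concave semi-supervised objective, warm-started at the
# supervised solution, as described in the summary above.
semi = minimize(neg_objective, sup.x, args=(True,), method="L-BFGS-B")
print("supervised obj:", -sup.fun, " semi-supervised obj:", -semi.fun)
</pre>

The warm start matters precisely because stage two is non-concave: initializing at the supervised optimum keeps the optimizer in a sensible basin rather than an arbitrary local maximum.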
An experiment on named-entity recognition of gene names showed generally much-improved recall and F-measure over the supervised baseline.
Related Papers
This form of minimum-entropy regularization was first explored by Grandvalet and Bengio, NIPS 2004, for a single unstructured variable.
CRFs were first proposed by Lafferty et al, ICML 2001.
The dataset analyzed was from McDonald et al 2005.