Difference between revisions of "Entropy Gradient for Semi-Supervised Conditional Random Fields"

Revision as of 00:07, 30 October 2011

This method is used by Mann and McCallum, 2007 to compute efficiently the entropy gradient that serves as a regularizer when training semi-supervised conditional random fields. It improves on the approach originally proposed by Jiao et al., 2006 in how the gradient is computed on the unlabeled part of the training data.

Summary

Entropy regularization (ER) is a method applied to semi-supervised learning that augments a standard conditional likelihood objective function with an additional term that aims to minimize the predicted label entropy on unlabeled data. By insisting on peaked, confident predictions, ER guides the decision boundary away from dense regions of input space. Entropy regularization for semi-supervised learning was first proposed for classification tasks by Grandvalet and Bengio, 2004.
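The objective described above can be sketched for a plain multi-class classifier. This is a minimal illustration, not the formulation of any of the cited papers: the names `score_fn`, `lam`, and `er_objective` are hypothetical, the model is assumed to be a softmax over per-label scores, and the entropy term is assumed to enter as a penalty weighted by `lam` in an objective that is maximized.

```python
import math


def softmax(scores):
    """Turn a list of real-valued label scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def er_objective(labeled, unlabeled, score_fn, lam):
    """Entropy-regularized objective (to be maximized).

    labeled   -- list of (x, y) pairs, y being a label index
    unlabeled -- list of inputs x without labels
    score_fn  -- hypothetical model: maps x to a list of label scores
    lam       -- assumed weight of the entropy penalty
    """
    # Standard conditional log-likelihood on the labeled data.
    log_lik = sum(math.log(softmax(score_fn(x))[y]) for x, y in labeled)
    # Predicted label entropy on the unlabeled data.
    ent = sum(entropy(softmax(score_fn(x))) for x in unlabeled)
    # Low entropy (peaked, confident predictions) raises the objective.
    return log_lik - lam * ent
```

With this sketch, an unlabeled point that receives a uniform prediction lowers the objective more than one that receives a peaked prediction, which is the pressure toward confident predictions that the summary describes.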

Motivation

Jiao et al. 2006 apply this method to linear-chain CRFs and demonstrate encouraging accuracy improvements on a gene-name-tagging task. However, the method they present for calculating the gradient of the entropy takes substantially more time than the traditional supervised-only gradient. Whereas supervised training requires only classic forward/backward-style algorithms, taking time $O(ns^{2})$ (sequence length times the square of the number of labels), their training method takes $O(n^{2}s^{3})$, a factor of $O(ns)$ more.

