Entropy Gradient for Semi-Supervised Conditional Random Fields
This method is used by Mann and McCallum, 2007 to efficiently compute the gradient of the entropy term that serves as a regularizer when training semi-supervised conditional random fields. It improves on the approach originally proposed by Jiao et al., 2006 for computing the gradient on the unlabeled part of the training data.
Summary
Entropy regularization (ER) is a method applied to semi-supervised learning that augments a standard conditional likelihood objective function with an additional term that aims to minimize the predicted label entropy on unlabeled data. By insisting on peaked, confident predictions, ER guides the decision boundary away from dense regions of input space. Entropy regularization for semi-supervised learning was first proposed for classification tasks by Grandvalet and Bengio, 2004.
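As a rough illustration of the objective described above, the following minimal sketch computes an entropy-regularized objective for a plain multi-class classifier (the setting of Grandvalet and Bengio, 2004), not for a CRF. The names er_objective, score_fn, and gamma are hypothetical and chosen only for this example; gamma is assumed to be a non-negative weight on the entropy term.

```python
import math

def softmax(scores):
    # Convert raw scores to probabilities, shifting by the max for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def er_objective(labeled, unlabeled, score_fn, gamma):
    # labeled: list of (x, y) pairs, y an integer label index.
    # unlabeled: list of inputs x.
    # score_fn: hypothetical model, maps x to a list of per-label scores.
    # gamma: assumed weight of the entropy penalty.
    log_lik = sum(math.log(softmax(score_fn(x))[y]) for x, y in labeled)
    ent = sum(entropy(softmax(score_fn(x))) for x in unlabeled)
    # Maximizing this rewards correct labels on labeled data and
    # confident (low-entropy) predictions on unlabeled data.
    return log_lik - gamma * ent

# Toy usage: a two-label model whose scores depend on a scalar input.
toy_score = lambda x: [x, -x]
print(er_objective([(2.0, 0)], [0.0, 3.0], toy_score, gamma=0.5))
```

In this sketch an unlabeled input near the decision boundary (x = 0.0) contributes the maximum entropy penalty, while one far from it (x = 3.0) contributes almost none, which is the sense in which the extra term favors peaked predictions.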
Motivation
Jiao et al., 2006 apply this method to linear-chain CRFs and demonstrate encouraging accuracy improvements on a gene-name-tagging task. However, the method they presented for calculating the gradient of the entropy takes substantially more time than the traditional supervised-only gradient. Whereas supervised training requires only classic forward/backward-style algorithms, taking time O(n s^2) (sequence length n times the square of the number of labels s), their training method takes O(n^2 s^3), a factor of O(n s) more.