Entropy Gradient for Semi-Supervised Conditional Random Fields

Mann and McCallum, 2007 use this method to efficiently compute the entropy gradient that serves as a regularizer when training semi-supervised conditional random fields (CRFs). It improves on the approach originally proposed by Jiao et al., 2006 for computing the gradient on the unlabeled part of the training data.

Summary

Entropy regularization (ER) is a method applied to semi-supervised learning that augments a standard conditional likelihood objective function with an additional term that aims to minimize the predicted label entropy on unlabeled data. By insisting on peaked, confident predictions, ER guides the decision boundary away from dense regions of input space. Entropy regularization for semi-supervised learning was first proposed for classification tasks by Grandvalet and Bengio, 2004.
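As a small illustration of what "peaked, confident predictions" means, the sketch below compares the entropy of two hypothetical label distributions. The function name `entropy` and the example probabilities are assumptions made for this illustration, not values from the cited papers.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

confident = [0.98, 0.01, 0.01]      # peaked prediction over three labels
uncertain = [1 / 3, 1 / 3, 1 / 3]   # uniform prediction over three labels

h_confident = entropy(confident)
h_uncertain = entropy(uncertain)
```

The peaked distribution has the lower entropy, so a regularizer that penalizes entropy on unlabeled data favors parameters that produce predictions like `confident`.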

Motivation

Jiao et al., 2006 apply this method to linear chain CRFs and demonstrate encouraging accuracy improvements on a gene-name-tagging task. However, the method they present for calculating the gradient of the entropy takes substantially more time than the traditional supervised-only gradient. Whereas supervised training requires only classic forward/backward style algorithms, taking time $O(ns^{2})$ (sequence length times the square of the number of labels), their training method takes $O(n^{2}s^{3})$, a factor of $O(ns)$ more.

The method proposed in Mann and McCallum, 2007 introduces a more efficient way to compute the entropy gradient, based on dynamic programming, that has the same asymptotic time complexity as supervised CRF training, $O(ns^{2})$. The calculation introduces the concept of subsequence constrained entropy: the entropy of a CRF for an observed data sequence when part of the label sequence is fixed. This efficiency makes the method especially useful for training CRFs on larger unannotated data sets.
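To give a feel for the $O(ns^{2})$ budget, the sketch below computes the entropy (not its gradient) of a linear chain model with a single forward/backward pass and checks it against brute-force enumeration. It relies on the identity $H(Y|x)=\log Z-E[\text{score}(Y)]$ rather than on the subsequence constrained entropy recursion of Mann and McCallum, 2007, so it is an illustration under assumptions rather than their algorithm. The names `chain_entropy`, `unary` and `trans` are hypothetical.

```python
import itertools
import math
import random

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def chain_entropy(unary, trans):
    """Entropy H(Y|x) of a linear chain model in O(n s^2) time.

    unary[t][b] and trans[a][b] are log-potentials, so a label sequence Y
    scores sum_t unary[t][y_t] + sum_{t>=1} trans[y_{t-1}][y_t].
    """
    n, s = len(unary), len(trans)
    labels = range(s)
    # Forward pass: alpha[t][b] sums over all prefixes ending in label b.
    alpha = [list(unary[0])]
    for t in range(1, n):
        row = []
        for b in labels:
            incoming = [alpha[t - 1][a] + trans[a][b] for a in labels]
            row.append(unary[t][b] + logsumexp(incoming))
        alpha.append(row)
    # Backward pass: beta[t][a] sums over all suffixes after position t.
    beta = [[0.0] * s for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for a in labels:
            beta[t][a] = logsumexp(
                [trans[a][b] + unary[t + 1][b] + beta[t + 1][b] for b in labels])
    log_z = logsumexp(alpha[n - 1])
    # Expected score under the model, from node and edge marginals.
    expected_score = 0.0
    for t in range(n):
        for b in labels:
            node = math.exp(alpha[t][b] + beta[t][b] - log_z)
            expected_score += node * unary[t][b]
            if t > 0:
                for a in labels:
                    log_edge = (alpha[t - 1][a] + trans[a][b]
                                + unary[t][b] + beta[t][b] - log_z)
                    expected_score += math.exp(log_edge) * trans[a][b]
    return log_z - expected_score

def brute_force_entropy(unary, trans):
    """Same quantity by enumerating all s^n label sequences (tiny inputs only)."""
    n, s = len(unary), len(trans)
    scores = []
    for y in itertools.product(range(s), repeat=n):
        score = sum(unary[t][y[t]] for t in range(n))
        score += sum(trans[y[t - 1]][y[t]] for t in range(1, n))
        scores.append(score)
    log_z = logsumexp(scores)
    return -sum(math.exp(sc - log_z) * (sc - log_z) for sc in scores)

random.seed(0)
n, s = 5, 3
unary = [[random.gauss(0, 1) for _ in range(s)] for _ in range(n)]
trans = [[random.gauss(0, 1) for _ in range(s)] for _ in range(s)]
h_dp = chain_entropy(unary, trans)
h_brute = brute_force_entropy(unary, trans)
```

The dynamic program touches each of the $n$ positions and each of the $s^{2}$ label pairs a constant number of times, while the enumeration grows as $s^{n}$.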

Semi-Supervised CRF Training

A standard linear chain CRF is trained by maximizing the log-likelihood $L(\theta ;D)$ on a labeled dataset $D$. Gradient-based methods such as L-BFGS are commonly used to optimize this objective.
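In the usual log-linear notation, the supervised objective can be written as follows. The indexing of labeled pairs as $(x^{(d)},y^{(d)})$, the global feature functions $F_{k}$ and the partition function $Z(x)$ are notational assumptions made here for illustration.

$$
L(\theta ;D)=\sum_{d}\log p_{\theta}\left(y^{(d)}\mid x^{(d)}\right),\qquad
p_{\theta}(Y\mid x)=\frac{1}{Z(x)}\exp\Big(\sum_{k}\theta_{k}F_{k}(x,Y)\Big)
$$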

For semi-supervised training by entropy regularization, the objective function is augmented by adding the negative entropy of the unannotated data $U=\langle u_{1},\ldots ,u_{n}\rangle$. A Gaussian prior is also added to the objective.
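A sketch of the resulting objective, under assumed notation, is given below. The trade-off weight $\lambda$ on the entropy term and the prior variance $\sigma^{2}$ are symbols introduced here for illustration, as is the indexing of labeled pairs by $d$.

$$
L(\theta ;D,U)=\sum_{d}\log p_{\theta}\left(y^{(d)}\mid x^{(d)}\right)
-\sum_{k}\frac{\theta_{k}^{2}}{2\sigma^{2}}
+\lambda\sum_{i=1}^{n}\sum_{Y}p_{\theta}(Y\mid u_{i})\log p_{\theta}(Y\mid u_{i})
$$

The last term is $-\lambda\sum_{i}H(Y\mid u_{i})$, so maximizing the objective pushes the entropy of the predictions on each unlabeled sequence down.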

Entropy Gradient Computation

Jiao et al., 2006 perform the computation of entropy gradient in the following manner:
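Differentiating the negative entropy of a log-linear model gives a covariance form, which is how this computation is commonly summarized. Here $F(x,Y)$ denotes the vector of global feature counts, a notational assumption:

$$
\frac{\partial}{\partial\theta}\big(-H(Y\mid x)\big)=\operatorname{cov}_{p_{\theta}(Y\mid x)}\big[F(x,Y)\big]\,\theta ,\qquad
\operatorname{cov}[F_{j},F_{k}]=E[F_{j}F_{k}]-E[F_{j}]E[F_{k}]
$$

The sketch below checks this identity numerically on a tiny chain by enumerating every label sequence and comparing against central finite differences. The feature layout and all function names are hypothetical, and the enumeration is exponential in the sequence length, so it is a sanity check rather than either paper's algorithm.

```python
import itertools
import math
import random

def features(x, y, s, v):
    """Global feature vector F(x, Y): emission counts, then transition counts."""
    f = [0.0] * (s * v + s * s)
    for t, (obs, lab) in enumerate(zip(x, y)):
        f[lab * v + obs] += 1.0
        if t > 0:
            f[s * v + y[t - 1] * s + lab] += 1.0
    return f

def distribution(theta, rows):
    """p_theta(Y | x) for every enumerated Y (one feature row per Y)."""
    scores = [sum(w * f for w, f in zip(theta, row)) for row in rows]
    m = max(scores)
    weights = [math.exp(sc - m) for sc in scores]
    z = sum(weights)
    return [w / z for w in weights]

def neg_entropy(theta, rows):
    return sum(p * math.log(p) for p in distribution(theta, rows) if p > 0.0)

def covariance_times_theta(theta, rows):
    """Gradient of -H(Y|x) as the feature covariance matrix times theta."""
    p = distribution(theta, rows)
    k_dim = len(theta)
    mean = [sum(pi * row[k] for pi, row in zip(p, rows)) for k in range(k_dim)]
    grad = []
    for j in range(k_dim):
        total = 0.0
        for k in range(k_dim):
            e_jk = sum(pi * row[j] * row[k] for pi, row in zip(p, rows))
            total += (e_jk - mean[j] * mean[k]) * theta[k]
        grad.append(total)
    return grad

def finite_difference(theta, rows, eps=1e-6):
    """Central-difference estimate of the gradient of -H(Y|x)."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((neg_entropy(up, rows) - neg_entropy(down, rows)) / (2 * eps))
    return grad

random.seed(0)
s, v = 3, 4                      # number of labels, observation vocabulary size
x = [0, 2, 1, 3]                 # a toy observation sequence
rows = [features(x, y, s, v)
        for y in itertools.product(range(s), repeat=len(x))]
theta = [random.gauss(0, 1) for _ in range(s * v + s * s)]
analytic = covariance_times_theta(theta, rows)
numeric = finite_difference(theta, rows)
max_abs_diff = max(abs(a - b) for a, b in zip(analytic, numeric))
```

In this check the two gradients should agree up to finite-difference error; `max_abs_diff` holds the largest discrepancy across parameters.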
