Kernel Conditional Random Fields: Representation and Clique Selection, Lafferty, Zhu, and Liu, 2004
Citation
John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel conditional random fields: representation and clique selection. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, 2004.
Online version
Summary
The paper introduces kernel conditional random fields (KCRFs) for the annotation of data having multiple components. The paper is divided into two parts. The first describes how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. The second describes a procedure for greedily selecting cliques in the dual representation. The framework and clique selection methods are demonstrated in synthetic data experiments and are also applied to the problem of protein secondary structure prediction.
Representer theorem
The representer theorem states that if the loss function $\phi$ is the negative log loss, $K$ is a Mercer kernel over a set of labeled graphs with associated RKHS norm $\|\cdot\|_K$, and $\Omega$ is a strictly increasing function, the minimizer of

$$R_\phi(f) = \sum_{i=1}^{n} \phi\left(f;\, x^{(i)}, y^{(i)}\right) + \Omega\left(\|f\|_K\right),$$

if it exists, has the form

$$f^\star(x, y_c) = \sum_{i=1}^{n} \sum_{c} \sum_{y'_c} \alpha_c^{(i)}(y'_c)\, K_c\!\left(x^{(i)}, y'_c;\; x, y_c\right).$$

Note that the dual parameters $\alpha_c^{(i)}(y'_c)$ depend on all assignments of labels $y'_c$ to the cliques.
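To make the dual form concrete, here is a minimal Python sketch of evaluating $f^\star$ at a clique; the names (`K_c` as a callable kernel, `alphas` for the dual parameters, `clique_labelings` enumerating the assignments $y'_c$) are hypothetical, not from the paper.

```python
def kcrf_clique_score(x, y_c, train_xs, clique_labelings, alphas, K_c):
    """Evaluate f*(x, y_c) = sum_i sum_{y'} alpha_i(y') K_c(x_i, y'; x, y_c).

    One dual parameter per training example and per candidate label
    assignment of the clique, as the representer theorem prescribes.
    """
    score = 0.0
    for i, x_i in enumerate(train_xs):
        for j, y_prime in enumerate(clique_labelings):
            score += alphas[i][j] * K_c(x_i, y_prime, x, y_c)
    return score
```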
Clique Selection
Cliques are selected incrementally to reduce the regularized risk. The algorithm maintains an active set of labeled cliques and, among all candidate cliques, greedily selects the next one using functional gradient descent.
Steps:

Set $f_0 = 0$ and iterate:
- For each candidate $h$, supported by a single labeled clique, calculate the functional derivative $dR_\phi(f_t, h)$.
- Select $h_t = \arg\max_h \left| dR_\phi(f_t, h) \right|$. Set $f_{t+1} = f_t + \alpha_t h_t$.
- Estimate the coefficient $\alpha_t$ by minimizing the regularized risk.
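A minimal Python sketch of this greedy loop follows; the functional derivative is approximated by a one-sided finite difference, and `candidates` and `risk` are hypothetical interfaces standing in for the paper's single-clique basis functions and regularized risk $R_\phi$.

```python
import numpy as np

def greedy_clique_selection(candidates, risk, n_iter=10, eps=1e-6):
    """Greedy clique selection by functional gradient descent (sketch).

    `candidates`: basis functions h, each supported by a single labeled clique.
    `risk`: maps a function f, represented as a list of (alpha, h) terms,
    to the regularized risk R_phi(f). Both interfaces are hypothetical.
    """
    f = []  # active set: f_t as a list of (alpha, h) terms
    for t in range(n_iter):
        base = risk(f)
        # dR_phi(f_t, h) ~ (R_phi(f_t + eps*h) - R_phi(f_t)) / eps
        derivs = [(risk(f + [(eps, h)]) - base) / eps for h in candidates]
        # Greedily pick the candidate with the largest-magnitude derivative.
        h_t = candidates[int(np.argmax(np.abs(derivs)))]
        # Stand-in for the paper's risk minimization: a crude line search
        # over a grid of step sizes.
        grid = np.linspace(-1.0, 1.0, 41)
        alpha_t = min(grid, key=lambda a: risk(f + [(a, h_t)]))
        f = f + [(alpha_t, h_t)]
    return f
```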
Experiments
- Galaxy dataset: a variant of two spirals, generated by a 2-state HMM. A semi-supervised graph is then constructed over the labeled and unlabeled points using an unweighted 10-nearest-neighbour approach, and the kernel is derived from this graph; the baseline is the standard radial basis function (RBF) kernel with bandwidth $\sigma$ (see the sketch after this list).
- HMM with Gaussian mixtures: a dataset generated from a 3-state HMM in which each state emits from a mixture of 2 Gaussians. An RBF kernel with bandwidth $\sigma$ is used on this dataset as well.
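As referenced above, the sketch below shows one way to build the unweighted 10-nearest-neighbour graph and a Laplacian-based graph kernel, alongside the RBF baseline. Using the regularized inverse of the combinatorial Laplacian is an assumption here; the paper's exact kernel transform and constants are not reproduced.

```python
import numpy as np

def knn_laplacian_kernel(X, k=10, reg=1e-6):
    """Unweighted symmetric k-NN graph over all (labeled + unlabeled) points,
    turned into a graph kernel via a regularized inverse Laplacian
    (one standard choice; the paper's transform may differ)."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D2[i])[1:k + 1]] = 1.0           # k nearest, skipping self
    W = np.maximum(W, W.T)                               # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                       # combinatorial Laplacian
    return np.linalg.inv(L + reg * np.eye(n))

def rbf_kernel(X, sigma):
    """Standard RBF kernel with bandwidth sigma, the baseline in both experiments."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))
```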