Kernel Conditional Random Fields: Representation and Clique Selection, Lafferty, Zhu, and Liu, 2004

From Cohen Courses
Jump to navigationJump to search

Citation

John Lafferty, Xiaojin Zhu, Yan Liu. Kernel conditional random fields: representation and clique selection in ICML '04 Proceedings of the twenty-first international conference on Machine learning

Online version

[1]

Summary

The Paper introduces Kernel conditional random fields for Annotation of data having multiple components. The paper is divided into two parts. The first describes how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. The second part describes a procedure for greedily selecting cliques in the dual representation. The framework and clique selection methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Representer theorem

The representer theorem proposes that if the loss function is the negative log loss function, is the Mercer kernel over a set of graphs with associated RKHS norm , and is a strictly increasing function, the minimizer of

if it exists, has the form

Note that the dual parameters depend on all assignments of labels.

Clique Selection

Cliques are incrementally selected to reduce regularized risk. It maintains an active set of labeled cliques. Of all the candidate cliques, it selects greedily based on functional gradient descent.

Steps -

Set and iterate

  1. For each candidate h, supported by a single labeled clique, calculate the functional derivative .
  2. Select . Set .
  3. Estimate .

Experiments

  1. Galaxy dataset, a variant of two spirals is constructed by a 2 state [HMM]. A semi-supervised graph is then constructed by unweighted 10 nearest neighbour approach. The kernel is . The standard kernel is the radial basis function (RBF) kernel with bandwidth .
  1. HMM with Gaussian mixtures: This is a generated dataset from a 3-state HMM with each state a mixutre of 2 Gaussians. In this dataset, RBF kernel is used with .
  1. RS126 is for Protein secondary structure prediction.