Kernel Conditional Random Fields: Representation and Clique Selection, Lafferty, Zhu, and Liu, 2004

Citation

John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel conditional random fields: representation and clique selection. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, 2004.

Online version

[1]

Summary

The paper introduces kernel conditional random fields (KCRFs) for annotating data that has multiple components. It has two parts. The first describes how KCRFs arise from risk minimization procedures defined using Mercer kernels on labeled graphs. The second describes a procedure for greedily selecting cliques in the dual representation. The framework and the clique selection method are demonstrated in synthetic data experiments and are also applied to protein secondary structure prediction.

Representer theorem

The representer theorem states the following. Let the loss function $\phi$ be the negative log loss, with $\phi _{i}$ denoting the loss on the $i$-th of $n$ training examples, let $K$ be a Mercer kernel over a set of graphs $G$ with associated RKHS norm $||\cdot ||_{K}$, and let $\omega :R_{+}\rightarrow R_{+}$ be a strictly increasing function. Then the minimizer $f^{*}$ of the regularized risk

$R_{\phi }(f)=\sum _{i=1}^{n}\phi _{i}+\omega (||f||_{K})$

if it exists, has the form

$f^{*}(\cdot )=\sum _{i}\sum _{c\in Cliques(g_{i})}\sum _{y_{c}}\alpha _{c}^{(i)}(y_{c})K_{c}(x^{(i)},y_{c},\cdot )$

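Note that the dual parameters $\alpha _{c}^{(i)}(y_{c})$ depend on all assignments of labels: there is one parameter for each training example $i$, each clique $c$ of its graph, and each possible labeling $y_{c}$ of that clique, not only for the labeling observed in the training data.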
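To make the dual form concrete, here is a minimal sketch that evaluates $f$ as a weighted sum of kernel terms. It assumes single-node cliques only and a clique kernel that multiplies an RBF kernel on the inputs by a label-match indicator. The function names, the kernel choice, and the dual parameter values are illustrative assumptions, not taken from the paper.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def clique_kernel(x1, y1, x2, y2, sigma=1.0):
    """Assumed node-clique kernel: RBF on inputs times a label-match indicator."""
    return rbf(x1, x2, sigma) * (1.0 if y1 == y2 else 0.0)

def dual_f(support, x, y, sigma=1.0):
    """Evaluate f(x, y) = sum_s alpha_s * K((x_s, y_s), (x, y))."""
    return sum(alpha * clique_kernel(xs, ys, x, y, sigma)
               for xs, ys, alpha in support)

# Hypothetical dual parameters: one alpha per (training node, label) pair,
# including labels that were not observed at that node.
support = [
    ((0.0, 0.0), 0, 1.5),
    ((0.0, 0.0), 1, -1.5),
    ((1.0, 1.0), 0, -0.5),
    ((1.0, 1.0), 1, 0.5),
]

score_0 = dual_f(support, (0.0, 0.0), 0)  # score of label 0 at the query point
score_1 = dual_f(support, (0.0, 0.0), 1)  # score of label 1 at the query point
```

The number of terms grows with the number of training nodes times the number of labelings per clique, which is the motivation for selecting only a sparse subset of cliques.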

Clique Selection

Cliques are selected incrementally to reduce the regularized risk. The algorithm maintains an active set $A$ of labeled cliques. At each iteration it greedily picks one of the candidate cliques, using functional gradient descent to rank them.

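Steps:

Set $f=0$ and iterate:

1. For each candidate $h$, supported by a single labeled clique, calculate the functional derivative $dR_{\phi }(f,h)$ .
2. Select $h=argmax_{h}|dR_{\phi }(f,h)|$ , the candidate with the largest gradient magnitude. Set $f\leftarrow f+\alpha _{h}h$ .
3. Re-estimate the parameters $\alpha _{f}$ of the active functions by minimizing the regularized risk.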
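The steps above can be sketched in a simplified setting. The code below is an illustration under stated assumptions, not the paper's implementation: it uses node cliques only (no edge cliques), an RBF kernel times a label-match indicator, the regularizer $\omega (t)=\lambda t^{2}/2$, and plain gradient descent for the re-estimation step. All function names and the toy data are hypothetical.

```python
import numpy as np

def rbf_gram(X, sigma):
    """Gram matrix of a Gaussian (RBF) kernel on the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def scores(K, active, alpha, n_labels):
    """F[i, y] = f(x_i, y) for f = sum_a alpha_a * K((x_s, y_s), .)."""
    F = np.zeros((K.shape[0], n_labels))
    for (s, lab), a in zip(active, alpha):
        F[:, lab] += a * K[s]
    return F

def softmax(F):
    Z = np.exp(F - F.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def derivative(K, Y, F, lam):
    """D[s, y] = dR(f, h) for the candidate h = K((x_s, y), .)."""
    return K @ (softmax(F) - Y) + lam * F

def risk(K, Y, active, alpha, lam):
    """Negative log loss plus (lam / 2) * ||f||_K^2."""
    F = scores(K, active, alpha, Y.shape[1])
    nll = -np.log((softmax(F) * Y).sum(axis=1)).sum()
    norm2 = sum(a * F[s, lab] for (s, lab), a in zip(active, alpha))
    return nll + 0.5 * lam * norm2

def greedy_select(X, y, n_labels, n_cliques, sigma=0.5, lam=0.1, inner_steps=200):
    K = rbf_gram(X, sigma)
    Y = np.eye(n_labels)[y]  # one-hot labels
    active, alpha = [], np.zeros(0)
    for _ in range(n_cliques):
        # Steps 1-2: pick the candidate with the largest |dR(f, h)|.
        D = np.abs(derivative(K, Y, scores(K, active, alpha, n_labels), lam))
        for s, lab in active:
            D[s, lab] = -1.0  # skip candidates already in the active set
        s, lab = np.unravel_index(np.argmax(D), D.shape)
        active.append((int(s), int(lab)))
        alpha = np.append(alpha, 0.0)
        # Step 3: re-estimate all active alphas by gradient descent on the risk.
        # The step size is a conservative choice from a trace bound on the Hessian.
        step = 1.0 / (len(active) * (0.25 * len(y) + lam))
        for _ in range(inner_steps):
            G = derivative(K, Y, scores(K, active, alpha, n_labels), lam)
            alpha = alpha - step * np.array([G[s, lab] for s, lab in active])
    return K, active, alpha

# Toy data (hypothetical): two well-separated clusters, one label each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]])
y = np.array([0, 0, 0, 1, 1, 1])
K, active, alpha = greedy_select(X, y, n_labels=2, n_cliques=2)
pred = scores(K, active, alpha, 2).argmax(axis=1)
```

In this simplified setting the gradient of the risk with respect to an active parameter equals the functional derivative in the direction of its kernel function, so the same `derivative` routine serves both the selection step and the re-estimation step.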

Experiments

The galaxy dataset, a variant of the two-spirals problem, is generated by a 2-state hidden Markov model (HMM). A semi-supervised graph is then constructed with an unweighted 10-nearest-neighbor approach. The graph kernel is $K=10(L+10^{-6}I)^{-1}$ , where $L$ is the graph Laplacian. The standard kernel used for comparison is the radial basis function (RBF) kernel with bandwidth $\sigma =0.35$ .
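As a rough sketch of how such a graph kernel could be built, the code below constructs an unweighted 10-nearest-neighbor graph and computes $K=10(L+10^{-6}I)^{-1}$. The summary does not say how the graph is symmetrized or which Laplacian is used, so the sketch assumes Euclidean distances, symmetrization by union of neighbor relations, and the unnormalized Laplacian $L=D-W$. The function name and the random input points are hypothetical.

```python
import numpy as np

def knn_graph_kernel(X, k=10, scale=10.0, eps=1e-6):
    """Regularized inverse Laplacian kernel of an unweighted k-NN graph."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    # Unweighted adjacency: connect each point to its k nearest neighbors.
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)  # assumed symmetrization: keep an edge if either end chose it
    # Unnormalized graph Laplacian (assumption): L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    return scale * np.linalg.inv(L + eps * np.eye(n))

# Hypothetical input: 30 random 2-D points standing in for the dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
K = knn_graph_kernel(X)
```

The small ridge term $10^{-6}I$ is what makes the matrix invertible, since the Laplacian itself always has a zero eigenvalue for the constant vector.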

HMM with Gaussian mixtures: this dataset is generated from a 3-state HMM in which each state emits from a mixture of 2 Gaussians. An RBF kernel with $\sigma =0.5$ is used on this dataset.
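For readers who want to see what this kind of generator looks like, here is a minimal sampler for a 3-state HMM whose states each emit from a mixture of 2 Gaussians. The transition matrix, mixture weights, means, standard deviations, and the choice of 1-D observations are all made-up placeholders, since the summary does not give the actual parameters.

```python
import numpy as np

def sample_hmm_gmm(T, trans, weights, means, stds, rng):
    """Sample T steps from an HMM with Gaussian-mixture emissions."""
    n_states, n_comp = means.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T)
    s = rng.integers(n_states)  # uniform initial state (assumption)
    for t in range(T):
        if t > 0:
            s = rng.choice(n_states, p=trans[s])    # state transition
        comp = rng.choice(n_comp, p=weights[s])     # pick a mixture component
        states[t] = s
        obs[t] = rng.normal(means[s, comp], stds[s, comp])
    return states, obs

# Placeholder parameters, not the values used in the paper.
trans = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])
weights = np.array([[0.5, 0.5], [0.3, 0.7], [0.6, 0.4]])
means = np.array([[-3.0, -1.0], [0.0, 1.0], [3.0, 5.0]])
stds = np.full((3, 2), 0.5)

rng = np.random.default_rng(0)
states, obs = sample_hmm_gmm(50, trans, weights, means, stds, rng)
```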

RS126: a dataset used for protein secondary structure prediction.
