Kernel Conditional Random Fields: Representation and Clique Selection, Lafferty, Zhu, and Liu, 2004

== Citation ==

John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel conditional random fields: representation and clique selection. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, 2004.

== Online version ==

[http://dl.acm.org/citation.cfm?id=1015337 ACM Digital Library]

== Summary ==

This [[Category::Paper]] introduces kernel conditional random fields (KCRFs) for the [[AddressesProblem::Annotation of data]] having multiple components. The paper is divided into two parts. The first describes how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. The second describes a procedure for greedily selecting cliques in the dual representation. The framework and clique selection methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

== Representer theorem ==

The representer theorem states that if the loss function <math>\phi</math> is the negative log-loss, <math>K</math> is a Mercer kernel over a set of graphs <math>G</math> with associated RKHS norm <math>\| \cdot \|_K</math>, and <math>\omega: \mathbb{R}_+ \rightarrow \mathbb{R}_+</math> is a strictly increasing function, then the minimizer <math>f^*</math> of the regularized risk

<math>R_{\phi}(f) = \sum_{i=1}^n \phi(x^{(i)}, y^{(i)}; f) + \omega(\|f\|_K),</math>

if it exists, has the form

<math>f^*(\cdot) = \sum_i \sum_{c \in \text{Cliques}(g_i)} \sum_{y_c} \alpha_c^{(i)}(y_c)\, K_c(x^{(i)}, y_c, \cdot)</math>

Note that the dual parameters <math>\alpha_c^{(i)}(y_c)</math> depend on all assignments of labels <math>y_c</math> to each clique, not only those observed in the training data, so the representation can grow quickly; this motivates the clique selection procedure described below.
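
To make the dual representation concrete, here is a minimal Python sketch of evaluating <math>f^*</math> at a new point, assuming the dual parameters are stored as an active set of (input, clique labeling, coefficient) triples. The names <code>active_set</code> and <code>clique_kernel</code> are illustrative, not from the paper.

<pre>
# Minimal sketch: evaluating the dual form of f* at a new clique.
# Each entry of `active_set` is a triple (x_i, y_i, alpha);
# `clique_kernel` is a user-supplied Mercer kernel on
# (input, clique-labeling) pairs.

def evaluate_f(active_set, clique_kernel, x_new, y_new):
    """f*(x_new, y_new) = sum of alpha * K_c((x_i, y_i), (x_new, y_new))."""
    return sum(alpha * clique_kernel(x_i, y_i, x_new, y_new)
               for (x_i, y_i, alpha) in active_set)

# Toy usage with a kernel that only compares clique labelings.
value = evaluate_f(
    active_set=[((0.0, 1.0), "HH", 0.5), ((1.0, 0.0), "HE", -0.2)],
    clique_kernel=lambda xi, yi, x, y: float(yi == y),
    x_new=(0.5, 0.5), y_new="HH")
</pre>
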
== Clique Selection ==

Cliques are selected incrementally so as to reduce the regularized risk. The algorithm maintains an active set <math>A</math> of labeled cliques and, among all candidate cliques, selects greedily based on the functional gradient.

Set <math>f = 0</math> and iterate the following steps:

# For each candidate <math>h</math>, supported by a single labeled clique, calculate the functional derivative <math>dR_{\phi}(f, h)</math>.
# Select <math>h = \arg\max_h |dR_{\phi}(f, h)|</math> and update <math>f \leftarrow f + \alpha_h h</math>.
# Estimate the parameters <math>\alpha_f</math>.
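
Below is a minimal Python sketch of this loop. The helpers <code>functional_derivative(f, h)</code> and <code>estimate_coefficient(f, h)</code> are hypothetical stand-ins for the derivative <math>dR_{\phi}(f, h)</math> and the parameter estimation step, not functions defined in the paper.

<pre>
# Sketch of greedy clique selection by functional gradient descent.
# `candidates` is a list of basis functions h, each supported by a
# single labeled clique. `functional_derivative` and
# `estimate_coefficient` are hypothetical callables supplied by the
# user; they stand in for the quantities defined in the paper.

def select_cliques(candidates, functional_derivative, estimate_coefficient,
                   n_steps=25):
    f = []  # the current function f, stored as (h, alpha) pairs
    for _ in range(n_steps):
        # Step 1: score every candidate by |dR_phi(f, h)|.
        scores = [abs(functional_derivative(f, h)) for h in candidates]
        # Step 2: greedily pick the candidate with the steepest gradient.
        best = candidates[scores.index(max(scores))]
        # Step 3: estimate its coefficient and add it to the active set.
        f.append((best, estimate_coefficient(f, best)))
    return f
</pre>
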
== Experiments ==

#The [[UsesDataset::Galaxy]] dataset, a variant of the two-spirals problem, is generated by a 2-state HMM. A semi-supervised graph is then constructed with an unweighted 10-nearest-neighbour approach, and the graph kernel is <math>K = 10(L+10^{-6}I)^{-1}</math>. The standard kernel used for comparison is a radial basis function (RBF) kernel with bandwidth <math>\sigma = 0.35</math>. (A sketch of the graph-kernel construction appears after this list.)
#HMM with Gaussian mixtures: a dataset generated from a 3-state HMM in which each state emits from a mixture of 2 Gaussians. An RBF kernel with <math>\sigma = 0.5</math> is used for this dataset.
#The [[UsesDataset::RS126]] dataset is used for [[AddressesProblem::Protein secondary structure prediction]].
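
A minimal NumPy sketch of the graph-kernel construction for the galaxy data follows, assuming <math>L</math> in the formula above is the combinatorial Laplacian of the unweighted 10-nearest-neighbour graph; the point matrix <code>X</code> is a stand-in for the actual dataset.

<pre>
import numpy as np

def graph_kernel(X, k=10):
    """Build an unweighted symmetric k-NN graph over the rows of X and
    return the regularized inverse-Laplacian kernel K = 10 (L + 1e-6 I)^-1."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    np.fill_diagonal(d2, np.inf)           # exclude self-neighbours
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0  # connect i to its k nearest points
    W = np.maximum(W, W.T)                 # symmetrize the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W         # combinatorial graph Laplacian
    return 10.0 * np.linalg.inv(L + 1e-6 * np.eye(n))

K = graph_kernel(np.random.randn(200, 2))  # stand-in for the galaxy points
</pre>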
