Kernel Conditional Random Fields: Representation and Clique Selection, Lafferty, Zhu, and Liu, 2004

From Cohen Courses
Jump to navigationJump to search

Citation

John Lafferty, Xiaojin Zhu, Yan Liu. Kernel conditional random fields: representation and clique selection in ICML '04 Proceedings of the twenty-first international conference on Machine learning

Online version

[1]

Summary

The Paper introduces Kernel conditional random fields for Annotation of data having multiple components. The paper is divided into two parts. The first describes how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. The second part describes a procedure for greedily selecting cliques in the dual representation. The framework and clique selection methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Representer theorem

The representer theorem proposes that if the loss function Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \phi} is the negative log loss function, is the Mercer kernel over a set of graphs Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle G} with associated RKHS norm Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle || \cdot ||_K} , and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \omega:R_+ \rightarrow R_+} is a strictly increasing function, the minimizer Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f^*} of

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle R_{\phi}f = \sum_{i=1}^n \phi_i + \omega(||f||_K)}

if it exists, has the form

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f^*(\cdot) = \sum_i \sum_{c \in Cliques(g_i)} \sum_{y_c} \alpha_c^{(i)}(y_c)K_c(x^{(i)}, y_c, \cdot)}

Note that the dual parameters Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha_c^{(i)}(y_c)} depend on all assignments of labels.

Clique Selection

Cliques are incrementally selected to reduce regularized risk. It maintains an active set Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle A} of labeled cliques. Of all the candidate cliques, it selects greedily based on functional gradient descent.

Steps -

Set Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f=0} and iterate

  1. For each candidate h, supported by a single labeled clique, calculate the functional derivative Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle dR_{\phi}(f, h)} .
  2. Select Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle h = arg max_h|dR_{\phi}(f, h)|} . Set Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle f \rightarrow f + \alpha_h h} .
  3. Estimate Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha_f} .

Experiments

  1. Galaxy dataset, a variant of two spirals is constructed by a 2 state [HMM]. A semi-supervised graph is then constructed by unweighted 10 nearest neighbour approach. The kernel is Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle K = 10(L+10^{-6}I)^{-1}} . The standard kernel is the radial basis function (RBF) kernel with bandwidth Failed to parse (Conversion error. Server ("https://wikimedia.org/api/rest_") reported: "Cannot get mml. Server problem."): {\displaystyle \sigma =0.35} .
  1. HMM with Gaussian mixtures: This is a generated dataset from a 3-state HMM with each state a mixutre of 2 Gaussians. In this dataset, RBF kernel is used with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sigma=0.5} .
  1. RS126 is for Protein secondary structure prediction.