Chun-Nam John Yu, Hofmann, Learning structural SVMs with latent variables 2009

 
== Online version ==

[1]

== Summary ==

In this paper, the authors describe the use of latent variables in structural SVMs. The paper identifies a formulation for which there exists an efficient algorithm to find a local optimum using convex-concave optimization techniques. The paper argues that this is the first time latent variables have been used in large-margin classifiers. Experiments were then performed in several domains of computational biology, IR, and NLP to demonstrate the generality of the proposed method.
 
== Method Used ==
 
 
This paper extends the structural SVM formulation of Tsochantaridis to include a latent variable.

Consider a set of structured input-output pairs <math>S = \{(x_1,y_1), \ldots, (x_n,y_n)\} \in (\mathcal{X} \times \mathcal{Y})^n</math>.
 
 
The prediction rule is

<math>f_w(x) = \arg\max_{y \in \mathcal{Y}} [w \cdot G(x,y)]</math>
 
 
where <math>G</math> is the joint feature vector that describes the relation between input and output. This paper introduces an extra latent variable <math>h</math>, so the prediction rule becomes

<math>f_w(x) = \arg\max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} [w \cdot G(x,y,h)]</math>
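As an illustration only (not the paper's implementation), this prediction rule can be computed by brute-force enumeration when <math>\mathcal{Y}</math> and <math>\mathcal{H}</math> are small finite sets; the feature map G and the toy spaces below are assumptions made for the sketch.

<pre>
import numpy as np

# Brute-force latent prediction rule: argmax over (y, h) of w . G(x, y, h).
# The feature map G_toy and the toy label/latent spaces are assumptions.
def predict(w, x, Y, H, G):
    best_score, best_pair = -np.inf, None
    for y in Y:
        for h in H:
            score = np.dot(w, G(x, y, h))
            if score > best_score:
                best_score, best_pair = score, (y, h)
    return best_pair

# Toy joint feature map: place the 2-d input x in a block indexed by (y, h).
def G_toy(x, y, h):
    phi = np.zeros(8)
    start = 4 * y + 2 * h
    phi[start:start + 2] = x
    return phi

w = np.random.randn(8)
print(predict(w, np.array([1.0, -0.5]), [0, 1], [0, 1], G_toy))
</pre>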
 
 
 
 
Similarly, extending the loss function <math>\triangle</math> to include the latent variable gives

<math>\triangle\big((y_i, h_i^*(w)),\, (y_i^{opt}(w), h_i^{opt}(w))\big)</math>
 
 
where

<math>h_i^*(w) = \arg\max_{h \in \mathcal{H}} w \cdot G(x_i, y_i, h)</math>

<math>(y_i^{opt}(w), h_i^{opt}(w)) = \arg\max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} w \cdot G(x_i, y, h)</math>
 
 
The loss is therefore measured between the pair returned by the prediction rule and the pair <math>(y_i, h_i^*(w))</math>, where <math>h_i^*(w)</math> is the latent value that best explains the labeled example <math>(x_i, y_i)</math> under the current model.
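Under the same enumerable-space assumptions as the earlier sketch, the two argmax quantities and the resulting loss can be written directly; G and the task loss delta are again placeholders supplied by the task.

<pre>
import numpy as np

# Sketch of the two argmax quantities defined above, assuming small
# enumerable spaces Y and H; G and delta are task-supplied placeholders.
def latent_completion(w, x_i, y_i, H, G):
    """h_i^*(w): the latent value that best explains the labeled pair (x_i, y_i)."""
    return max(H, key=lambda h: np.dot(w, G(x_i, y_i, h)))

def free_prediction(w, x_i, Y, H, G):
    """(y_i^opt(w), h_i^opt(w)): the unconstrained joint prediction."""
    return max(((y, h) for y in Y for h in H),
               key=lambda yh: np.dot(w, G(x_i, yh[0], yh[1])))

def latent_loss(w, x_i, y_i, Y, H, G, delta):
    """Delta between the latent-completed truth and the model's prediction."""
    h_star = latent_completion(w, x_i, y_i, H, G)
    y_opt, h_opt = free_prediction(w, x_i, Y, H, G)
    return delta((y_i, h_star), (y_opt, h_opt))
</pre>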
 
 
As in the case of the structural SVM, an upper bound on this loss can be derived by maximizing over <math>y</math> and <math>h</math>. The paper further assumes that the loss function does not depend on the latent variable for the tasks under consideration. The final objective function is

<math> \min_w \left[ \frac{1}{2}\|w\|^2 + C\sum\limits_{i=1}^{n} \max_{(y^o,h^o) \in \mathcal{Y} \times \mathcal{H}} [w \cdot G(x_i,y^o,h^o) + \triangle(y_i,y^o,h^o)] \right] - \left[ C\sum\limits_{i=1}^{n} \max_{h \in \mathcal{H}} w \cdot G(x_i,y_i,h) \right] </math>
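A minimal sketch of evaluating this objective on a small enumerable problem follows; C, the training data, the feature map G and the task loss are treated as assumed inputs, not the paper's actual code.

<pre>
import numpy as np

# Sketch of the two bracketed terms of the objective for a small,
# enumerable problem. C, data, G and Delta(y_i, y, h) are assumed inputs.
def objective(w, data, Y, H, G, Delta, C):
    convex_part = 0.5 * np.dot(w, w)
    concave_part = 0.0
    for x_i, y_i in data:
        # loss-augmented max over (y, h): upper-bounds the training loss
        convex_part += C * max(np.dot(w, G(x_i, y, h)) + Delta(y_i, y, h)
                               for y in Y for h in H)
        # latent completion term, entering the objective with a negative sign
        concave_part += C * max(np.dot(w, G(x_i, y_i, h)) for h in H)
    return convex_part - concave_part
</pre>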
 
 
This objective is the difference of two convex functions, so it can be solved using the Concave-Convex Procedure (CCCP) given below:
 
# Set <math>t = 0</math> and initialize <math>w_0</math>
# Repeat:
## Find a hyperplane <math>v_t</math> such that <math>-g(w) \le -g(w_t) + (w - w_t) \cdot v_t</math> for all <math>w</math>
## Solve <math>w_{t+1} = \arg\min_{w} [f(w) + w \cdot v_t]</math>
## Set <math>t = t + 1</math>
# Until <math>[f(w_t) - g(w_t)] - [f(w_{t-1}) - g(w_{t-1})] < \epsilon</math>
 
 
The paper claims that the above algorithm is guaranteed to converge.
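The following is a rough sketch of that loop under the same toy assumptions as the earlier snippets; the convex subproblem in step 2 is approximated here by plain subgradient descent, which stands in for the structural SVM solver actually used in the paper.

<pre>
import numpy as np

# Sketch of the CCCP outer loop, assuming small enumerable Y and H, a
# feature map G and task loss Delta supplied by the caller. The concave
# part -g(w) is linearized through the latent completions h_i^*(w_t).
def f_minus_g(w, data, Y, H, G, Delta, C):
    """Value of the full objective f(w) - g(w)."""
    f = 0.5 * np.dot(w, w) + C * sum(
        max(np.dot(w, G(x, y, h)) + Delta(yi, y, h) for y in Y for h in H)
        for x, yi in data)
    g = C * sum(max(np.dot(w, G(x, yi, h)) for h in H) for x, yi in data)
    return f - g

def cccp(data, Y, H, G, Delta, C, dim, eps=1e-4, inner_steps=200, lr=0.01):
    w, prev = np.zeros(dim), np.inf
    while True:
        # hyperplane v_t from the latent completions at the current w_t
        v = -C * sum(G(x, yi, max(H, key=lambda h: np.dot(w, G(x, yi, h))))
                     for x, yi in data)
        # approximately solve w_{t+1} = argmin_w f(w) + w . v_t
        for _ in range(inner_steps):
            grad = w + v
            for x, yi in data:
                yb, hb = max(((y, h) for y in Y for h in H),
                             key=lambda p: np.dot(w, G(x, p[0], p[1]))
                                           + Delta(yi, p[0], p[1]))
                grad = grad + C * G(x, yb, hb)
            w = w - lr * grad
        cur = f_minus_g(w, data, Y, H, G, Delta, C)
        if prev - cur < eps:   # stop once the objective decrease is below eps
            return w
        prev = cur
</pre>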
 
  
 
== Experimentation ==


The experiments were performed in several domains, with the following results:

* Discriminative motif finding - error rate: Gibbs sampler 32.49%, Latent Structural SVM 12%
* Noun phrase coreference via clustering - MITRE loss: SVM cluster 41.3, Latent SVM 35.6

The structural SVM with latent variables also performed well on the document retrieval task, outperforming ListNet and Ranking SVM.