Difference between revisions of "Chun-Nam John Yu, Hofmann , Learning structural SVMs with latent variables 2009"

Revision as of 00:58, 1 October 2011

Citation

Chun-Nam John Yu and Thorsten Joachims. Learning structural SVMs with latent variables. In Proceedings of the 26th International Conference on Machine Learning, Montréal, Québec, Canada, 2009.

Online version

[1]

Summary

In this paper the authors study the use of latent variables in structural SVMs. They identify a formulation for which an efficient algorithm, based on the Concave-Convex Procedure (CCCP), finds a local optimum. The paper argues that this is the first time latent variables are used in large-margin classifiers. Experiments in computational biology, information retrieval (IR) and natural language processing (NLP) are then used to demonstrate the generality of the proposed method.

Method Used

This paper extends the structural SVM formulation of Tsochantaridis et al. to include a latent variable.

Consider a set of structured input-output pairs $S=\{(x_1,y_1),\ldots,(x_n,y_n)\}\in(X\times Y)^{n}$.

The prediction rule is: $f_{w}(x)=\operatorname{argmax}_{y\in Y}[w\cdot G(x,y)]$
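For a small, finite output set the prediction rule can be evaluated by brute-force search over $Y$. The sketch below assumes a hypothetical multiclass joint feature map (`multiclass_feature`, which copies $x$ into the block of $w$ reserved for label $y$); in genuinely structured problems $Y$ is exponentially large, and the argmax is computed with a problem-specific inference algorithm instead of enumeration.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

Label = TypeVar("Label")


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors, e.g. w . G(x, y)."""
    return sum(p * q for p, q in zip(a, b))


def predict(w: Sequence[float], x: Sequence[float], labels: Iterable[Label],
            G: Callable[[Sequence[float], Label], List[float]]) -> Label:
    """f_w(x) = argmax over y in Y of w . G(x, y), by exhaustive search."""
    return max(labels, key=lambda y: dot(w, G(x, y)))


def multiclass_feature(num_classes: int) -> Callable[[Sequence[float], int], List[float]]:
    """Hypothetical joint feature map: copy x into the block reserved for label y."""
    def G(x: Sequence[float], y: int) -> List[float]:
        d = len(x)
        g = [0.0] * (d * num_classes)
        g[y * d:(y + 1) * d] = list(x)
        return g
    return G


if __name__ == "__main__":
    w = [1.0, 0.0, 0.0, 1.0, -1.0, -1.0]  # one 2-d weight block per class
    print(predict(w, [0.2, 0.9], range(3), multiclass_feature(3)))  # prints 1
```

With this block-structured map, $w\cdot G(x,y)$ is simply the score of the $y$-th weight block against $x$, so the rule reduces to an ordinary multiclass linear classifier.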

where $G$ is the joint feature vector that describes the relation between input and output. The paper introduces an extra latent variable $h$, so the prediction rule becomes

$f_{w}(x)=\operatorname{argmax}_{(y,h)\in Y\times H}[w\cdot G(x,y,h)]$, where the predicted output is the $y$ component of the maximizing pair.
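A brute-force sketch of the latent prediction rule, for small finite $Y$ and $H$, enumerates all pairs in $Y\times H$ and keeps the best one. The feature map `latent_feature` is a hypothetical choice for illustration: each pair $(y,h)$ gets its own weight block, so $h$ acts like a choice among several templates for the same label.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def latent_predict(w: Sequence[float], x: Sequence[float], labels: Iterable[int],
                   latents: Iterable[int],
                   G: Callable[[Sequence[float], int, int], List[float]]) -> Tuple[int, int]:
    """Return the pair (y, h) in Y x H maximizing w . G(x, y, h); y is the prediction."""
    return max(product(labels, latents), key=lambda yh: dot(w, G(x, yh[0], yh[1])))


def latent_feature(num_classes: int, num_latent: int):
    """Hypothetical G(x, y, h): copy x into the block for the pair (y, h)."""
    def G(x: Sequence[float], y: int, h: int) -> List[float]:
        d = len(x)
        g = [0.0] * (d * num_classes * num_latent)
        start = (y * num_latent + h) * d
        g[start:start + d] = list(x)
        return g
    return G


if __name__ == "__main__":
    # Label 0 has templates [1, 0] (h=0) and [0, 1] (h=1); label 1 has their negatives.
    w = [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]
    y, h = latent_predict(w, [0.1, 0.8], range(2), range(2), latent_feature(2, 2))
    print(y, h)  # prints "0 1": label 0, explained by its second template
```

Only $y$ is reported as the prediction; $h$ is a by-product that indicates which template explained the input.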

Similarly, the loss function $\Delta$ is extended to include the latent variable: $\Delta\big((y_{i},h_{i}^{*}(w)),(y_{i}^{\mathrm{opt}}(w),h_{i}^{\mathrm{opt}}(w))\big)$

where

$h_{i}^{*}(w)=\operatorname{argmax}_{h\in H}\, w\cdot G(x_{i},y_{i},h)$

$(y_{i}^{\mathrm{opt}}(w),h_{i}^{\mathrm{opt}}(w))=\operatorname{argmax}_{(y,h)\in Y\times H}\, w\cdot G(x_{i},y,h)$

The loss therefore measures the discrepancy between the pair returned by the prediction rule and the pair $(y_{i},h_{i}^{*}(w))$, where $h_{i}^{*}(w)$ is the latent value that best explains the training pair $(x_{i},y_{i})$.
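The two argmax computations and the loss can be sketched for a single training example. To keep the sketch independent of any particular feature map, the hypothetical `score(y, h)` stands for $w\cdot G(x_i,y,h)$ with $w$ and $x_i$ held fixed, and the 0/1 loss on labels is an illustrative assumption, not necessarily the loss used in the paper's experiments.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

Score = Callable[[int, int], float]  # (y, h) -> w . G(x_i, y, h) for fixed w and x_i


def latent_completion(score: Score, y_true: int, latents: Iterable[int]) -> int:
    """h_i^*(w): the latent value that best explains the observed label y_i."""
    return max(latents, key=lambda h: score(y_true, h))


def joint_prediction(score: Score, labels: Iterable[int],
                     latents: Iterable[int]) -> Tuple[int, int]:
    """(y_i^opt(w), h_i^opt(w)): the highest-scoring pair over Y x H."""
    return max(product(labels, latents), key=lambda yh: score(*yh))


def zero_one_loss(truth: Tuple[int, int], pred: Tuple[int, int]) -> float:
    """Hypothetical Delta((y_i, h_i^*), (y, h)): 0/1 on the labels, ignoring h."""
    return 0.0 if truth[0] == pred[0] else 1.0


if __name__ == "__main__":
    table = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 1.2, (1, 1): -0.8}

    def score(y: int, h: int) -> float:
        return table[(y, h)]

    h_star = latent_completion(score, 0, range(2))
    pred = joint_prediction(score, range(2), range(2))
    print(h_star, pred, zero_one_loss((0, h_star), pred))  # prints "1 (1, 0) 1.0"
```

In this toy case the observed label 0 is best explained by $h=1$, but the highest-scoring pair overall is $(1,0)$, so the prediction is wrong and the loss is 1.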

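As in the standard structural SVM, this loss can be upper-bounded by maximizing over $y$ and $h$. The paper further assumes that, for the tasks considered, the loss does not depend on the latent variable $h_{i}^{*}(w)$, so it can be written as $\Delta(y_{i},\hat{y},\hat{h})$. Since the predicted pair scores at least as high as $(y_{i},h_{i}^{*}(w))$, the loss is bounded by $\max_{(\hat{y},\hat{h})\in Y\times H}[w\cdot G(x_{i},\hat{y},\hat{h})+\Delta(y_{i},\hat{y},\hat{h})]-\max_{h\in H} w\cdot G(x_{i},y_{i},h)$. Summing this bound over the training set and adding the regularizer gives the final objective function: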

$\min_{w}\ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{n}\max_{(\hat{y},\hat{h})\in Y\times H}\big[w\cdot G(x_{i},\hat{y},\hat{h})+\Delta(y_{i},\hat{y},\hat{h})\big]-C\sum_{i=1}^{n}\max_{h\in H} w\cdot G(x_{i},y_{i},h)$
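The regularizer and the first sum of this objective are convex in $w$, while the last sum is subtracted and is itself convex (a maximum of linear functions), so the objective is a difference of convex functions. CCCP exploits this by alternating two steps: fill in $h_i^*$ with the current $w$ (which replaces the concave part by a linear upper bound), then minimize the resulting convex structural SVM problem. The sketch below is a minimal illustration under stated assumptions: $Y$ and $H$ are small enough to enumerate, all names (`Problem`, `completions`, `surrogate`, `solve_convex`, `cccp`) are hypothetical, and the convex step uses plain subgradient descent rather than a dedicated structural SVM solver.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, int]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))

@dataclass
class Problem:
    labels: Sequence[int]                    # output set Y (small and finite here)
    latents: Sequence[int]                   # latent set H (small and finite here)
    G: Callable[[Vector, int, int], Vector]  # joint feature map G(x, y, h)
    loss: Callable[[int, int, int], float]   # Delta(y_i, y, h); does not use h_i^*
    C: float = 1.0

def completions(w: Vector, data: List[Example], p: Problem) -> List[int]:
    """h_i^*(w) = argmax_h w . G(x_i, y_i, h) for every training example."""
    return [max(p.latents, key=lambda h: dot(w, p.G(x, y, h))) for x, y in data]

def loss_augmented(w: Vector, x: Vector, y: int, p: Problem) -> Tuple[int, int]:
    """argmax over (y', h') of w . G(x, y', h') + Delta(y, y', h')."""
    return max(product(p.labels, p.latents),
               key=lambda yh: dot(w, p.G(x, *yh)) + p.loss(y, *yh))

def surrogate(w: Vector, data: List[Example], h_star: List[int], p: Problem) -> float:
    """Convex upper bound on the objective, with the concave part linearised at h_star."""
    total = 0.5 * dot(w, w)
    for (x, y), hs in zip(data, h_star):
        yy, hh = loss_augmented(w, x, y, p)
        total += p.C * (dot(w, p.G(x, yy, hh)) + p.loss(y, yy, hh) - dot(w, p.G(x, y, hs)))
    return total

def objective(w: Vector, data: List[Example], p: Problem) -> float:
    """The non-convex objective: the surrogate with h_star set to the current completions."""
    return surrogate(w, data, completions(w, data, p), p)

def solve_convex(w0: Vector, data: List[Example], h_star: List[int], p: Problem,
                 steps: int = 100, lr: float = 0.1) -> Vector:
    """Subgradient descent on the surrogate; returns the best iterate seen,
    because a single subgradient step is not guaranteed to decrease the value."""
    w, best_w = list(w0), list(w0)
    best_val = surrogate(w0, data, h_star, p)
    for t in range(steps):
        grad = list(w)  # gradient of 0.5 * ||w||^2
        for (x, y), hs in zip(data, h_star):
            yy, hh = loss_augmented(w, x, y, p)
            grad = [g + p.C * (a - b) for g, a, b in zip(grad, p.G(x, yy, hh), p.G(x, y, hs))]
        step = lr / (t + 1) ** 0.5
        w = [wj - step * gj for wj, gj in zip(w, grad)]
        val = surrogate(w, data, h_star, p)
        if val < best_val:
            best_w, best_val = w, val
    return best_w

def cccp(data: List[Example], p: Problem, dim: int, outer: int = 20, tol: float = 1e-6) -> Vector:
    """Alternate: impute h_i^* with the current w, then minimise the convex surrogate."""
    w = [0.0] * dim
    prev = objective(w, data, p)
    for _ in range(outer):
        w = solve_convex(w, data, completions(w, data, p), p)
        cur = objective(w, data, p)
        if prev - cur < tol:  # best-iterate rule keeps the objective non-increasing
            break
        prev = cur
    return w

if __name__ == "__main__":
    K, M, d = 2, 2, 2  # toy sizes: 2 labels, 2 latent "modes", 2-d inputs

    def G(x: Vector, y: int, h: int) -> Vector:
        g = [0.0] * (K * M * d)
        g[(y * M + h) * d:(y * M + h + 1) * d] = list(x)
        return g

    data = [([1.0, 0.0], 0), ([0.0, 1.0], 0), ([-1.0, 0.0], 1), ([0.0, -1.0], 1)]
    p = Problem(range(K), range(M), G, lambda yt, y, h: 0.0 if y == yt else 1.0)
    w = cccp(data, p, dim=K * M * d)
    print(objective([0.0] * (K * M * d), data, p), "->", objective(w, data, p))
```

Keeping the best subgradient iterate means each outer iteration cannot increase the objective, mirroring the monotone decrease of CCCP; as with CCCP, the result is only a local minimum or saddle point and depends on the initial $w$.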
