==Citation==

Chun-Nam John Yu and Thorsten Joachims. Learning structural SVMs with latent variables. In ''Proceedings of the 26th International Conference on Machine Learning'', Montréal, Québec, Canada, 2009.

== Online version ==

[http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf]

== Summary ==

In this [[Category::paper]] the authors introduce the use of latent variables in the structural SVM. The paper identifies a formulation for which there exists an efficient algorithm for finding a local optimum using convex-concave optimization techniques. The paper argues that this is the first use of latent variables in large-margin classifiers. Experiments were performed in several domains of computational biology, IR, and NLP to demonstrate the generality of the proposed method.

== Method Used ==

This paper extends the structural SVM formulation of Tsochantaridis et al. to include a latent variable.

Consider a set of structured input-output pairs <math>S = \{(x_1,y_1),\ldots,(x_n,y_n)\} \in (X \times Y)^n</math>.

The prediction rule is

<math>f_w(x) = \arg\max_{y \in Y} \left[ w \cdot G(x,y) \right]</math>

where <math>G</math> is the joint feature vector that describes the relation between input and output. This paper introduces an extra latent variable <math>h</math>, so the prediction rule becomes

<math>f_w(x) = \arg\max_{(y,h) \in Y \times H} \left[ w \cdot G(x,y,h) \right]</math>
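
To make the joint maximization concrete, here is a minimal Python sketch, assuming small finite spaces <math>Y</math> and <math>H</math> so the argmax can be taken by brute-force enumeration; the feature map <code>G</code> and the spaces <code>Y_SPACE</code>/<code>H_SPACE</code> are illustrative assumptions, not definitions from the paper. In real tasks this maximization is a combinatorial inference problem solved with task-specific algorithms.

<pre>
import numpy as np

# Hypothetical toy spaces; real tasks have structured, exponentially
# large output spaces and use task-specific inference instead.
Y_SPACE = [0, 1, 2]   # output space Y
H_SPACE = [0, 1]      # latent space H

def G(x, y, h):
    """Illustrative joint feature vector G(x, y, h)."""
    return np.array([x * (y == k) * (h + 1) for k in range(3)])

def predict(w, x):
    """Prediction rule f_w(x) = argmax over (y, h) of w . G(x, y, h)."""
    pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
    y_best, h_best = max(pairs, key=lambda yh: w @ G(x, yh[0], yh[1]))
    return y_best, h_best
</pre>

At test time only the <math>y</math> component of the maximizer is reported; the latent <math>h</math> is maximized over and then discarded.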

Similarly, extending the loss function <math>\Delta</math> to include the latent variable gives

<math>\Delta\left( (y_i, h_i^*(w)),\ (\hat{y}_i(w), \hat{h}_i(w)) \right)</math>

where

<math>h_i^*(w) = \arg\max_{h \in H} w \cdot G(x_i, y_i, h)</math>

<math>(\hat{y}_i(w), \hat{h}_i(w)) = \arg\max_{(y,h) \in Y \times H} w \cdot G(x_i, y, h)</math>

The loss compares the <math>(y,h)</math> pair returned by the prediction rule against the latent variable <math>h_i^*(w)</math> that best explains the labeled example <math>(x_i, y_i)</math>.
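
Continuing the sketch above, both auxiliary maximizations can be computed by the same enumeration; <code>impute_h</code> and <code>joint_argmax</code> are hypothetical helper names used only for illustration.

<pre>
def impute_h(w, x_i, y_i):
    """h_i*(w): the latent value that best explains the labeled pair."""
    return max(H_SPACE, key=lambda h: w @ G(x_i, y_i, h))

def joint_argmax(w, x_i):
    """(y_hat_i(w), h_hat_i(w)): the unconstrained joint maximizer."""
    pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
    return max(pairs, key=lambda yh: w @ G(x_i, yh[0], yh[1]))
</pre>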

As in the structural SVM, an upper bound on this loss can be derived by maximizing over <math>y</math> and <math>h</math>. The paper further assumes that the loss function does not depend on the latent variable for the tasks under consideration. The final objective function is

<math> \min_w \left[ \frac{1}{2}\|w\|^2 + C \sum\limits_{i=1}^{n} \max_{(\hat{y},\hat{h}) \in Y \times H} \left[ w \cdot G(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}, \hat{h}) \right] \right] - \left[ C \sum\limits_{i=1}^{n} \max_{h \in H} w \cdot G(x_i, y_i, h) \right] </math>
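
The two bracketed terms can be evaluated directly from these definitions. Below is a minimal sketch continuing the toy example, with a hypothetical 0/1 loss <code>delta</code> that, per the paper's assumption, ignores the latent variable.

<pre>
def delta(y_true, y_pred):
    """Hypothetical task loss; by assumption it does not depend on h."""
    return 0.0 if y_true == y_pred else 1.0

def objective(w, data, C):
    """Difference-of-convex objective: convex f(w) minus convex g(w)."""
    f = 0.5 * (w @ w)
    g = 0.0
    for x_i, y_i in data:
        # convex term: loss-augmented inference, max over (y, h)
        f += C * max(w @ G(x_i, y, h) + delta(y_i, y)
                     for y in Y_SPACE for h in H_SPACE)
        # concave term (subtracted): latent completion, max over h
        g += C * max(w @ G(x_i, y_i, h) for h in H_SPACE)
    return f - g
</pre>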

This objective function is the difference of two convex functions, so it can be solved using the Concave-Convex Procedure (CCCP). Writing the objective as <math>f(w) - g(w)</math> with <math>f</math> and <math>g</math> both convex, the procedure is as follows (a code sketch appears after the list):

# Set <math>t = 0</math> and initialize <math>w_0</math>
# '''repeat'''
## Find a hyperplane <math>v_t</math> such that <math>-g(w) \le -g(w_t) + (w - w_t) \cdot v_t</math> for all <math>w</math>
## Solve <math>w_{t+1} = \arg\min_w \left[ f(w) + w \cdot v_t \right]</math>
## Set <math>t = t + 1</math>
# '''until''' <math>\left[ f(w_t) - g(w_t) \right] - \left[ f(w_{t-1}) - g(w_{t-1}) \right] < \epsilon</math>
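
For the latent structural SVM the linearization step has a simple interpretation: the concave part <math>-g(w)</math> is upper-bounded by fixing the imputed latent variables <math>h_i^*(w_t)</math> at the current iterate, after which the inner minimization is a standard structural SVM problem. The sketch below continues the toy example and solves that inner problem by plain subgradient descent, a deliberate simplification; the paper itself uses a cutting-plane structural SVM solver for this step.

<pre>
def cccp(data, C, dim, lr=0.01, inner_steps=200, eps=1e-4):
    """Sketch of CCCP for the latent structural SVM objective."""
    w = np.zeros(dim)
    prev = objective(w, data, C)
    while True:
        # Linearize -g at w_t: impute h_i* for every training example.
        completed = [(x_i, y_i, impute_h(w, x_i, y_i))
                     for x_i, y_i in data]
        # Solve the convex upper bound by subgradient descent
        # (the paper uses a cutting-plane solver here instead).
        for _ in range(inner_steps):
            grad = w.copy()  # gradient of (1/2)||w||^2
            for x_i, y_i, h_star in completed:
                pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
                y_hat, h_hat = max(pairs,
                                   key=lambda yh: w @ G(x_i, yh[0], yh[1])
                                                  + delta(y_i, yh[0]))
                grad += C * (G(x_i, y_hat, h_hat) - G(x_i, y_i, h_star))
            w = w - lr * grad
        cur = objective(w, data, C)
        if prev - cur < eps:   # stop when the decrease is below epsilon
            return w
        prev = cur
</pre>

A toy usage with the hypothetical setup above: <code>w = cccp([(1.0, 0), (2.0, 1), (3.0, 2)], C=1.0, dim=3)</code>.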