==Citation==

Chun-Nam John Yu and Thorsten Joachims. Learning structural SVMs with latent variables. In ''Proceedings of the 26th International Conference on Machine Learning'', Montréal, Québec, Canada, 2009.

== Online version ==

[http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf]

== Summary ==

In this [[Category::paper]] the authors introduce the use of latent variables in the structural SVM. The paper identifies a formulation for which there exists an efficient algorithm for finding a local optimum using convex-concave optimization techniques. The paper argues that this is the first use of latent variables in large-margin classifiers. Experiments were performed in several domains of computational biology, IR, and NLP to demonstrate the generality of the proposed method.

== Method Used ==

This paper extends the structural SVM formulation of Tsochantaridis et al. to include a latent variable.

Consider a set of structured input-output pairs <math>S = \{(x_1,y_1),\ldots,(x_n,y_n)\} \in (X \times Y)^n</math>.

The prediction rule is

<math>f_w(x) = \arg\max_{y \in Y} \left[ w \cdot G(x,y) \right]</math>

where <math>G</math> is the joint feature vector that describes the relation between input and output. This paper introduces an extra latent variable <math>h</math>, so the prediction rule becomes

<math>f_w(x) = \arg\max_{(y,h) \in Y \times H} \left[ w \cdot G(x,y,h) \right]</math>
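
To make the joint maximization concrete, here is a minimal Python sketch, assuming small finite spaces <math>Y</math> and <math>H</math> so the argmax can be taken by brute-force enumeration; the feature map <code>G</code> and the spaces <code>Y_SPACE</code>/<code>H_SPACE</code> are illustrative assumptions, not definitions from the paper. In real tasks this maximization is a combinatorial inference problem solved with task-specific algorithms.

<pre>
import numpy as np

# Hypothetical toy spaces; real tasks have structured, exponentially
# large output spaces and use task-specific inference instead.
Y_SPACE = [0, 1, 2]   # output space Y
H_SPACE = [0, 1]      # latent space H

def G(x, y, h):
    """Illustrative joint feature vector G(x, y, h)."""
    return np.array([x * (y == k) * (h + 1) for k in range(3)])

def predict(w, x):
    """Prediction rule f_w(x) = argmax over (y, h) of w . G(x, y, h)."""
    pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
    y_best, h_best = max(pairs, key=lambda yh: w @ G(x, yh[0], yh[1]))
    return y_best, h_best
</pre>

At test time only the <math>y</math> component of the maximizer is reported; the latent <math>h</math> is maximized over and then discarded.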

Similarly, extending the loss function <math>\Delta</math> to include the latent variable gives

<math>\Delta\left( (y_i, h_i^*(w)),\ (\hat{y}_i(w), \hat{h}_i(w)) \right)</math>

where

<math>h_i^*(w) = \arg\max_{h \in H} w \cdot G(x_i, y_i, h)</math>

<math>(\hat{y}_i(w), \hat{h}_i(w)) = \arg\max_{(y,h) \in Y \times H} w \cdot G(x_i, y, h)</math>

The loss compares the <math>(y,h)</math> pair returned by the prediction rule against the latent variable <math>h_i^*(w)</math> that best explains the labeled example <math>(x_i, y_i)</math>.
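
Continuing the sketch above, both auxiliary maximizations can be computed by the same enumeration; <code>impute_h</code> and <code>joint_argmax</code> are hypothetical helper names used only for illustration.

<pre>
def impute_h(w, x_i, y_i):
    """h_i*(w): the latent value that best explains the labeled pair."""
    return max(H_SPACE, key=lambda h: w @ G(x_i, y_i, h))

def joint_argmax(w, x_i):
    """(y_hat_i(w), h_hat_i(w)): the unconstrained joint maximizer."""
    pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
    return max(pairs, key=lambda yh: w @ G(x_i, yh[0], yh[1]))
</pre>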

As in the structural SVM, an upper bound on this loss can be derived by maximizing over <math>y</math> and <math>h</math>. The paper further assumes that the loss function does not depend on the latent variable for the tasks under consideration. The final objective function is

<math> \min_w \left[ \frac{1}{2}\|w\|^2 + C \sum\limits_{i=1}^{n} \max_{(\hat{y},\hat{h}) \in Y \times H} \left[ w \cdot G(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}, \hat{h}) \right] \right] - \left[ C \sum\limits_{i=1}^{n} \max_{h \in H} w \cdot G(x_i, y_i, h) \right] </math>
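
The two bracketed terms can be evaluated directly from these definitions. Below is a minimal sketch continuing the toy example, with a hypothetical 0/1 loss <code>delta</code> that, per the paper's assumption, ignores the latent variable.

<pre>
def delta(y_true, y_pred):
    """Hypothetical task loss; by assumption it does not depend on h."""
    return 0.0 if y_true == y_pred else 1.0

def objective(w, data, C):
    """Difference-of-convex objective: convex f(w) minus convex g(w)."""
    f = 0.5 * (w @ w)
    g = 0.0
    for x_i, y_i in data:
        # convex term: loss-augmented inference, max over (y, h)
        f += C * max(w @ G(x_i, y, h) + delta(y_i, y)
                     for y in Y_SPACE for h in H_SPACE)
        # concave term (subtracted): latent completion, max over h
        g += C * max(w @ G(x_i, y_i, h) for h in H_SPACE)
    return f - g
</pre>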

This objective function is the difference of two convex functions, so it can be solved using the Concave-Convex Procedure (CCCP). Writing the objective as <math>f(w) - g(w)</math> with <math>f</math> and <math>g</math> both convex, the procedure is as follows (a code sketch appears after the list):

# Set <math>t = 0</math> and initialize <math>w_0</math>
# '''repeat'''
## Find a hyperplane <math>v_t</math> such that <math>-g(w) \le -g(w_t) + (w - w_t) \cdot v_t</math> for all <math>w</math>
## Solve <math>w_{t+1} = \arg\min_w \left[ f(w) + w \cdot v_t \right]</math>
## Set <math>t = t + 1</math>
# '''until''' <math>\left[ f(w_t) - g(w_t) \right] - \left[ f(w_{t-1}) - g(w_{t-1}) \right] < \epsilon</math>
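
For the latent structural SVM the linearization step has a simple interpretation: the concave part <math>-g(w)</math> is upper-bounded by fixing the imputed latent variables <math>h_i^*(w_t)</math> at the current iterate, after which the inner minimization is a standard structural SVM problem. The sketch below continues the toy example and solves that inner problem by plain subgradient descent, a deliberate simplification; the paper itself uses a cutting-plane structural SVM solver for this step.

<pre>
def cccp(data, C, dim, lr=0.01, inner_steps=200, eps=1e-4):
    """Sketch of CCCP for the latent structural SVM objective."""
    w = np.zeros(dim)
    prev = objective(w, data, C)
    while True:
        # Linearize -g at w_t: impute h_i* for every training example.
        completed = [(x_i, y_i, impute_h(w, x_i, y_i))
                     for x_i, y_i in data]
        # Solve the convex upper bound by subgradient descent
        # (the paper uses a cutting-plane solver here instead).
        for _ in range(inner_steps):
            grad = w.copy()  # gradient of (1/2)||w||^2
            for x_i, y_i, h_star in completed:
                pairs = ((y, h) for y in Y_SPACE for h in H_SPACE)
                y_hat, h_hat = max(pairs,
                                   key=lambda yh: w @ G(x_i, yh[0], yh[1])
                                                  + delta(y_i, yh[0]))
                grad += C * (G(x_i, y_hat, h_hat) - G(x_i, y_i, h_star))
            w = w - lr * grad
        cur = objective(w, data, C)
        if prev - cur < eps:   # stop when the decrease is below epsilon
            return w
        prev = cur
</pre>

A toy usage with the hypothetical setup above: <code>w = cccp([(1.0, 0), (2.0, 1), (3.0, 2)], C=1.0, dim=3)</code>.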