==Citation==

Chun-Nam John Yu and Thorsten Joachims. Learning structural SVMs with latent variables. In Proceedings of the 26th International Conference on Machine Learning, Montréal, Québec, Canada, 2009.

== Online version ==

[http://www.cs.cornell.edu/~cnyu/papers/icml09_latentssvm.pdf]
| |
== Summary ==

In this [[Category::paper]] the authors introduce latent variables into the structural SVM. The paper also identifies a formulation for which an efficient algorithm exists to find a local optimum using convex-concave optimization techniques. The paper argues that this is the first time latent variables have been used in large-margin classifiers. Experiments were performed in several domains (computational biology, IR, and NLP) to demonstrate the generality of the proposed method.
== Method Used ==

This paper extends the structural SVM formulation of Tsochantaridis et al. to include a latent variable.

Consider a set of structured input-output pairs

<math> S = \{(x_1,y_1),\ldots,(x_n,y_n)\} \in (X \times Y)^n </math>.
The prediction rule is

<math> f_w(x) = \arg\max_{y \in Y} \left[ w \cdot G(x,y) \right] </math>
where G is the joint feature vector that describes the relation between input and output. This paper introduces an extra latent variable h, so the prediction rule becomes

<math> f_w(x) = \arg\max_{(y,h) \in Y \times H} \left[ w \cdot G(x,y,h) \right] </math>
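As a concrete illustration, here is a minimal Python sketch of this joint prediction rule. It assumes a small, enumerable Y and H and a toy joint feature map; all names here are illustrative and not from the paper, which instead requires an efficient problem-specific argmax rather than exhaustive enumeration.

<pre>
import numpy as np
from itertools import product

def predict(w, x, Y, H, G):
    # Latent prediction rule f_w(x): maximize w . G(x, y, h)
    # jointly over (y, h), then return only the label part y.
    y_best, h_best = max(product(Y, H), key=lambda yh: w @ G(x, *yh))
    return y_best

# Toy joint feature map: 2-D input, binary label, binary latent variable.
# One feature block per (y, h) combination holds a copy of the input.
def G(x, y, h):
    phi = np.zeros(8)
    block = 4 * y + 2 * h
    phi[block:block + 2] = x
    return phi

w = np.random.randn(8)
print(predict(w, np.array([0.5, -1.0]), Y=[0, 1], H=[0, 1], G=G))
</pre>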
Similarly, extending the loss function <math> \triangle </math> to include the latent variable gives

<math> \triangle\left( (y_i, h_i^*(w)),\, (y_i^{opt}(w), h_i^{opt}(w)) \right) </math>

where

<math> h_i^*(w) = \arg\max_{h \in H} w \cdot G(x_i, y_i, h) </math>

<math> (y_i^{opt}(w), h_i^{opt}(w)) = \arg\max_{(y,h) \in Y \times H} w \cdot G(x_i, y, h) </math>

The loss therefore compares the pair returned by the prediction rule with the latent variable <math> h_i^*(w) </math> that best explains the labeled example <math> (x_i, y_i) </math>.
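Under the same illustrative setup as above, these two argmax problems correspond to two inference subroutines. The sketch below again uses exhaustive enumeration in place of the efficient problem-specific procedures the paper assumes; the function names are hypothetical.

<pre>
import numpy as np
from itertools import product

def impute_h(w, x_i, y_i, H, G):
    # h_i^*(w): the latent value that best explains the labeled pair (x_i, y_i).
    return max(H, key=lambda h: w @ G(x_i, y_i, h))

def best_pair(w, x_i, Y, H, G):
    # (y_i^opt(w), h_i^opt(w)): the jointly highest-scoring output/latent pair,
    # i.e. what the prediction rule would return on x_i.
    return max(product(Y, H), key=lambda yh: w @ G(x_i, *yh))
</pre>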
As in the case of the structural SVM, we can derive an upper bound on this loss:

<math> \triangle\left( (y_i, h_i^*(w)),\, (y_i^{opt}(w), h_i^{opt}(w)) \right) \le \max_{(\hat{y},\hat{h}) \in Y \times H} \left[ w \cdot G(x_i,\hat{y},\hat{h}) + \triangle\left( (y_i, h_i^*(w)),\, (\hat{y},\hat{h}) \right) \right] - \max_{h \in H} w \cdot G(x_i, y_i, h) </math>

The bound holds because <math> (y_i^{opt}(w), h_i^{opt}(w)) </math> is one of the candidates inside the first max, so the first term is at least <math> w \cdot G(x_i, y_i^{opt}(w), h_i^{opt}(w)) </math> plus the loss, while that score is by definition at least the second term. Summing this bound over the training set and adding the usual <math> \tfrac{1}{2}\|w\|^2 </math> regularizer gives a training objective that is a difference of two convex functions of w, which can be minimized to a local optimum with the concave-convex procedure (CCCP): alternately impute <math> h_i^*(w) </math> with the current model, then solve the resulting standard structural SVM problem.
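Here is a compact Python sketch of this CCCP-style training loop, continuing the illustrative setup above. Enumeration stands in for efficient inference, the loss is assumed to depend only on the labels, and the inner subgradient loop (with <code>C</code>, <code>lr</code>, and the iteration counts) is a simplification of the exact cutting-plane solver used in the paper.

<pre>
import numpy as np
from itertools import product

def cccp_train(S, Y, H, G, Delta, dim, C=1.0, outer=10, inner=100, lr=0.01):
    # Latent structural SVM training via the concave-convex procedure (sketch).
    w = np.zeros(dim)
    for _ in range(outer):
        # Step 1 (linearize the concave part): impute h_i^* with the current w.
        h_star = [max(H, key=lambda h: w @ G(x, y, h)) for x, y in S]
        # Step 2 (convex part): minimize the upper-bound objective for fixed
        # h_i^* by subgradient descent (the paper solves it exactly instead).
        for _ in range(inner):
            grad = w.copy()  # gradient of the (1/2)||w||^2 regularizer
            for (x, y), hs in zip(S, h_star):
                # Loss-augmented inference: the first max in the upper bound.
                yh_hat = max(product(Y, H),
                             key=lambda c: w @ G(x, *c) + Delta(y, c[0]))
                # Subgradient of this example's hinge term.
                grad += C * (G(x, *yh_hat) - G(x, y, hs))
            w -= lr * grad
    return w
</pre>

For example, with the toy <code>G</code> above (<code>dim=8</code>) and a zero-one label loss <code>Delta = lambda y, yp: float(y != yp)</code>, this converges to a local optimum of the non-convex objective; different initializations of w can yield different solutions, which is the expected behavior of CCCP.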