Difference between revisions of "Structured SVMs"

Latest revision as of 17:38, 2 November 2011

The Method and When to Use it

Structured (or Structural) Support Vector Machines (SSVMs), as the name suggests, generalize the Support Vector Machine (SVM) classifier so that a classifier can be trained to predict structured outputs.

In general, SSVMs perform supervised learning by approximating a mapping $f$

$f:X\rightarrow Y$

where $X$ is the input space and $Y$ is a space of complex structured objects, such as trees, sequences, or sets, rather than the simple univariate predictions of a standard SVM. The mapping is learned from a set of labeled training examples $\{(x_{1},y_{1}),...,(x_{l},y_{l})\}$.

Thus, training an SSVM consists of presenting pairs of samples and their correct output labels; the trained model can then predict the corresponding output label for new samples.

In NLP one can find a great variety of problems that rely on complex outputs, such as parsing and Markov Models for part-of-speech tagging.

Training

During training, given a set of samples and labels $(x_{n},y_{n})\in X\times Y,n=1,...,l$ , the SSVM minimizes the regularized risk function:

$\min _{w}||w||^{2}+C\sum _{n=1}^{l}\max _{y\in Y}(\Delta (y_{n},y)+w'\Psi (x_{n},y)-w'\Psi (x_{n},y_{n}))$

where $\Delta$ is a problem-specific loss function that measures the distance between two labels, and $\Psi$ is a function on samples and labels that extracts feature vectors. These two functions are to be defined according to the problem at hand.

Since the objective above is non-differentiable, it can be reformulated by introducing slack variables $\xi _{n}$ , one per sample, each representing the value of the maximum. With this approach the SSVM becomes:
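To make the risk function concrete, the following is a minimal sketch that evaluates it on a toy multiclass problem. The specific choices are assumptions made for illustration only: $\Psi$ copies the input into a per-label block (`joint_features`), $\Delta$ is the 0/1 loss (`zero_one_loss`), and the label space is small enough to enumerate. None of these names come from a particular library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def joint_features(x: Sequence[float], y: int, num_labels: int) -> Vector:
    """Hypothetical Psi(x, y): place x in the block that belongs to label y."""
    d = len(x)
    psi = [0.0] * (d * num_labels)
    psi[y * d:(y + 1) * d] = list(x)
    return psi


def zero_one_loss(y_true: int, y: int) -> float:
    """Hypothetical Delta(y_n, y): 0 for the correct label, 1 otherwise."""
    return 0.0 if y_true == y else 1.0


def structured_hinge(w: Vector, x: Sequence[float], y_true: int, num_labels: int) -> float:
    """max over y of Delta(y_n, y) + w'Psi(x_n, y) - w'Psi(x_n, y_n)."""
    score_true = dot(w, joint_features(x, y_true, num_labels))
    return max(
        zero_one_loss(y_true, y) + dot(w, joint_features(x, y, num_labels)) - score_true
        for y in range(num_labels)
    )


def objective(w: Vector, data: Sequence[Tuple[Sequence[float], int]],
              num_labels: int, C: float) -> float:
    """||w||^2 + C * sum_n (structured hinge term for sample n)."""
    return dot(w, w) + C * sum(structured_hinge(w, x, y, num_labels) for x, y in data)


# Toy data: two labels (0 and 1), two input dimensions.
toy_data = [([2.0, 0.0], 0), ([0.0, 0.5], 1)]
toy_w = [1.0, 0.0, 0.0, 1.0]
toy_value = objective(toy_w, toy_data, num_labels=2, C=1.0)
```

In this toy setting the first sample is classified with enough margin to contribute nothing, while the second contributes a positive hinge term. For truly structured outputs the enumeration over labels would be replaced by a problem-specific search.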

$\min _{w,\xi }||w||^{2}+C\sum _{n=1}^{l}\xi _{n}$

       $\text{s.t.}\quad w'\Psi (x_{n},y_{n})-w'\Psi (x_{n},y)+\xi _{n}\geq \Delta (y_{n},y),\quad n=1,...,l,\quad \forall y\in Y$

       $\xi _{n}\geq 0,n=1,...,l$
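
As a short worked step (a sketch, assuming $C>0$ and $\Delta (y_{n},y_{n})=0$, neither of which is stated explicitly above), one can check that the constrained problem agrees with the original risk function. For a fixed $n$, rearranging the first constraint gives a lower bound on $\xi _{n}$ for every label:

$\xi _{n}\geq \Delta (y_{n},y)+w'\Psi (x_{n},y)-w'\Psi (x_{n},y_{n})\quad \forall y\in Y$

Because the objective decreases as $\xi _{n}$ decreases, the optimal slack takes the smallest feasible value:

$\xi _{n}=\max \left(0,\max _{y\in Y}(\Delta (y_{n},y)+w'\Psi (x_{n},y)-w'\Psi (x_{n},y_{n}))\right)$

Under the assumption $\Delta (y_{n},y_{n})=0$, the choice $y=y_{n}$ makes the inner term equal to zero, so the inner maximum is never negative and the outer $\max$ with $0$ is redundant. Each $\xi _{n}$ therefore equals the maximum that appears in the risk function.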

Testing

Given a sample $x\in X$ and a mapping $f:X\rightarrow Y$ , one can obtain the corresponding label. The mapping is defined as

$f(x)=\arg \max _{y\in Y}w'\Psi (x,y)$

where $w$ is the weight vector learned during the training phase.

The inference problem of finding this maximizer over the label space depends on the structure of the function $\Psi$ , which varies according to the problem.
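
As an illustration of how the structure of $\Psi$ shapes inference, the sketch below assumes a sequence-labeling setting (such as part-of-speech tagging) in which $w'\Psi (x,y)$ decomposes into per-position scores and scores for adjacent label pairs. That decomposition, and the names `emit`, `trans`, and `viterbi`, are assumptions made for this example. Under it, the maximizer can be found by dynamic programming instead of enumerating every label sequence.

```python
from itertools import product
from typing import List, Sequence


def sequence_score(emit: Sequence[Sequence[float]],
                   trans: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Assumed form of w'Psi(x, y): per-position scores plus pairwise scores."""
    total = sum(emit[t][y] for t, y in enumerate(labels))
    total += sum(trans[a][b] for a, b in zip(labels, labels[1:]))
    return total


def viterbi(emit: Sequence[Sequence[float]],
            trans: Sequence[Sequence[float]]) -> List[int]:
    """Return a label sequence that maximizes sequence_score.

    emit[t][y]  : score of label y at position t
    trans[a][b] : score of label a followed by label b
    """
    k = len(trans)
    best = list(emit[0])          # best[b]: best score of a prefix ending in b
    back: List[List[int]] = []    # back-pointers, one list per position t >= 1
    for t in range(1, len(emit)):
        prev, best, ptr = best, [], []
        for b in range(k):
            a_star = max(range(k), key=lambda a: prev[a] + trans[a][b])
            ptr.append(a_star)
            best.append(prev[a_star] + trans[a_star][b] + emit[t][b])
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    y = max(range(k), key=lambda b: best[b])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    path.reverse()
    return path


def brute_force(emit: Sequence[Sequence[float]],
                trans: Sequence[Sequence[float]]) -> List[int]:
    """Enumerate every label sequence; feasible only for tiny problems."""
    k = len(trans)
    return list(max(product(range(k), repeat=len(emit)),
                    key=lambda ys: sequence_score(emit, trans, ys)))


toy_emit = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
toy_trans = [[0.0, -1.0], [-1.0, 0.0]]
toy_path = viterbi(toy_emit, toy_trans)
```

The dynamic program visits each position once and each label pair once per position, whereas the brute-force search grows exponentially with sequence length. Other choices of $\Psi$, for example for parse trees, call for different inference procedures.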

Related Papers

* Taskar et al. (2003), NIPS 2003
* Tsochantaridis et al. (2004): http://svmlight.joachims.org/svm_struct.html
* Taskar et al. (2004), Max-margin Parsing
* Tsochantaridis et al. (2005), JMLR 2005
