Structured SVMs
== The Method and When to Use it ==
Structured (or Structural) Support Vector Machines (SSVM), as the name suggests, is a machine learning model that generalizes the [[Support Vector Machines| Support Vector Machine (SVM)]] classifier, allowing one to train a classifier for structured outputs.

In general, SSVMs perform supervised learning by approximating a mapping

<math>
f: X \rightarrow Y
</math>

where <math>\{(x_1,y_1), \ldots, (x_n,y_n)\}</math> is a set of labeled training examples and each label <math>y \in Y</math> is a complex structured object, such as a tree, a sequence, or a set, instead of a simple univariate prediction (as in the SVM case).

Thus, training an SSVM classifier consists of presenting pairs of samples and their correct output labels; after training, the model can predict the corresponding output label for a new sample instance.

In NLP one can find a great variety of problems that rely on complex outputs, such as parsing and Markov models for part-of-speech tagging.
  
== Training ==
During training, for a set of samples and labels <math>(x_n, y_n) \in X \times Y,\ n=1,\ldots,l</math>, the SSVM minimizes the following risk function:

<math>
\min_{w} ||w||^2 + C \sum^l_{n=1} \max_{y \in Y} \left( \Delta(y_n,y) + w'\Psi(x_n, y) - w'\Psi(x_n,y_n) \right)
</math>
  
where <math>\Delta</math> is an arbitrary function that measures the distance between two labels, and <math>\Psi</math> is a function on samples and labels that extracts feature vectors. These two functions are to be defined according to the problem at hand.
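
To make these definitions concrete, here is a minimal sketch in Python for a toy part-of-speech tagging task. All names (<code>TAGS</code>, <code>VOCAB</code>, <code>delta</code>, <code>psi</code>) are hypothetical illustrations, not part of any standard SSVM package: <math>\Delta</math> is taken to be the Hamming distance between tag sequences, and <math>\Psi</math> counts word-tag emissions and tag-tag transitions.

<pre>
import numpy as np

# Toy label set and vocabulary (hypothetical, for illustration only).
TAGS = ["DET", "NOUN", "VERB"]
VOCAB = ["the", "dog", "barks"]

def delta(y_true, y_pred):
    """Delta(y_true, y_pred): Hamming distance between two tag sequences,
    i.e. the number of positions where they disagree."""
    return sum(t != p for t, p in zip(y_true, y_pred))

def psi(x, y):
    """Psi(x, y): joint feature map counting (word, tag) emissions and
    (tag, tag) transitions, flattened into a single vector."""
    emit = np.zeros((len(VOCAB), len(TAGS)))
    trans = np.zeros((len(TAGS), len(TAGS)))
    for i, (word, tag) in enumerate(zip(x, y)):
        emit[VOCAB.index(word), TAGS.index(tag)] += 1
        if i > 0:
            trans[TAGS.index(y[i - 1]), TAGS.index(tag)] += 1
    return np.concatenate([emit.ravel(), trans.ravel()])
</pre>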

Since the maximization inside the risk function makes it non-differentiable, one can reformulate it by introducing slack variables <math>\xi_n</math>, each representing the value of the corresponding maximum. With this approach, the SSVM training problem becomes:
  
 
<math>
\min_{w,\xi} ||w||^2 + C \sum^l_{n=1} \xi_n
</math>

<math>
s.t.\ \ w'\Psi(x_n,y_n) - w'\Psi(x_n,y) + \xi_n \ge \Delta(y_n,y), \quad n=1,\ldots,l,\ \forall y \in Y
</math>

<math>
\xi_n \ge 0, \quad n=1,\ldots,l
</math>
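
The program above has one constraint for every candidate label <math>y</math>, which is typically exponential in the input length, so practical solvers repeatedly find each example's most violated constraint via loss-augmented inference and update <math>w</math> accordingly. The following stochastic-subgradient sketch illustrates that idea, reusing the hypothetical <code>delta</code> and <code>psi</code> defined above and assuming a label space small enough to enumerate; it is a simplified stand-in for dedicated solvers such as the cutting-plane method of Tsochantaridis et al. (2005).

<pre>
from itertools import product
import numpy as np

def loss_augmented_argmax(w, x, y_true):
    """Most violated label: argmax_y  Delta(y_true, y) + w' Psi(x, y).
    Exhaustive search; real implementations use dynamic programming."""
    candidates = product(TAGS, repeat=len(x))
    return max(candidates, key=lambda y: delta(y_true, y) + w @ psi(x, y))

def train(data, C=1.0, lr=0.01, epochs=20):
    """Stochastic subgradient descent on
    min_w ||w||^2 + C * sum_n max_y (Delta + w'Psi(x_n,y) - w'Psi(x_n,y_n))."""
    dim = len(VOCAB) * len(TAGS) + len(TAGS) ** 2
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y in data:
            y_hat = loss_augmented_argmax(w, x, y)
            grad = 2 * w / len(data)  # subgradient of the regularizer, split per example
            if delta(y, y_hat) + w @ (psi(x, y_hat) - psi(x, y)) > 0:
                grad = grad + C * (psi(x, y_hat) - psi(x, y))
            w -= lr * grad
    return w
</pre>

For instance, <code>train([(["the", "dog", "barks"], ["DET", "NOUN", "VERB"])])</code> drives the score of the gold tag sequence above all of its competitors.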
 
== Testing ==
Given a sample <math>x \in X</math> and the mapping <math>f: X \rightarrow Y</math>, one can obtain the corresponding label. The mapping is defined as

<math>
f(x) = \arg\max_{y \in Y} w'\Psi(x,y)
</math>

where <math>w</math> is the weight vector learned during the training phase.

The inference problem of solving for this maximizer over the label space depends on the structure of the function <math>\Psi</math>, which varies according to the problem.
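
Continuing the same hypothetical sketch (reusing <code>product</code>, <code>psi</code>, and <code>TAGS</code> from the blocks above), prediction runs the same search as loss-augmented inference but without the <math>\Delta</math> term:

<pre>
def predict(w, x):
    """f(x) = argmax_y w' Psi(x, y), by the same exhaustive search."""
    return max(product(TAGS, repeat=len(x)), key=lambda y: w @ psi(x, y))
</pre>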
 
== Related Papers ==
*[[Taskar, B. et al, NIPS 2003| Taskar et al. (2003)]]
*[http://svmlight.joachims.org/svm_struct.html| Tsochantaridis et al. (2004)]
*[[Taskar et al. 2004. Max-margin Parsing| Taskar et al. (2004)]]
*[[Tsochantaridis et al, JMLR 2005| Tsochantaridis et al. (2005)]]
 
