Difference between revisions of "Structured SVMs"
(11 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | + | == The Method and When to Use it== | |
− | + | Structured (or Structural) Support Vector Machines (SSVM), as the name states, is a machine learning model that generalizes the [[Support Vector Machines| Support Vector Machine (SVM)]] classifier, allowing training a classifier for structured output. | |
− | |||
− | + | In general, SSVMs perform supervised learning by approximating a mapping <math>f</math> | |
− | + | <math> | |
− | + | f: X \rightarrow Y | |
+ | </math> | ||
+ | |||
+ | where <math>x = \{(x_1,y_1), ..., (x_n,y_n)\}</math> is a set of labeled training examples and <math>Y</math> is a complex structured object, like trees, sequences, or sets, instead of simple univariate predictions (as in the SVM case). | ||
+ | |||
+ | Thus, training a SSVM classifier consists of showing pairs of correct sample and output label pairs, that are used for training, allowing to predict for new sample instances the corresponding output label | ||
+ | |||
+ | In NLP one can fing a great variety of problems that rely on complex outputs, such as parsing and Markov Models for part-of-speech tagging. | ||
− | + | == Training == | |
− | |||
+ | While on training, for a set of samples and labels <math>(x_n, y_n) \in X \times Y, n=1,...,l</math>, the SSVM minimizes the risk function: | ||
− | = | + | <math> |
+ | \min_{w} ||w||^2 + C \sum^l_{n=1} \max_{y \in Y} (\Delta(y_n,y) + w'\Psi(x_n, y) - w' \Psi(x_n,y_n)) | ||
+ | </math> | ||
− | + | where <math>\Delta</math> is an arbitrary function, which measures the distance between to labels and <math>\Psi</math> is a function on samples and labels, which extracts feature vectors. These two function are to be defined according to the problem at hand | |
+ | |||
+ | |||
+ | Since the equation above is non-differentiable, one can reformulate it introducing slack variables, <math>\xi_n</math>, representing the value of the maximum. Using this approach the SSVM comes as: | ||
<math> | <math> | ||
− | \ | + | \min_{w,\xi} ||w||^2 + C \sum^l_{n=1} \xi_n |
+ | </math> | ||
+ | |||
+ | <math> | ||
+ | s.t. w'\Psi(x_n,y_n) - w'\Psi(x_n,y) + \xi_n \ge \Delta (y_n,y), n= 1,...,l \forall y \in Y | ||
</math> | </math> | ||
<math> | <math> | ||
− | + | \xi_n \ge 0, n= 1,...,l | |
+ | </math> | ||
+ | |||
+ | == Testing == | ||
+ | Given a sample, <math>x \in X </math> and a mapping <math>f: X \rightarrow Y</math> one can obtain the correspondent lable. The mapping is defined as | ||
+ | |||
+ | |||
+ | <math> | ||
+ | f(x) = \arg\max_{y \in Y} w'\Psi(x,y) | ||
</math> | </math> | ||
+ | |||
+ | where <math>w</math> is the vector during the training phase. | ||
+ | |||
+ | The inference problem of solving for this maximizer over the label space is dependent on the structure of the function <math>\Psi</math> that varies according to the problem. | ||
+ | |||
== Related Papers == | == Related Papers == | ||
− | *[http://svmlight.joachims.org/svm_struct.html| | + | *[[Taskar, B. et al, NIPS 2003| Taskar
et
al.
(2003)]] |
− | * | + | *[http://svmlight.joachims.org/svm_struct.html| Tsochantaridis et al. (2004)] |
− | + | *[[Taskar et al. 2004. Max-margin Parsing| Taskar et al. (2004)]] | |
− | + | *[[Tsochantaridis et al, JMLR 2005| Tsochantaridis
et
al.
(2005)]] | |
− | |||
− | |||
− | |||
− | * |
Latest revision as of 16:38, 2 November 2011
The Method and When to Use it
Structured (or Structural) Support Vector Machines (SSVM), as the name states, is a machine learning model that generalizes the Support Vector Machine (SVM) classifier, allowing training a classifier for structured output.
In general, SSVMs perform supervised learning by approximating a mapping
where is a set of labeled training examples and is a complex structured object, like trees, sequences, or sets, instead of simple univariate predictions (as in the SVM case).
Thus, training a SSVM classifier consists of showing pairs of correct sample and output label pairs, that are used for training, allowing to predict for new sample instances the corresponding output label
In NLP one can fing a great variety of problems that rely on complex outputs, such as parsing and Markov Models for part-of-speech tagging.
Training
While on training, for a set of samples and labels , the SSVM minimizes the risk function:
where is an arbitrary function, which measures the distance between to labels and is a function on samples and labels, which extracts feature vectors. These two function are to be defined according to the problem at hand
Since the equation above is non-differentiable, one can reformulate it introducing slack variables, , representing the value of the maximum. Using this approach the SSVM comes as:
Testing
Given a sample, and a mapping one can obtain the correspondent lable. The mapping is defined as
where is the vector during the training phase.
The inference problem of solving for this maximizer over the label space is dependent on the structure of the function that varies according to the problem.