Difference between revisions of "Structured Prediction Cascades"

Revision as of 21:25, 4 October 2011

This method was proposed by Weiss and Taskar, AISTATS 2010

This page is reserved for a write up by Dan Howarth

Citation

Structured Prediction Cascades. David Weiss and Ben Taskar. International Conference on Artificial Intelligence and Statistics (AISTATS), May 2010.

Online version

[1]

Summary

In many structured prediction models, an increase in model complexity comes at a high computational cost. For example, the complexity of an HMM grows exponentially with the order of the model. This work introduces a method for learning a sequence of increasingly complex models while progressively pruning the space of possible outputs. This is done by "weeding out" incorrect output states early on.
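To make the exponential growth concrete, here is a rough counting sketch. The function names are hypothetical, and the cost formula is the usual Viterbi-style estimate for a chain model rather than a figure taken from the paper.

```python
def num_joint_states(num_labels: int, order: int) -> int:
    """Joint states tracked per position by an order-`order` chain model."""
    return num_labels ** order


def viterbi_cost(num_labels: int, order: int, length: int) -> int:
    """Rough count of transition scores evaluated by exact inference.

    Each of the K**order joint states is extended by K possible next labels
    at every position, giving length * K**(order + 1) evaluations.
    """
    return length * num_labels ** (order + 1)


if __name__ == "__main__":
    # 26 labels (e.g. lowercase letters), sequence of length 10.
    for order in (1, 2, 3, 4):
        print(order, num_joint_states(26, order), viterbi_cost(26, order, 10))
```

Each extra order multiplies the work by the number of labels, which is why pruning the label space before moving to a higher-order model pays off.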

Previous approaches to the problem of model complexity were commonly approximate search methods or heuristic pruning techniques. Structured prediction cascades differ in that they explicitly learn the error/computation trade-off for each increase in model complexity.

In this work structured prediction cascades are applied to handwriting recognition and POS tagging.

Brief description of the method

The linear hypothesis class considered is of the form: $h_{w}(x)=\arg \max _{y\in {\mathcal {Y}}}\sum _{c\in {\mathcal {C}}}w^{\mathrm {T} }f_{c}(x,y_{c})$ where ${\mathcal {C}}$ is the set of cliques.
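As an illustration of this hypothesis class, the sketch below evaluates $h_{w}(x)$ by brute-force enumeration on a tiny chain. The feature function `toy_features`, the weights, and the input are all made up for the example; a real implementation would use dynamic programming instead of enumerating every output.

```python
from itertools import product


def dot(w, f):
    """Inner product w^T f for plain Python sequences."""
    return sum(wi * fi for wi, fi in zip(w, f))


def predict(w, feature_fn, cliques, labels, length, x):
    """Brute-force argmax over y of sum_c w^T f_c(x, y_c)."""
    best_y, best_score = None, float("-inf")
    for y in product(labels, repeat=length):
        score = sum(
            dot(w, feature_fn(x, c, tuple(y[p] for p in c))) for c in cliques
        )
        if score > best_score:
            best_y, best_score = y, score
    return best_y, best_score


def toy_features(x, clique, y_c):
    """Hypothetical features: [label matches input, adjacent labels agree]."""
    match = 1.0 if len(clique) == 1 and x[clique[0]] == y_c[0] else 0.0
    agree = 1.0 if len(clique) == 2 and y_c[0] == y_c[1] else 0.0
    return [match, agree]


if __name__ == "__main__":
    cliques = [(0,), (1,), (0, 1)]  # two unigram cliques and one bigram clique
    print(predict([2.0, 0.5], toy_features, cliques, ("a", "b"), 2, "ab"))
```

Here each clique is a tuple of positions, and $y_{c}$ is the tuple of labels at those positions.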

At each level of the cascade the method is given the set of possible clique assignments as input, and each level then filters this set and passes the filtered set as input to the next level.
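The filter-and-pass data flow can be sketched as follows. This is only a skeleton under simplifying assumptions: `run_cascade` and the score cut-offs are hypothetical, and each level here is a plain predicate, whereas in the method each level is a learned model over larger cliques.

```python
def run_cascade(levels, candidates):
    """Pass a candidate set through successive filters, recording its size."""
    sizes = [len(candidates)]
    for keep in levels:
        # Each level sees only what the previous level let through.
        candidates = {y for y in candidates if keep(y)}
        sizes.append(len(candidates))
    return candidates, sizes


if __name__ == "__main__":
    # Toy candidates: (assignment, score) pairs; each level is a stricter cut-off.
    toy = {("a", 0.9), ("b", 0.5), ("c", 0.2)}
    levels = [lambda y: y[1] >= 0.3, lambda y: y[1] >= 0.6]
    print(run_cascade(levels, toy))
```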

${\mathcal {Y}}_{c}$ is the set of possible output assignments to clique $c$

Let the following be defined for the $i$ 'th model:

${\mathcal {C}}^{i}$ , the set of maximal cliques

${\mathcal {V}}_{c}^{i}\subseteq {\mathcal {Y}}_{c}$ , the set of valid output assignments for clique $c$

Before pruning, ${\mathcal {V}}_{c}^{i}=\{y_{c}\in {\mathcal {Y}}_{c}|\forall c'\in {\mathcal {C}}^{i-1},c'\subseteq c,y_{c'}\in {\mathcal {V}}_{c'}^{i-1}\}$ , that is, the set of assignments to clique $c$ at level $i$ whose restriction to every sub-clique $c'$ from level $i-1$ is a valid assignment at that level
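A minimal sketch of this construction, assuming cliques are represented as tuples of positions and valid sets as Python sets of label tuples (both representations, and the name `expand_valid`, are choices made for this example):

```python
from itertools import product


def expand_valid(clique, labels, prev_valid):
    """Valid assignments for `clique` before pruning at the current level.

    prev_valid maps each clique of the previous level (a tuple of positions)
    to its set of surviving assignments (tuples of labels).
    """
    sub_cliques = [c for c in prev_valid if set(c) <= set(clique)]
    valid = set()
    for y_c in product(labels, repeat=len(clique)):
        at = dict(zip(clique, y_c))  # position -> label under this assignment
        # Keep y_c only if every sub-clique restriction survived the last level.
        if all(tuple(at[p] for p in sub) in prev_valid[sub] for sub in sub_cliques):
            valid.add(y_c)
    return valid


if __name__ == "__main__":
    # Previous level: unigram cliques, where position 1 has only "c" left.
    prev = {(0,): {("a",), ("b",)}, (1,): {("c",)}, (2,): {("a",)}}
    print(expand_valid((0, 1), "abc", prev))
```

With three labels there are nine possible bigram assignments for clique `(0, 1)`, but only the two consistent with the surviving unigrams are kept.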

Any $y_{c}\in {\mathcal {V}}_{c}^{i}$ is then pruned if its max-marginal score is less than a threshold.

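The threshold is the function $t_{x}(\alpha )=\alpha \theta _{x}^{*}+(1-\alpha ){\frac {1}{|{\mathcal {V}}|}}\sum _{c\in {\mathcal {C}},y_{c}\in {\mathcal {V}}_{c}}\theta _{x}^{*}(y_{c})$ , a convex combination of the best score and the mean max-marginal score over the $|{\mathcal {V}}|$ valid clique assignments, where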

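$\alpha \in [0,1]$ controls how many max-marginals are eliminated: larger values move the threshold from the mean max-marginal score toward $\theta _{x}^{*}$ and so prune more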

$\theta _{x}(y)=w^{\mathrm {T} }f(x,y)$

$\theta _{x}^{*}(y_{c})=\max _{y'\in {\mathcal {Y}}}\{\theta _{x}(y'):y'_{c}=y_{c}\}$ , the score of the best possible output that contains the assignment $y_{c}$ (the max-marginal of $y_{c}$)

$\theta _{x}^{*}=\max _{y}\theta _{x}(y)$
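The definitions above can be checked on a toy problem. The sketch below computes max-marginals by enumerating every output, which is feasible only for tiny examples, and it assumes all clique assignments are still valid (as at the first level). The function names and the score table are hypothetical.

```python
from itertools import product


def max_marginals(score, cliques, labels, length):
    """theta*_x(y_c) for every clique assignment, by enumerating all outputs."""
    mm = {}
    for y in product(labels, repeat=length):
        s = score(y)
        for c in cliques:
            key = (c, tuple(y[p] for p in c))
            mm[key] = max(mm.get(key, float("-inf")), s)
    return mm


def threshold(mm, alpha):
    """t_x(alpha): convex combination of best score and mean max-marginal."""
    best = max(mm.values())  # theta*_x, since every output feeds some max-marginal
    mean = sum(mm.values()) / len(mm)
    return alpha * best + (1 - alpha) * mean


def prune(mm, alpha):
    """Keep the clique assignments whose max-marginal is not below t_x(alpha)."""
    t = threshold(mm, alpha)
    return {key for key, s in mm.items() if s >= t}


if __name__ == "__main__":
    table = {(0, 0): 4, (0, 1): 3, (1, 0): 1, (1, 1): 0}  # toy theta_x(y)
    mm = max_marginals(table.__getitem__, [(0,), (1,)], (0, 1), 2)
    print(threshold(mm, 0.0), sorted(prune(mm, 0.0)))
    print(threshold(mm, 1.0), sorted(prune(mm, 1.0)))
```

In this toy case the max-marginals are 4, 1, 4 and 3, so $\alpha =0$ prunes only the assignment scoring below the mean of 3, while $\alpha =1$ keeps only the assignments that appear in the best output.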

