Difference between revisions of "Structured Ensemble Cascades"

Revision as of 21:00, 5 October 2011

This method was proposed by Weiss et al., NIPS 2010.

This page is reserved for a write-up by Dan Howarth.

Citation

Sidestepping Intractable Inference with Structured Ensemble Cascades. David Weiss, Benjamin Sapp, and Ben Taskar. Neural Information Processing Systems (NIPS), December 2010.

Online version

[1]

Summary

This work introduces a method for handling intractable inference by "sidestepping" the inference altogether: it learns a group of sub-models within a structured prediction cascade. For instance, exact inference on loopy graphical models is intractable in general. The method overcomes this by splitting the model into loop-free sub-models, each of which admits tractable inference. It builds on the authors' previous work on structured prediction cascades, in which a sequence of increasingly complex models is learned while the set of possible outputs is progressively pruned. See structured prediction cascades for more information about that method.

Brief description of the method

See the description of structured prediction cascades before continuing. The notation used there is the same here.

The method here is the same except that, instead of a single model at each level, there are $P$ sub-models that must be taken into account at each level.

At each level, the score of the overall model $\theta (x,y)$ is defined as the sum of the sub-model scores: $\theta (x,y)=\sum _{p}\theta _{p}(x,y)$. The max marginals of the overall model are defined in the same way, as the sum of the sub-model max marginals: $\theta ^{*}(x,y_{j})=\sum _{p}\theta _{p}^{*}(x,y_{j})$.
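As an illustration of the summed max marginals, here is a minimal Python sketch. It assumes, purely for the example, that every sub-model is a chain over the same variables with its own unary and pairwise score tables; the function names `chain_max_marginals` and `ensemble_max_marginals` are hypothetical and do not come from the paper. Each sub-model's max marginals are computed with a max-sum forward/backward pass and then added up.

```python
import numpy as np

def chain_max_marginals(unary, pairwise):
    """Max marginals of one chain-structured sub-model (max-sum passes).

    unary:    (T, K) array, unary[t, k] = score of label k at position t
    pairwise: (K, K) array, pairwise[a, b] = score of transition a -> b
    Returns m of shape (T, K), where m[t, k] is the best total score of
    any labeling y with y_t = k.
    """
    T, K = unary.shape
    fwd = np.zeros((T, K))  # best score of everything left of t, given y_t
    bwd = np.zeros((T, K))  # best score of everything right of t, given y_t
    for t in range(1, T):
        # fwd[t, b] = max_a fwd[t-1, a] + unary[t-1, a] + pairwise[a, b]
        fwd[t] = np.max((fwd[t - 1] + unary[t - 1])[:, None] + pairwise, axis=0)
    for t in range(T - 2, -1, -1):
        # bwd[t, a] = max_b pairwise[a, b] + unary[t+1, b] + bwd[t+1, b]
        bwd[t] = np.max(pairwise + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + unary + bwd

def ensemble_max_marginals(sub_models):
    """Sum of the sub-model max marginals, one inference pass per sub-model.

    sub_models: list of (unary, pairwise) pairs over the same T and K.
    """
    return sum(chain_max_marginals(u, pw) for u, pw in sub_models)

# Example with P = 2 hypothetical sub-models over 4 positions and 3 labels.
rng = np.random.default_rng(0)
models = [(rng.normal(size=(4, 3)), rng.normal(size=(3, 3))) for _ in range(2)]
print(ensemble_max_marginals(models).shape)
```

Each sub-model is solved independently, so the cost of the ensemble is $P$ times the cost of a single tractable inference pass.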

As in SPC, the $y_{j}$ that are not pruned are those whose max marginals are above a threshold function. The threshold function is the sum of the threshold functions for each model (as defined in SPC): $t(x,\alpha )=\sum _{p}t_{p}(x,\alpha )$
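The pruning rule can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' implementation: it assumes the per-model threshold takes the SPC form of a convex combination of the best score and the mean max marginal, $t_{p}(x,\alpha )=\alpha \max \theta _{p}^{*}+(1-\alpha )\,\mathrm {mean}\,\theta _{p}^{*}$ (check the SPC page for the exact definition), and it keeps states whose summed max marginal is at or above the summed threshold. The name `ensemble_filter` is hypothetical.

```python
import numpy as np

def ensemble_filter(max_marginals, alpha):
    """Decide which states y_j survive pruning in the ensemble.

    max_marginals: array-like of shape (P, T, K); entry [p, j, k] is the
                   max marginal of sub-model p for variable j taking label k.
    alpha:         value in [0, 1] controlling how aggressive pruning is.
    Returns (keep, threshold): a (T, K) boolean mask and the summed threshold.
    """
    mm = np.asarray(max_marginals, dtype=float)
    P = mm.shape[0]
    flat = mm.reshape(P, -1)
    # Assumed SPC-style threshold for each sub-model.
    t_p = alpha * flat.max(axis=1) + (1.0 - alpha) * flat.mean(axis=1)
    threshold = t_p.sum()      # t(x, alpha) = sum_p t_p(x, alpha)
    total = mm.sum(axis=0)     # theta*(x, y_j) = sum_p theta_p*(x, y_j)
    keep = total >= threshold  # assumption: ties are kept
    return keep, threshold

# Two hypothetical sub-models, two variables, two labels each.
mm = [[[3.0, 1.0], [3.0, 2.0]],
      [[2.0, 1.0], [0.0, 2.0]]]
keep, t = ensemble_filter(mm, alpha=0.0)
print(t)
print(keep)
```

Note that the sub-models are never forced to agree: only the sums of their max marginals and thresholds are compared.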

Note that, unlike previous methods such as dual decomposition, it is not necessary that all sub-models agree on the argmax solution. This allows structured ensemble cascades to incur only a linear (factor of $P$) increase in inference time.

The learning objective is then the same as in SPC, with all sub-models taken into account in the regularization term:

$\inf _{\theta _{1},...,\theta _{P}}{\frac {\lambda }{2}}||\sum _{p}\theta _{p}||^{2}+{\frac {1}{n}}\sum _{i}H(\theta ;(x^{i},y^{i}))$

Sub-gradient descent is used to find an optimal $\theta$, and each sub-model is updated only when the sub-models jointly make a mistake.
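The "update only on a joint mistake" rule can be illustrated with a rough Python sketch. Several things here are assumptions rather than statements from this page: each sub-model score is taken to be linear, $\theta _{p}(x,y)=w_{p}\cdot f_{p}(x,y)$; the loss $H$ is taken to be the SPC-style hinge $\max\{0,\ \ell +t(x,\alpha )-\theta (x,y)\}$ with margin $\ell$; the subgradient of each threshold $t_{p}$ is supplied by the caller; and the regularization term is left out for brevity. The function name `joint_update` and all argument names are hypothetical.

```python
import numpy as np

def joint_update(weights, truth_feats, thresh_grads, thresholds,
                 eta=0.1, margin=1.0):
    """One subgradient step on the assumed joint hinge loss.

    weights:      list of P weight vectors w_p
    truth_feats:  list of P feature vectors f_p(x, y) of the true output
    thresh_grads: list of P subgradients of t_p(x, alpha) w.r.t. w_p
    thresholds:   list of P threshold values t_p(x, alpha)
    Returns (new_weights, updated).
    """
    # Joint score of the true output: sum_p w_p . f_p(x, y)
    truth_score = sum(float(w @ f) for w, f in zip(weights, truth_feats))
    # Joint hinge argument: margin + sum_p t_p - sum_p theta_p(x, y)
    loss = margin + sum(thresholds) - truth_score
    if loss <= 0:
        # No joint mistake: no sub-model is touched, even if one sub-model
        # on its own would have scored the truth below its own threshold.
        return [w.copy() for w in weights], False
    # Joint mistake: every sub-model moves toward the truth and away from
    # the direction that raises its threshold.
    new_weights = [w + eta * (f - g)
                   for w, f, g in zip(weights, truth_feats, thresh_grads)]
    return new_weights, True

w = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
f = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
g = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(joint_update(w, f, g, thresholds=[1.0, 0.8]))
```

The key design point is that the mistake test uses the summed scores and summed thresholds, so the sub-models are trained to filter well together rather than individually.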

