Difference between revisions of "Structured Ensemble Cascades"
Line 32: | Line 32: | ||
<math> \inf_{\theta_1,...,\theta_P} \frac{\lambda}{2}||\sum_p\theta_p||^2 + \frac{1}{n} \sum_i H(\theta; (x^i, y^i))</math> | <math> \inf_{\theta_1,...,\theta_P} \frac{\lambda}{2}||\sum_p\theta_p||^2 + \frac{1}{n} \sum_i H(\theta; (x^i, y^i))</math> | ||
+ | |||
+ | Sub-gradient descent is used to find an optimal <math>\theta</math> and each model is updated only when a mistake has been made jointly. | ||
== Experimental Result == | == Experimental Result == |
Revision as of 21:00, 5 October 2011
This method as proposed by Weiss et al, NIPS 2010
This page is reserved for a write up by Dan Howarth
Contents
Citation
Sidestepping Intractable Inference with Structured Ensemble Cascades. David Weiss, Benjamin Sapp, and Ben Taskar. Neural Information Processing Systems (NIPS), December 2010.
Online version
Summary
This work introduces a method for intractable inference by "sidestepping" the inference all together by learning a group of sub-models in a structured prediction cascade. For instance, inference on loopy graphical models is intractable. This method overcomes this intractability by splitting the model up into submodels that are loop-less. This builds on the authors previous work of structured prediction cascades where intractable models are learned by learning increasingly complex models while also progressively pruning the set of possible outputs. See structured prediction cascades for an more information about this method.
Brief description of the method
See the description of structured prediction cascades before continuing. The notation used there is the same here.
The method here is basically the same except that instead of having a single model for each level, there are sub-models that need to be taken into account at each level.
At each level the score of the overall model is defined by the sum of the sub-models: . The max marginals are defined similarly:
As in SPC, the that are not pruned are those whose max marginals are above a threshold function. The threshold function is the sum of the threshold functions for each model (as defined in SPC):
Note that as in previous methods such as dual decomposition it is not necessary that all sub-models agree on the argmax solution. This allows structured ensemble cascades to enjoy only a linear (factor of ) increase of inference time.
The optimization function that is learned is then the same as SCP with all models taken into account for smoothing:
Sub-gradient descent is used to find an optimal and each model is updated only when a mistake has been made jointly.