Structured Ensemble Cascades

From Cohen Courses
Revision as of 22:11, 5 October 2011

This method was proposed by Weiss et al. (NIPS 2010).

This page is reserved for a write up by Dan Howarth


Citation

Sidestepping Intractable Inference with Structured Ensemble Cascades. David Weiss, Benjamin Sapp, and Ben Taskar. Neural Information Processing Systems (NIPS), December 2010.

Online version

[1]

Summary

This work introduces a method that handles intractable inference by "sidestepping" it altogether: a group of tractable sub-models is learned within a structured prediction cascade. For instance, exact inference on loopy graphical models is intractable in general. The method overcomes this by splitting the model into loop-free sub-models. It builds on the authors' previous work on structured prediction cascades, in which increasingly complex models are learned while the set of possible outputs is progressively pruned. See structured prediction cascades for more information about that method.
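As an illustration of splitting a loopy model into loop-free sub-models, the sketch below decomposes a grid graph into one sub-model holding only horizontal edges and one holding only vertical edges. The grid example and the helper names (`grid_chains`, `has_cycle`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def grid_chains(rows: int, cols: int) -> Tuple[List[Edge], List[Edge]]:
    """Split the edges of a rows x cols grid into two loop-free sub-models.

    Node (r, c) has id r * cols + c. The first list holds the row chains
    (horizontal edges), the second the column chains (vertical edges).
    """
    horizontal = [(r * cols + c, r * cols + c + 1)
                  for r in range(rows) for c in range(cols - 1)]
    vertical = [(r * cols + c, (r + 1) * cols + c)
                for r in range(rows - 1) for c in range(cols)]
    return horizontal, vertical


def has_cycle(num_nodes: int, edges: List[Edge]) -> bool:
    """Union-find check: an edge joining two already-connected nodes closes a loop."""
    parent = list(range(num_nodes))

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return True
        parent[root_u] = root_v
    return False


rows, cols = 3, 3
horizontal, vertical = grid_chains(rows, cols)
print("full grid loopy:", has_cycle(rows * cols, horizontal + vertical))
print("row sub-model loopy:", has_cycle(rows * cols, horizontal))
print("column sub-model loopy:", has_cycle(rows * cols, vertical))
```

Each sub-model on its own admits exact chain inference, while together the two cover every edge of the original loopy graph.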

Brief description of the method

See the description of structured prediction cascades before continuing. The notation used there is the same here.

The method here is the same, except that each level of the cascade has several sub-models, rather than a single model, that must be taken into account.

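At each level, the score of the overall model is the sum of the sub-model scores: $\theta(x, y) = \sum_{p=1}^{P} \theta_p(x, y)$, where $P$ is the number of sub-models. The max marginals are defined similarly: $\theta^\star(x, y_j) = \sum_{p=1}^{P} \theta_p^\star(x, y_j)$, where $\theta_p^\star(x, y_j) = \max_{y' : y'_j = y_j} \theta_p(x, y')$ is the max marginal of sub-model $p$.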

As in SPC, the assignments that are not pruned are those whose max marginals are above a threshold function. The threshold function is the sum, over sub-models $p$, of the threshold functions of the individual sub-models (each defined as in SPC): $t(x, \alpha) = \sum_{p} t_p(x, \alpha)$.
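The summing and thresholding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each sub-model is a chain that visits the variables in its own order, and it assumes the SPC threshold has the form `alpha * (best score) + (1 - alpha) * (mean max marginal)`. The names `chain_max_marginals`, `spc_threshold`, and `ensemble_prune`, and the toy scores, are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# (order, unary, pair): order[i] is the variable at chain position i,
# unary[i][a] scores state a at position i,
# pair[i][a][b] scores states (a, b) at positions (i, i + 1).
SubModel = Tuple[List[int], Matrix, List[Matrix]]


def chain_max_marginals(unary: Matrix, pair: List[Matrix]) -> Matrix:
    """Max-sum forward/backward pass; mm[i][a] = best chain score with state a at position i."""
    n, k = len(unary), len(unary[0])
    fwd = [list(unary[0])]
    for i in range(1, n):
        fwd.append([unary[i][b] + max(fwd[i - 1][a] + pair[i - 1][a][b] for a in range(k))
                    for b in range(k)])
    bwd = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [max(pair[i][a][b] + unary[i + 1][b] + bwd[i + 1][b] for b in range(k))
                  for a in range(k)]
    return [[fwd[i][a] + bwd[i][a] for a in range(k)] for i in range(n)]


def spc_threshold(mm: Matrix, alpha: float) -> float:
    """Assumed SPC threshold: convex combination of the best score and the mean max marginal."""
    flat = [v for row in mm for v in row]
    return alpha * max(flat) + (1.0 - alpha) * sum(flat) / len(flat)


def ensemble_prune(sub_models: List[SubModel], alpha: float):
    """Sum max marginals and thresholds over sub-models; keep states at or above the summed threshold."""
    n, k = len(sub_models[0][0]), len(sub_models[0][1][0])
    total = [[0.0] * k for _ in range(n)]
    threshold = 0.0
    for order, unary, pair in sub_models:
        mm = chain_max_marginals(unary, pair)
        threshold += spc_threshold(mm, alpha)
        for pos, var in enumerate(order):  # map chain positions back to variables
            for a in range(k):
                total[var][a] += mm[pos][a]
    keep = [[total[j][a] >= threshold for a in range(k)] for j in range(n)]
    return total, threshold, keep


# Toy triangle 0-1-2 split into two chains: A covers edges (0,1) and (1,2), B covers (2,0).
chain_a: SubModel = ([0, 1, 2],
                     [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]],
                     [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]])
chain_b: SubModel = ([2, 0, 1],
                     [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
                     [[[0.0, 2.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]])
total, threshold, keep = ensemble_prune([chain_a, chain_b], alpha=0.5)
print(total, threshold, keep)
```

Each sub-model runs its own exact inference independently, so the cost is one chain pass per sub-model; no joint inference over the loopy triangle is ever performed.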

Note that, unlike previous methods such as dual decomposition, the sub-models are not required to agree on the argmax solution. As a result, inference time in a structured ensemble cascade grows only linearly (by a factor of $P$, the number of sub-models).

The objective that is learned is then the same as in SPC, with all sub-models taken into account for smoothing.
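The equation itself did not survive on this page. One plausible form, written here as an assumption rather than as the paper's exact statement, is the SPC hinge-style objective with scores and thresholds summed over sub-models. The margin of 1, the regularization weight $\lambda$, and the training-set size $n$ are illustrative symbols, not values taken from this page:

```latex
\min_{\theta_1, \dots, \theta_P} \;
  \frac{\lambda}{2} \sum_{p=1}^{P} \lVert \theta_p \rVert^2
  + \frac{1}{n} \sum_{i=1}^{n}
    \max\Bigl\{ 0,\; 1 + \sum_{p=1}^{P} t_p(x^i, \alpha)
                 - \sum_{p=1}^{P} \theta_p(x^i, y^i) \Bigr\}
```

Under this form the loss term for an example is zero exactly when the summed score of the true output clears the summed threshold by the margin.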


Sub-gradient descent is used to find an optimal $\theta$, and each sub-model is updated only when a mistake has been made jointly, that is, by the ensemble as a whole rather than by an individual sub-model.
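A joint update of this kind might look like the sketch below. It is a toy illustration under several assumptions: the output space is small enough to enumerate (a real implementation would use dynamic programming in each sub-model), each sub-model is linear in its own features, the loss is a hinge with margin 1, and the threshold is `alpha * (best score) + (1 - alpha) * (mean max marginal)`. All names (`joint_step`, `threshold_and_grad`, the feature functions) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Output = Tuple[int, ...]
FeatureFn = Callable[[Output], List[float]]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def threshold_and_grad(w, feats: FeatureFn, outputs, n_vars, n_states, alpha):
    """Return one sub-model's threshold and a subgradient of it with respect to w."""
    best = max(outputs, key=lambda y: dot(w, feats(y)))
    # Witness for (j, a): the best-scoring output that has y_j = a.
    witnesses = [max((y for y in outputs if y[j] == a), key=lambda y: dot(w, feats(y)))
                 for j in range(n_vars) for a in range(n_states)]
    m = len(witnesses)
    t = alpha * dot(w, feats(best)) + (1 - alpha) * sum(dot(w, feats(y)) for y in witnesses) / m
    grad = [alpha * feats(best)[d] + (1 - alpha) * sum(feats(y)[d] for y in witnesses) / m
            for d in range(len(w))]
    return t, grad


def joint_step(weights, feature_fns, outputs, y_true, n_states,
               alpha=0.5, lr=0.1, lam=0.0):
    """One subgradient step; the loss term is applied only on a joint mistake."""
    n_vars = len(y_true)
    parts = [threshold_and_grad(w, f, outputs, n_vars, n_states, alpha)
             for w, f in zip(weights, feature_fns)]
    total_t = sum(t for t, _ in parts)
    total_true = sum(dot(w, f(y_true)) for w, f in zip(weights, feature_fns))
    hinge = max(0.0, 1.0 + total_t - total_true)  # summed threshold vs. summed true score
    new_weights = []
    for w, f, (_, g_t) in zip(weights, feature_fns, parts):
        if hinge > 0:
            loss_grad = [gt - ft for gt, ft in zip(g_t, f(y_true))]
        else:
            loss_grad = [0.0] * len(w)  # no joint mistake: only regularization acts
        new_weights.append([wi - lr * (lam * wi + gi) for wi, gi in zip(w, loss_grad)])
    return new_weights, hinge


# Toy problem: two binary variables, two sub-models with different features.
outputs = [(a, b) for a in range(2) for b in range(2)]
feature_fns = [lambda y: [float(y[0] == 1), float(y[1] == 1)],
               lambda y: [float(y[0] == y[1])]]
weights = [[0.0, 0.0], [0.0]]
weights, hinge = joint_step(weights, feature_fns, outputs, y_true=(1, 1), n_states=2)
print(weights, hinge)
```

The decision to update is made once, from the summed threshold and summed score, so a sub-model that is individually wrong is left alone as long as the ensemble as a whole is correct.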

Experimental Result

[Figure: Weiss_et_al_NIPS_2010_A.png]

Related papers