Structured Ensemble Cascades

From Cohen Courses
Revision as of 20:57, 5 October 2011

This method was proposed by Weiss et al., NIPS 2010.

This page is reserved for a write up by Dan Howarth


Citation

Sidestepping Intractable Inference with Structured Ensemble Cascades. David Weiss, Benjamin Sapp, and Ben Taskar. Neural Information Processing Systems (NIPS), December 2010.

Online version

[1]

Summary

This work introduces a method for handling intractable inference by "sidestepping" the inference altogether: it learns a group of sub-models within a structured prediction cascade. For instance, inference on loopy graphical models is intractable. The method overcomes this intractability by splitting the model into sub-models that are loop-free. It builds on the authors' previous work on structured prediction cascades, in which intractable models are handled by learning increasingly complex models while progressively pruning the set of possible outputs. See structured prediction cascades for more information about that method.

Brief description of the method

See the description of structured prediction cascades before continuing. The notation used there is the same here.

The method here is essentially the same, except that each level of the cascade has several sub-models to take into account instead of a single model.

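At each level the score of the overall model <math>\theta(x,y)</math> is defined by the sum of the sub-models: <math>\theta(x,y) = \sum_p \theta_p(x,y)</math>. The max marginals are defined similarly: <math>\theta^*(x,y_j) = \sum_p \theta^*_p(x, y_j)</math>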

As in SPC, the <math>y_j</math> that are not pruned are those whose max marginals are above a threshold function. The threshold function is the sum of the threshold functions for each model (as defined in SPC): <math>t(x,\alpha) = \sum_p t_p(x, \alpha)</math>
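The pruning rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ensemble_prune` and the nested-list layout are hypothetical, each sub-model's max marginals and threshold value are assumed to have been computed already (how they are computed is described on the SPC page), and each output part is simplified to a (position, label) pair.

```python
from typing import List


def ensemble_prune(max_marginals: List[List[List[float]]],
                   thresholds: List[float]) -> List[List[bool]]:
    """Decide which (position, label) pairs survive pruning.

    max_marginals[p][j][k] is sub-model p's max marginal for label k at
    position j (assumed precomputed). thresholds[p] is sub-model p's
    threshold value t_p(x, alpha) (assumed precomputed).
    """
    num_positions = len(max_marginals[0])
    num_labels = len(max_marginals[0][0])
    # Ensemble threshold: sum of the per-model thresholds.
    total_threshold = sum(thresholds)
    keep = []
    for j in range(num_positions):
        row = []
        for k in range(num_labels):
            # Ensemble max marginal: sum of the per-model max marginals.
            total = sum(mm[j][k] for mm in max_marginals)
            # Keep the pair only if it scores strictly above the threshold.
            row.append(total > total_threshold)
        keep.append(row)
    return keep


# Toy example: two sub-models, two positions, three labels.
mm_a = [[1.0, 0.5, -1.0], [0.2, 0.9, 0.1]]
mm_b = [[0.5, 0.7, -0.5], [0.3, 0.1, 0.6]]
mask = ensemble_prune([mm_a, mm_b], thresholds=[0.4, 0.5])
```

Note that the sub-models are never asked to agree with each other: only the summed max marginals and the summed threshold enter the decision.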

Note that, unlike previous methods such as dual decomposition, it is not necessary that all sub-models agree on the argmax solution. This allows structured ensemble cascades to enjoy only a linear (factor of <math>P</math>) increase of inference time.

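The objective that is learned is then the same as in SPC, with all sub-models taken into account for smoothing:

<math>\inf_{\theta_1,\ldots,\theta_P} \frac{\lambda}{2}\left\|\sum_p\theta_p\right\|^2 + \frac{1}{n} \sum_i H(\theta; (x^i, y^i))</math>

where

<math>H(\theta;(x^i, y^i)) = \max\{0, l + \sum_p t_p(x^i,\alpha) - \theta^\mathrm{T}f(x^i, y^i)\}</math>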
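As a rough numerical illustration of this kind of objective, the sketch below evaluates a regularizer on the summed sub-model weights plus an average hinge loss that compares the true output's score against the summed thresholds plus a margin. It rests on several assumptions that are not from the paper: the names `hinge` and `objective` are hypothetical, all sub-model weight vectors are taken to share one dimension, and the score of the true output and the per-model thresholds are passed in as fixed numbers even though in training they depend on the weights.

```python
from typing import List, Sequence, Tuple


def hinge(true_score: float, thresholds: Sequence[float], margin: float) -> float:
    """Hinge loss: positive when the true output's score does not exceed
    the summed thresholds by at least the margin."""
    return max(0.0, margin + sum(thresholds) - true_score)


def objective(sub_model_weights: List[List[float]],
              examples: List[Tuple[float, List[float]]],
              lam: float,
              margin: float) -> float:
    """Regularized average hinge loss.

    sub_model_weights[p] is the weight vector of sub-model p (assumed to
    share one dimension). Each example is a pair (score of the true
    output, list of per-model threshold values), both assumed precomputed.
    """
    # Sum the weight vectors coordinate-wise, then take the squared norm.
    summed = [sum(coords) for coords in zip(*sub_model_weights)]
    regularizer = 0.5 * lam * sum(w * w for w in summed)
    average_loss = sum(hinge(s, t, margin) for s, t in examples) / len(examples)
    return regularizer + average_loss


weights = [[1.0, 0.0], [0.0, 2.0]]
data = [(3.0, [0.5, 0.5]), (1.0, [0.5, 1.0])]
value = objective(weights, data, lam=0.2, margin=1.0)
```

In this toy input the first example incurs no loss because its true score clears the summed threshold by more than the margin, while the second example is penalized.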

Experimental Result

Related papers