Stoyanov et al 2011: Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure

Citation

Veselin Stoyanov, Alexander Ropson, and Jason Eisner. "Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure." In Proceedings of AISTATS, 2011.

Online version

Stoyanov et al 2011

Summary

This paper presents a loopy Belief Propagation and Back-Propagation method for Empirical Risk Minimization (ERM), an alternative training method for general problems in Probabilistic Graphical Models (possible applications include Named Entity Recognition, Word Alignment, Shallow Parsing, and Constituent Parsing). The paper formulates the approximate learning problem as an ERM problem rather than as MAP estimation. The authors show that replacing MAP estimation with ERM-based parameter estimation significantly reduces loss on the test set, in some cases by an order of magnitude.

Brief Description of the method

This paper first formulates the parameter estimation problem as training and decoding on Markov random fields (MRFs), then discusses the use of Belief Propagation to do inference on MRFs and the use of Back-Propagation to compute the gradient of the empirical risk. In this section, we first summarize the Back-Propagation method used to compute the gradient of the empirical risk, then briefly describe the numerical optimization method for this task. For the detailed Belief Propagation and Empirical Risk Minimization methods for general probabilistic graphical models, please refer to the corresponding method pages.
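As a rough sketch of this training pipeline (not the authors' implementation): the names erm_train, inference_fn, and loss_fn are hypothetical placeholders, the data is assumed to be an iterable of (x, y) pairs, and PyTorch automatic differentiation stands in for the paper's hand-derived back-propagation through inference.

 import torch
 
 def erm_train(inference_fn, loss_fn, data, theta, lr=0.1, epochs=10):
     # theta: parameter tensor created with requires_grad=True
     optimizer = torch.optim.SGD([theta], lr=lr)
     for _ in range(epochs):
         for x, y in data:
             beliefs = inference_fn(theta, x)  # approximate inference (stand-in for loopy BP)
             risk = loss_fn(beliefs, y)        # differentiable surrogate of the task loss
             optimizer.zero_grad()
             risk.backward()                   # back-propagate through the inference computation
             optimizer.step()
     return theta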

Back-Propagation

Assume the task is to do ERM estimation to obtain the model parameter <math>\theta</math>. The standard maximum log-likelihood estimate is

: <math>\theta^{*} = \underset{\theta}{\operatorname{argmax}} \log L(\theta) = \underset{\theta}{\operatorname{argmax}} \sum_{i} \log p_{\theta} (x_{i}, y_{i}).</math>

Instead of doing MLE training, the authors estimate the parameter <math>\theta</math> by minimizing an empirical risk function

: <math>ER(\theta) = \frac{1}{n} \sum_{i=1}^n L(f_{\theta}(x_i), y_i),</math>

where <math>f_{\theta}(x_i)</math> is the system's output for input <math>x_i</math> under approximate inference and decoding, and <math>L</math> is a task-specific loss function.
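Concretely, the gradient that back-propagation computes decomposes by the chain rule (a sketch, assuming the loss and the decoder output are differentiable):

: <math>\nabla_{\theta} ER(\theta) = \frac{1}{n} \sum_{i=1}^n \frac{\partial L(f_{\theta}(x_i), y_i)}{\partial f_{\theta}(x_i)} \, \frac{\partial f_{\theta}(x_i)}{\partial \theta},</math>

where the second factor is obtained by differentiating back through the message-passing updates of loopy belief propagation.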

Dataset

Experimental Results

Related Papers

This paper is related to prior work along three dimensions.