Empirical Risk Minimization

From Cohen Courses
 
This is a [[Category::method]] proposed by [[RelatedPaper::Bahl et al. 1988 A new algorithm for the estimation of hidden Markov model parameters]].
 
In graphical models, the true distribution of the data is generally unknown. Instead of maximizing the likelihood of the training data when estimating the model parameters <math>\theta</math>, we can minimize the empirical risk, i.e., the average of a loss <math>l</math> over the training data. Empirical Risk Minimization (ERM) has been widely used in Speech Recognition (Bahl et al., 1988) and Machine Translation (Och, 2003). The ERM estimation method has the following advantages:
* Maximum likelihood does not guarantee better accuracy and might overfit to the training distribution; ERM can help prevent overfitting the training data.
* Summing and averaging the local conditional likelihoods can be more resilient to errors than taking the product of conditional likelihoods.
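The criterion above can be written as a minimization over the averaged loss. In this sketch the sample size <math>N</math>, inputs <math>x_i</math>, labels <math>y_i</math>, and predictor <math>f</math> are my notation rather than the article's; only <math>\theta</math> and <math>l</math> appear in the text:

:<math>\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} l\big(y_i, f(x_i; \theta)\big)</math>

By contrast, maximum likelihood picks <math>\arg\max_{\theta} \prod_{i=1}^{N} p(y_i \mid x_i; \theta)</math>, a product in which every training example must be fit.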
  
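To illustrate the second advantage, here is a minimal sketch (not from the article; the probabilities are made-up toy values) of why averaging losses is more forgiving than multiplying likelihoods when one prediction is badly wrong:

```python
import math

# Hypothetical per-example conditional likelihoods from some model;
# the third example is a near-total miss.
probs = [0.9, 0.8, 0.001, 0.95]

# Product of conditional likelihoods: the single bad factor
# drives the whole objective toward zero.
joint_likelihood = math.prod(probs)

# Empirical risk with log loss: averaging caps each example's
# influence at 1/N of the total, so one error is less catastrophic.
empirical_risk = sum(-math.log(p) for p in probs) / len(probs)

print(joint_likelihood)   # dominated by the 0.001 factor
print(empirical_risk)     # the bad example contributes only 1/N of the risk
```

The same outlier that collapses the product shifts the averaged risk by a bounded amount, which is one intuition behind preferring the ERM criterion.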
 
== Motivation ==

Revision as of 16:38, 1 November 2011


== Problem Formulation ==

== Empirical Risk Minimization ==

== Some Reflections ==

== Related Papers ==
