Difference between revisions of "Empirical Risk Minimization"
From Cohen Courses
Revision as of 15:38, 1 November 2011
This is a method proposed in Bahl et al. (1988), "A new algorithm for the estimation of hidden Markov model parameters."
In graphical models, the true distribution of the data is unknown in practice. Instead of maximizing the likelihood of the training data when estimating the model parameter θ, we can minimize the empirical risk, i.e., the average of a loss function ℓ over the training examples; this estimation principle is called Empirical Risk Minimization (ERM). ERM has been widely used in speech recognition (Bahl et al., 1988) and machine translation (Och, 2003). The ERM estimation method has the following advantages:
- Maximum likelihood does not guarantee better accuracy and might overfit to the training distribution; ERM can prevent overfitting the training data.
- Summing and averaging the local conditional likelihoods can be more resilient to errors than taking the product of the conditional likelihoods.
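The idea above can be illustrated with a minimal sketch (this is not the HMM setting of Bahl et al.; the one-parameter predictor, the squared loss, and the grid search are assumptions chosen purely for illustration). The empirical risk of a parameter θ is the average per-example loss over the training set, and ERM picks the θ that minimizes it:

```python
# Empirical risk: R_emp(theta) = (1/n) * sum_i loss(theta, x_i, y_i)

def empirical_risk(theta, data, loss):
    """Average loss of parameter theta over the training data."""
    return sum(loss(theta, x, y) for x, y in data) / len(data)

def squared_loss(theta, x, y):
    """Loss of a one-parameter linear predictor y_hat = theta * x.
    (Illustrative choice; any per-example loss could be plugged in.)"""
    return (theta * x - y) ** 2

# Toy training set generated from y = 2x, so the risk minimizer is theta = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Crude ERM: pick the theta with the lowest empirical risk on a grid.
candidates = [i / 10 for i in range(0, 41)]  # theta in [0.0, 4.0]
theta_hat = min(candidates, key=lambda t: empirical_risk(t, data, squared_loss))
print(theta_hat)  # -> 2.0
```

In practice the grid search would be replaced by gradient-based optimization, but the objective being minimized — the average loss rather than the likelihood — is the same.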
Contents
Motivation
Problem Formulation
Empirical Risk Minimization
Some Reflections
Related Papers