Empirical Risk Minimization

From Cohen Courses
Revision as of 15:38, 1 November 2011 by Yww (talk | contribs)

This method was proposed by Bahl et al. (1988) in "A new algorithm for the estimation of hidden Markov model parameters".

In graphical models, the true distribution of the data is generally unknown. Instead of maximizing the likelihood of the training data when estimating the model parameters, we can minimize the empirical risk, that is, the loss averaged over the training examples. Empirical Risk Minimization (ERM) has been widely used in speech recognition (Bahl et al., 1988) and machine translation (Och, 2003). The ERM estimation method has the following advantages:

  • Maximizing likelihood does not guarantee better accuracy on the task and may overfit to the training distribution. Because ERM optimizes the task loss directly, it can help reduce overfitting to the training data.
  • Summing and averaging the local conditional likelihoods might be more resilient to errors than computing the product of the conditional likelihoods, since a single very small factor can dominate a product.
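
The two points above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Bahl et al. (1988) or Och (2003): the function names, the 0-1 loss, and the per-example likelihood values are all hypothetical. It computes the empirical risk as an average loss, and compares how a product and an average of conditional likelihoods react to one badly modelled example.

```python
import math


def zero_one_loss(predicted, gold):
    """Hypothetical task loss: 0 for a correct prediction, 1 otherwise."""
    return 0.0 if predicted == gold else 1.0


def empirical_risk(losses):
    """Empirical risk: the per-example loss averaged over the training set."""
    return sum(losses) / len(losses)


def likelihood_product(probs):
    """Product of per-example conditional likelihoods (the likelihood objective)."""
    return math.prod(probs)


def mean_likelihood(probs):
    """Average of per-example conditional likelihoods."""
    return sum(probs) / len(probs)


# Made-up predictions and gold labels for four training examples.
predictions = ["a", "b", "c", "d"]
gold_labels = ["a", "b", "c", "x"]
risk = empirical_risk(
    [zero_one_loss(p, g) for p, g in zip(predictions, gold_labels)]
)

# Made-up conditional likelihoods p(y_i | x_i) under two candidate models.
# Model A fits three examples well and one (perhaps noisy) example very badly;
# model B is mediocre on all four.
model_a = [0.9, 0.9, 0.9, 1e-6]
model_b = [0.6, 0.6, 0.6, 0.6]

# The product is dominated by the single tiny factor, so it prefers model B,
# while the average prefers model A.
product_prefers_b = likelihood_product(model_a) < likelihood_product(model_b)
average_prefers_a = mean_likelihood(model_a) > mean_likelihood(model_b)
```

In this toy setting the product-based criterion ranks model B higher because of one outlying example, whereas the averaged criterion ranks model A higher. Whether that behaviour is desirable depends on whether the outlying example is noise or signal.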

Motivation

Problem Formulation

Empirical Risk Minimization

Some Reflections

Related Papers
