Empirical Risk Minimization

From Cohen Courses
Revision as of 16:38, 1 November 2011 by Yww

This method was proposed by Bahl et al. (1988) in "A new algorithm for the estimation of hidden Markov model parameters."

In graphical models, the true distribution of the data is generally unknown. Instead of maximizing the likelihood on the training data when estimating the model parameters, we can minimize the empirical risk, i.e., the average loss over the training sample. Empirical Risk Minimization (ERM) has been widely used in speech recognition (Bahl et al., 1988) and machine translation (Och, 2003). The ERM estimation method has the following advantages:

  • Maximum likelihood does not guarantee better accuracy and may overfit the training distribution; ERM can help reduce overfitting to the training data.
  • Summing and averaging local conditional losses can be more resilient to errors than taking the product of conditional likelihoods.
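The idea above can be sketched concretely. The following is a minimal illustration of ERM for a linear classifier, minimizing the average logistic loss over a training sample by gradient descent; the toy dataset, loss choice, and hyperparameters are illustrative assumptions, not from this article or from Bahl et al. (1988).

```python
import numpy as np

def empirical_risk(w, X, y):
    # Empirical risk = average loss over the training sample:
    # R_emp(w) = (1/n) * sum_i log(1 + exp(-y_i * w . x_i))
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def erm_fit(X, y, lr=0.5, steps=500):
    # Minimize the empirical risk by plain gradient descent.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # Gradient of the averaged logistic loss w.r.t. w.
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
        w -= lr * grad
    return w

# Toy linearly separable data with labels in {-1, +1} (hypothetical).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = erm_fit(X, y)
risk = empirical_risk(w, X, y)
preds = np.sign(X @ w)
```

Here the estimator is chosen purely by how small its average training loss is, rather than by how likely it makes the training data, which is the contrast with maximum likelihood drawn above.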

Motivation

Problem Formulation

Empirical Risk Minimization

Some Reflections

Related Papers
