Gunawardana et al, ICSCT 2005: Hidden Conditional Random Fields for Phone Classification

== Citation ==

A. Gunawardana, M. Mahajan, A. Acero, J. C. Platt. '''Hidden conditional random fields for phone classification''', ''International Conference on Speech Communication and Technology'', pp. 1117-1120, September 2005.

== Online Version ==

PDF version

== Summary ==

This [[Category::paper]] addresses the problem of [[AddressesProblem::phone classification]]: given a sequence of acoustic features (observation vectors), predict the most probable phone. Each phone is modeled as a sequence of states, and the states emit observation vectors according to a Gaussian mixture model (GMM). The problem therefore involves two latent variables at each time frame: the state and the mixture component.

This problem is usually solved with [[UsesMethod::HMM|Hidden Markov Models]] (HMM). Traditionally, HMMs are trained for maximum likelihood (ML); more recently, discriminative training with objective functions such as maximum mutual information (MMI) and minimum phone error (MPE) has been successful. But discriminative training of generative models like HMMs requires specialized algorithms such as [[UsesMethod::Extended Baum-Welch]] (EBW). The authors propose a [[UsesMethod::Hidden Conditional Random Field]] (HCRF) model, which can be trained with general-purpose optimization algorithms such as [[UsesMethod::L-BFGS]] and [[UsesMethod::Stochastic Gradient Descent]] (SGD).
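
What makes general-purpose optimizers applicable is that the conditional log-likelihood <math>\log p(w|\mathbf{o};\lambda)</math> is differentiable in <math>\lambda</math>: its gradient is the feature expectation with the label clamped to the truth minus the free feature expectation. Below is a minimal SGD sketch under that assumption; <code>expected_features</code> is a hypothetical helper (in practice both expectations are computed with forward-backward recursions), not a name from the paper.

<pre>
# Minimal SGD sketch for HCRF training (illustrative; names are hypothetical).
# The gradient of log p(w | o; lambda) is the clamped feature expectation
# E[f | w, o] minus the free feature expectation E[f | o].

def sgd_step(weights, o, w, expected_features, lr=0.01):
    clamped = expected_features(o, weights, label=w)   # E[f | w, o]
    free = expected_features(o, weights, label=None)   # E[f | o]
    for k in weights:
        # Gradient ascent on the conditional log-likelihood.
        weights[k] += lr * (clamped.get(k, 0.0) - free.get(k, 0.0))
    return weights
</pre>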

=== Hidden Conditional Random Fields (HCRF) ===

For simplicity, the HCRF is first formulated for single Gaussian emission distributions and scalar observations. This eliminates the mixture component as a hidden variable and leaves only the state. The HCRF gives the conditional probability of a phone label <math>w</math> given a sequence of observations <math>\mathbf{o} = (o_1, \ldots, o_T)</math>:

<math>p(w|\mathbf{o};\lambda) = \frac{1}{z(\mathbf{o};\lambda)} \sum_{\mathbf{s} \in w} \exp \{\lambda \cdot f(w,\mathbf{s},\mathbf{o}) \}</math>

where <math>\mathbf{s}</math> is the hidden state sequence, <math>f</math> is the feature vector, <math>\lambda</math> is the weight vector, and <math>z(\mathbf{o};\lambda) = \sum_{w, \mathbf{s} \in w} \exp \{\lambda \cdot f(w,\mathbf{s},\mathbf{o}) \}</math> is the partition function.
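
As a concrete illustration (not from the paper), the following brute-force sketch evaluates <math>p(w|\mathbf{o};\lambda)</math> by enumerating all state sequences; it is only feasible for toy problems, and real implementations use forward-backward dynamic programming instead. The helpers <code>states</code> (allowed states per label) and <code>features</code> (computing <math>f(w,\mathbf{s},\mathbf{o})</math> as a dict) are assumed.

<pre>
# Brute-force evaluation of the HCRF posterior p(w | o) for toy problems.
# `states` maps each label to its allowed states; `features(w, s, o)`
# returns the feature vector f(w, s, o) as a dict. Both are assumed helpers.
import itertools
import math

def hcrf_posterior(w, o, labels, states, weights, features):
    T = len(o)

    def unnorm(label):
        # Sum of exp(lambda . f) over all state sequences for this label.
        # (A real model would also restrict to its transition topology.)
        total = 0.0
        for s in itertools.product(states[label], repeat=T):
            f = features(label, s, o)
            total += math.exp(sum(weights[k] * v for k, v in f.items()))
        return total

    z = sum(unnorm(label) for label in labels)  # partition function z(o)
    return unnorm(w) / z
</pre>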

=== Relationship to HMMs ===

The features <math>f</math> used in this paper include language model features (prior probabilities of the phones), transition counts between pairs of states, occupancy counts of individual states, and first- and second-order moments of the observations for each state:

[[File:HCRF_features.png]]
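
For concreteness, these features plausibly take the following form in the single-Gaussian, scalar-observation setting (a reconstruction for reference; the image above is authoritative). Here <math>\delta(\cdot)</math> is the indicator function, <math>s_t</math> is the state at frame <math>t</math>, and <math>s_0</math> is a distinguished start state:

<math>f^{(LM)}_{w'}(w,\mathbf{s},\mathbf{o}) = \delta(w = w')</math>

<math>f^{(Tr)}_{s s'}(w,\mathbf{s},\mathbf{o}) = \sum_{t=1}^{T} \delta(s_{t-1} = s)\,\delta(s_t = s')</math>

<math>f^{(Occ)}_{s}(w,\mathbf{s},\mathbf{o}) = \sum_{t=1}^{T} \delta(s_t = s)</math>

<math>f^{(M1)}_{s}(w,\mathbf{s},\mathbf{o}) = \sum_{t=1}^{T} \delta(s_t = s)\, o_t</math>

<math>f^{(M2)}_{s}(w,\mathbf{s},\mathbf{o}) = \sum_{t=1}^{T} \delta(s_t = s)\, o_t^2</math>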

It can be proved that if the weight vector is set in the following way, the HCRF is equivalent to an ML-trained HMM:

[[File:HCRF_weights.png]]
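
To see why, expand the HMM's joint log-probability <math>\log \left( p(w) \prod_t a_{s_{t-1} s_t} \mathcal{N}(o_t; \mu_{s_t}, \sigma_{s_t}^2) \right)</math> and collect terms by feature. With the features above, the matching weights are (again a reconstruction; the image above is authoritative):

<math>\lambda^{(LM)}_{w'} = \log p(w'), \qquad \lambda^{(Tr)}_{s s'} = \log a_{s s'},</math>

<math>\lambda^{(Occ)}_{s} = -\frac{\mu_s^2}{2\sigma_s^2} - \frac{1}{2}\log 2\pi\sigma_s^2, \qquad \lambda^{(M1)}_{s} = \frac{\mu_s}{\sigma_s^2}, \qquad \lambda^{(M2)}_{s} = -\frac{1}{2\sigma_s^2}.</math>

With these settings, <math>\lambda \cdot f(w,\mathbf{s},\mathbf{o})</math> equals the HMM joint log-probability <math>\log p(w,\mathbf{s},\mathbf{o})</math>, the partition function becomes <math>p(\mathbf{o})</math>, and the HCRF posterior coincides with the HMM posterior.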

But HMMs can represent only a subset of the conditional distributions HCRFs can represent: HMM parameters are locally normalized (transition and emission probabilities must each sum to one), whereas HCRF weights are unconstrained real numbers.
  
 
== Experiments ==
