Lehnen et al., ICASSP 2011. Incorporating Alignments into Conditional Random Fields for Grapheme to Phoneme Conversion
Revision as of 22:40, 24 September 2011
Citation
Patrick Lehnen, Stefan Hahn, Andreas Guta and Hermann Ney. 2011. Incorporating Alignments into Conditional Random Fields for Grapheme to Phoneme Conversion. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-2011.
Online Version
Incorporating Alignments into Conditional Random Fields for Grapheme to Phoneme Conversion
Summary
The authors present a novel approach to grapheme-to-phoneme (g2p) conversion with conditional random fields (CRFs). They argue that grapheme-phoneme alignments are crucial for g2p conversion, yet they are usually produced by external models. The authors therefore show how alignment generation can be integrated efficiently into CRF training by treating the alignment as a hidden variable of the model. Two training variants are proposed: one that starts from a linear segmentation and iteratively re-estimates a single alignment, and one that incorporates all possible alignments satisfying given constraints.
Method
A linear-chain conditional random field models the probability of a tag sequence t_1^N given a source (grapheme) sequence s_1^N as:
p(t_1^N|s_1^N) = \frac{\exp H(t_1^N, s_1^N)}{\sum_{\tilde{t}_1^N} \exp H(\tilde{t}_1^N, s_1^N)}
where H(t_1^N, s_1^N) = \sum_{n=1}^{N} \sum_{l=1}^{L} \lambda_l h_l(t_{n-1}, t_n, s_1^N), with feature functions h_l and their weights \lambda_l.
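The normalizer in the denominator sums over exponentially many tag sequences, but because each feature only looks at t_{n-1} and t_n, it can be computed with the standard forward algorithm. The following is a minimal sketch, not the authors' implementation: the callable `score(prev, tag, source, n)` is a hypothetical stand-in for the inner sum over λ_l h_l at position n (with `prev = None` standing in for t_0), and `toy_score` is an invented feature set used only for illustration.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical local scoring function standing in for
# sum_l lambda_l * h_l(t_{n-1}, t_n, s_1^N) at position n (prev is None at n = 0).
LocalScore = Callable[[Optional[str], str, Sequence[str], int], float]


def H(tags: Sequence[str], source: Sequence[str], score: LocalScore) -> float:
    """Unnormalized log-score H(t_1^N, s_1^N): local scores summed over positions."""
    total, prev = 0.0, None
    for n, tag in enumerate(tags):
        total += score(prev, tag, source, n)
        prev = tag
    return total


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(source: Sequence[str], tagset: Sequence[str], score: LocalScore) -> float:
    """log of the sum over all tag sequences of exp H, via the forward algorithm."""
    # alpha[t]: log-sum of exp-scores of all prefixes that end in tag t
    alpha = {t: score(None, t, source, 0) for t in tagset}
    for n in range(1, len(source)):
        alpha = {
            t: log_sum_exp([alpha[p] + score(p, t, source, n) for p in tagset])
            for t in tagset
        }
    return log_sum_exp(list(alpha.values()))


def crf_prob(tags, source, tagset, score: LocalScore) -> float:
    """p(t_1^N | s_1^N) = exp H(t, s) / sum over t~ of exp H(t~, s)."""
    return math.exp(H(tags, source, score) - log_partition(source, tagset, score))


def toy_score(prev, tag, source, n):
    """Hypothetical features: grapheme/tag match bonus plus a repeated-tag bonus."""
    s = 1.0 if tag.lower() == source[n] else 0.0
    if prev == tag:
        s += 0.5
    return s


print(crf_prob(["A", "B", "B"], ["a", "b", "b"], ["A", "B"], toy_score))
```

The forward recursion costs O(N·|tagset|²) instead of enumerating all |tagset|^N sequences, which is what makes CRF training over long words feasible.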
Alignments
The authors incorporate the alignment into the CRF by modeling it as a hidden variable a_1^M and summing over it:
p(T_1^M|s_1^N) = \sum_{a_1^M} p(T_1^M, a_1^M|s_1^N)
Here T_1^M is the phoneme sequence. The tuple (T_1^M, a_1^M) is projected onto a tag sequence t_1^N over the source positions using the BIO labeling scheme, which allows monotonic one-to-one and many-to-one alignments, i.e. several consecutive graphemes may map to a single phoneme.
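The projection and the sum over alignments can be illustrated with a small brute-force sketch. It assumes a hypothetical encoding of a_1^M as the 0-based index of the first grapheme aligned to each phoneme, tags written as `B-<phoneme>` and `I-<phoneme>`, and a placeholder `uniform_tag_prob` in place of a trained CRF. Because it enumerates every alignment explicitly, it is exponential in general and is not the authors' efficient training procedure.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Sequence


def project_bio(phonemes: Sequence[str], starts: Sequence[int], n_graphemes: int) -> List[str]:
    """Map (T_1^M, a_1^M) to a BIO tag sequence t_1^N over the grapheme positions.

    starts[m] is the 0-based index of the first grapheme aligned to phoneme m
    (a hypothetical encoding of a_1^M). The first grapheme of each segment gets
    'B-<phoneme>'; the remaining graphemes of the segment get 'I-<phoneme>'.
    """
    if len(phonemes) != len(starts):
        raise ValueError("need one segment start per phoneme")
    if starts[0] != 0 or starts[-1] >= n_graphemes or any(a >= b for a, b in zip(starts, starts[1:])):
        raise ValueError("alignment must be monotonic and cover all graphemes")
    ends = list(starts[1:]) + [n_graphemes]
    tags: List[str] = []
    for phon, begin, end in zip(phonemes, starts, ends):
        tags.append("B-" + phon)
        tags.extend(["I-" + phon] * (end - begin - 1))
    return tags


def monotonic_alignments(n_graphemes: int, n_phonemes: int) -> Iterator[List[int]]:
    """All one-to-one / many-to-one monotonic alignments (each phoneme gets >= 1 grapheme)."""
    for rest in combinations(range(1, n_graphemes), n_phonemes - 1):
        yield [0, *rest]


def marginal_prob(phonemes, source, tag_prob: Callable[[List[str], Sequence[str]], float]) -> float:
    """p(T_1^M | s_1^N) as the sum over a_1^M of p(t_1^N(T_1^M, a_1^M) | s_1^N)."""
    return sum(
        tag_prob(project_bio(phonemes, starts, len(source)), source)
        for starts in monotonic_alignments(len(source), len(phonemes))
    )


def uniform_tag_prob(tags, source):
    """Hypothetical placeholder for a trained CRF's p(t | s)."""
    return 0.1


print(project_bio(["S", "i"], [0, 2], 3))  # ['B-S', 'I-S', 'B-i']
print(marginal_prob(["S", "i"], ["s", "h", "e"], uniform_tag_prob))
```

For "she" with phonemes /S i/, the two candidate alignments are s|he and sh|e; the marginal simply adds the model probability of both projected tag sequences.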
Training
The CRF with the alignment as a hidden variable can be trained in two ways:
- Maximization approach
- Summation approach
Maximization Approach
This approach starts from a linear segmentation and trains the CRF with an Expectation-Maximization-like algorithm. In the maximization step, the alignment a_1^M is held fixed and the CRF is trained on the tag sequence obtained by projecting (T_1^M, a_1^M):
p(t_1^N|s_1^N)\big|_{t_1^N = t_1^N(T_1^M, a_1^M)} = \frac{\exp H(t_1^N, s_1^N)}{\sum_{\tilde{t}_1^N} \exp H(\tilde{t}_1^N, s_1^N)}
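The maximization step needs a concrete tag sequence for every training word, and in the first iteration that sequence comes from the linear segmentation. The sketch below assumes that "linear" means splitting the N graphemes as evenly as possible among the M phonemes (which requires M ≤ N under the many-to-one scheme); the segment-start encoding of a_1^M, the tag naming, and the function names are hypothetical.

```python
from typing import List, Sequence


def linear_segmentation(n_graphemes: int, n_phonemes: int) -> List[int]:
    """Start index of each phoneme's grapheme segment under an even split."""
    if not 1 <= n_phonemes <= n_graphemes:
        raise ValueError("many-to-one BIO scheme needs 1 <= M <= N")
    # Consecutive starts differ by at least 1 because N / M >= 1.
    return [m * n_graphemes // n_phonemes for m in range(n_phonemes)]


def initial_tags(phonemes: Sequence[str], n_graphemes: int) -> List[str]:
    """Fixed training target t_1^N(T_1^M, a_1^M) for the linear initial alignment."""
    starts = linear_segmentation(n_graphemes, len(phonemes))
    ends = starts[1:] + [n_graphemes]
    tags: List[str] = []
    for phon, begin, end in zip(phonemes, starts, ends):
        tags.append("B-" + phon)
        tags.extend(["I-" + phon] * (end - begin - 1))
    return tags


print(initial_tags(["f", "oU", "n"], 5))  # ['B-f', 'B-oU', 'I-oU', 'B-n', 'I-n']
```

For "phone" with /f oU n/, the even split yields p|ho|ne rather than the intuitive ph|o|ne, which is the kind of initial error that later expectation steps can revise.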
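The expectation step then re-estimates the alignment as the one under which the projected tag sequence is most probable given the current model:
\hat{a}_1^M = \arg\max_{a_1^M} \left\{ p(t_1^N(T_1^M, a_1^M)|s_1^N) \right\}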
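The expectation step can be sketched as an exhaustive search over monotonic alignments. Since the CRF denominator sums over all tag sequences, it does not depend on the alignment, so maximizing p(t_1^N(T_1^M, a_1^M)|s_1^N) is equivalent to maximizing H(t_1^N(T_1^M, a_1^M), s_1^N); the sketch therefore takes a scoring callable for H. The scorer `toy_H`, its grapheme/phoneme table, and the segment-start encoding of a_1^M are hypothetical, and the brute-force enumeration is only illustrative (a Viterbi-style dynamic program could avoid it).

```python
from itertools import combinations
from typing import Callable, List, Sequence


def project_bio(phonemes: Sequence[str], starts: Sequence[int], n_graphemes: int) -> List[str]:
    """BIO tags t_1^N(T_1^M, a_1^M); starts[m] = first grapheme of phoneme m (hypothetical encoding)."""
    ends = list(starts[1:]) + [n_graphemes]
    tags: List[str] = []
    for phon, begin, end in zip(phonemes, starts, ends):
        tags.append("B-" + phon)
        tags.extend(["I-" + phon] * (end - begin - 1))
    return tags


def best_alignment(
    phonemes: Sequence[str],
    source: Sequence[str],
    score: Callable[[List[str], Sequence[str]], float],
) -> List[int]:
    """E-step sketch: argmax over monotonic alignments of score(t_1^N(T, a), s).

    With score = H this equals the argmax of p(t | s), because the CRF
    normalizer does not depend on the alignment.
    """
    n, m = len(source), len(phonemes)
    candidates = ([0, *rest] for rest in combinations(range(1, n), m - 1))
    return max(candidates, key=lambda starts: score(project_bio(phonemes, starts, n), source))


# Hypothetical grapheme/phoneme pairs the toy model "likes" (silent e absorbed into n).
TOY_PAIRS = {("p", "f"), ("h", "f"), ("o", "oU"), ("n", "n"), ("e", "n")}


def toy_H(tags: List[str], source: Sequence[str]) -> float:
    """Toy stand-in for H: one point per grapheme paired with a listed phoneme."""
    return float(sum((g, t.split("-", 1)[1]) in TOY_PAIRS for g, t in zip(source, tags)))


print(best_alignment(["f", "oU", "n"], list("phone"), toy_H))  # [0, 2, 3]
```

Under this toy score, ph|o|ne (score 5) beats the linear split p|ho|ne (score 4), so the re-estimated alignment would replace the initial one in the next maximization step.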