Bartlett et al., ACL-HLT 2008. Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion



Susan Bartlett, Grzegorz Kondrak and Colin Cherry. 2008. Automatic syllabification with structured SVMs for letter-to-phoneme conversion. In Proceedings of ACL-08: HLT, 2008, pp. 568–576.

Online Version

Automatic syllabification with structured SVMs for letter-to-phoneme conversion.


This paper describes one of the first successful attempts at integrating automatic syllabification into a letter-to-phoneme conversion system, using structured SVMs for the syllabification step. The authors substantially reduce the syllabification error rate (measured as word error rate, WER) relative to the then state-of-the-art approach. They model the problem as orthographic rather than phonological syllabification, treat it as a sequence tagging problem, and define new tagging schemes. The method is applied to German and Dutch in addition to English.


Structured SVMs

Structured SVMs (Tsochantaridis et al., 2004) are a large-margin training method for predicting structured outputs such as tag sequences. The method described in this paper learns to score tag sequences from training data and predicts by finding the highest-scoring tag sequence with the Viterbi algorithm. Decoding in a structured SVM therefore resembles decoding in an HMM.
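As a rough illustration of the decoding step, the sketch below runs Viterbi over a first-order tag sequence. The decomposition of the score into per-position scores and tag-to-tag transition scores, the function name `viterbi`, and the toy numbers are assumptions made for this example; they are not the paper's feature set.

```python
from typing import Dict, List, Tuple

def viterbi(
    emission: List[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emission[t][tag] is the score of `tag` at position t and
    transition[(a, b)] is the score of tag a followed by tag b.
    Missing entries are treated as 0.
    """
    if not emission:
        return [], 0.0
    # best[t][tag]: best score of any tag sequence ending in `tag` at position t
    best = [{tag: emission[0].get(tag, 0.0) for tag in tags}]
    back = []  # back[t - 1][tag]: best predecessor of `tag` at position t
    for t in range(1, len(emission)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transition.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transition.get((prev, cur), 0.0)
                + emission[t].get(cur, 0.0)
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy scores for a three-letter input (made-up numbers).
toy_emission = [
    {"N": 1.0, "B": 0.0},
    {"N": 0.0, "B": 2.0},
    {"N": 1.0, "B": 0.5},
]
toy_transition = {("B", "B"): -5.0}  # discourage two boundary tags in a row
toy_path, toy_score = viterbi(toy_emission, toy_transition, ["N", "B"])
```

The run time is linear in the sequence length and quadratic in the number of tags, which is what makes exact search over an exponential number of candidate sequences feasible.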

Training a structured SVM is viewed as a multi-class classification problem. Each training instance $x_i$ is labeled with a correct tag sequence $y_i$ drawn from a set of possible tag sequences $Y_i$. A feature function $\Psi(x, y)$ maps an input sequence $x$ paired with a candidate output sequence $y$ to a feature space representation.

Training seeks a weight vector $w$ that scores the correct sequence for each input higher than every incorrect sequence, and it tries to maximize the margin between the correct and incorrect sequences. Mathematically, the relationship can be expressed as:

$$\forall i \;\; \forall y \in Y_i,\; y \neq y_i : \quad \Psi(x_i, y_i) \cdot w \; > \; \Psi(x_i, y) \cdot w$$

The model then predicts the highest-scoring sequence for a new input $x$ with candidate set $Y$ as follows:

$$\hat{y} = \arg\max_{y \in Y} \; \Psi(x, y) \cdot w$$

The search for the highest-scoring sequence is performed with the Viterbi algorithm. The number of incorrect sequences in the set of possible tag sequences $Y_i$ is potentially exponential in the length of the input $x_i$. Structured SVMs address this problem by restricting attention to a small, growing subset of incorrect sequences, built up by an iterative online learning approach. This approach is outlined as follows:

1. Find the most damaging incorrect sequence $y'$ (the incorrect sequence that competes most strongly with the correct one) according to the current weight vector $w$.

2. Add $y'$ to a growing set $\bar{Y}_i$ of incorrect sequences for the current training instance.

3. Update the weight vector $w$ according to the objective function, using the partial sets $\bar{Y}_i$ in place of the full sets $Y_i$.

4. Go to the next training instance and repeat from step 1 until convergence.
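The four steps above can be sketched as a toy training loop. This is not the structured SVM optimizer: the real update solves a margin-based objective over the working sets, whereas this sketch substitutes a simple perceptron-style additive update, and it finds the most damaging sequence by brute-force enumeration rather than Viterbi. The feature map `psi`, all function names, the two training words, and the convention that `B` marks a letter followed by a syllable boundary are hypothetical.

```python
from collections import Counter
from itertools import product

TAGS = ("N", "B")  # assumed: B = letter followed by a boundary, N = otherwise

def psi(letters, tags):
    """Hypothetical feature map: letter/tag pairs plus tag bigrams."""
    feats = Counter(("emit", l, t) for l, t in zip(letters, tags))
    feats.update(("trans", a, b) for a, b in zip(tags, tags[1:]))
    return feats

def score(w, feats):
    """Dot product between the weight dict and a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def most_damaging(w, letters, gold):
    """Step 1: highest-scoring incorrect sequence under the current weights."""
    wrong = (y for y in product(TAGS, repeat=len(letters)) if y != gold)
    return max(wrong, key=lambda y: score(w, psi(letters, y)))

def train(data, max_epochs=1000):
    w = {}
    working = [set() for _ in data]  # one growing set of incorrect sequences per instance
    for _ in range(max_epochs):
        changed = False
        for i, (letters, gold) in enumerate(data):
            working[i].add(most_damaging(w, letters, gold))  # steps 1 and 2
            gold_feats = psi(letters, gold)
            for y in sorted(working[i]):  # step 3, with a stand-in update rule
                wrong_feats = psi(letters, y)
                if score(w, gold_feats) <= score(w, wrong_feats):
                    for f, v in gold_feats.items():
                        w[f] = w.get(f, 0.0) + v
                    for f, v in wrong_feats.items():
                        w[f] = w.get(f, 0.0) - v
                    changed = True
        if not changed:  # step 4: stop once a full pass makes no update
            break
    return w

def predict(w, letters):
    """Brute-force argmax over all tag sequences (fine only for toy inputs)."""
    candidates = product(TAGS, repeat=len(letters))
    return max(candidates, key=lambda y: score(w, psi(letters, y)))

# Two made-up training words: "in-to" and "on-ly".
toy_data = [
    ("into", ("N", "B", "N", "N")),
    ("only", ("N", "B", "N", "N")),
]
toy_weights = train(toy_data)
```

The point of the working sets is that the update in step 3 only ever looks at the handful of incorrect sequences collected so far, never at the full exponential set.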

Tagging Scheme

Positional Tags

The authors modify the NB tag scheme for syllabification so that tags also record their sequential position, which gives the model information about syllable length. They call this scheme the Numbered NB tag scheme.
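To make the idea concrete, the sketch below converts a hyphen-delimited syllabification into plain NB tags and into a numbered variant. The conventions are assumptions for illustration: `B` marks a letter immediately followed by a syllable boundary, `N` marks any other letter, and in the numbered variant each `N` carries its position within the current syllable. The paper's exact tag inventory may differ.

```python
from typing import List

def nb_tags(syllabified: str, numbered: bool = False) -> List[str]:
    """Tag each letter of a hyphen-delimited word such as "im-mor-al-ly".

    Assumed convention: "B" = a syllable boundary follows this letter,
    "N" = no boundary follows. With numbered=True, N tags become N1, N2, ...
    counting letters from the start of the current syllable.
    """
    tags: List[str] = []
    syllables = syllabified.split("-")
    for s_idx, syllable in enumerate(syllables):
        is_last_syllable = s_idx == len(syllables) - 1
        for pos in range(1, len(syllable) + 1):
            if pos == len(syllable) and not is_last_syllable:
                tags.append("B")
            else:
                tags.append(f"N{pos}" if numbered else "N")
    return tags

plain = nb_tags("im-mor-al-ly")
numbered = nb_tags("im-mor-al-ly", numbered=True)
# plain    -> N  B N  N  B N  B N  N
# numbered -> N1 B N1 N2 B N1 B N1 N2
```

Under this convention a numbered tag such as `N2` tells the tagger that the current syllable is already two letters long, which a plain `N` cannot express.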

Structural Tags

The authors also introduce a modification of the ONC tag scheme, which tags the vowels and consonants of a syllable as onset, nucleus, or coda. The modified scheme also incorporates the numbered position of each tag.

The final tag set combines positional and structural tags, so that it encodes explicit syllable boundaries as well as the structure of syllables. This scheme is labeled the Break ONC tag scheme.
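A minimal sketch of how boundary and structure information might be combined in one tag per letter is given below. Everything specific here is an assumption for illustration: the vowel-letter heuristic for locating the nucleus, the `*` suffix that marks a following syllable boundary, and the function name. Position numbering is left out for brevity, and the paper's actual tag inventory may differ.

```python
from typing import List

VOWEL_LETTERS = set("aeiouy")  # crude orthographic heuristic (assumption)

def break_onc_tags(syllabified: str) -> List[str]:
    """Tag letters as O (onset), N (nucleus) or C (coda); "*" = boundary follows.

    Within each syllable, letters before the first vowel letter are onset,
    the first unbroken run of vowel letters is the nucleus, and everything
    after that run is coda (including any later vowel letters).
    """
    tags: List[str] = []
    syllables = syllabified.lower().split("-")
    for s_idx, syllable in enumerate(syllables):
        roles: List[str] = []
        seen_vowel = False
        in_nucleus = False
        for ch in syllable:
            if ch in VOWEL_LETTERS and (not seen_vowel or in_nucleus):
                roles.append("N")
                seen_vowel = True
                in_nucleus = True
            elif not seen_vowel:
                roles.append("O")
            else:
                roles.append("C")
                in_nucleus = False
        if roles and s_idx < len(syllables) - 1:
            roles[-1] += "*"  # last letter of a non-final syllable
        tags.extend(roles)
    return tags

onc_example = break_onc_tags("im-mor-al-ly")
# -> N C* O N C* N C* O N
```

Compared with plain boundary tags, each label here says both where a syllable ends and what role the letter plays inside it.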

Experiments and Results


  • The authors report results on the CELEX and NETtalk corpora. CELEX employs a better syllabification strategy.
  • NETtalk contains 20k words. It is divided into a training set of 13k words and a test set of 7k words.
  • CELEX is divided into 14k words for training, 25k words for testing, and 5k words for development.


The paper reports results on syllabification and grapheme-to-phoneme accuracies.


Syllabification performance is summarized in Table 1 below. Word accuracy for the grapheme-to-phoneme task, with and without syllabification information, is also given.

[Image: Syllabification svm.jpg] [Image: G2p svm.jpg]
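Word-level scores of this kind are all-or-nothing: a word counts as correct only if its entire output matches the reference, and word error rate is one minus word accuracy. A minimal sketch of the metric follows; the function name and the example strings are made up for illustration and are not the paper's data.

```python
from typing import Sequence

def word_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of words whose predicted output exactly matches the reference."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Made-up syllabifications; the third prediction has one misplaced boundary.
gold_words = ["im-mor-al-ly", "in-to", "on-ly", "ta-ble"]
pred_words = ["im-mor-al-ly", "in-to", "o-nly", "ta-ble"]
accuracy = word_accuracy(pred_words, gold_words)  # 3 of 4 words match
error_rate = 1.0 - accuracy
```

The same function applies to the grapheme-to-phoneme task if the strings are phoneme transcriptions instead of syllabified spellings.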

Related Papers

[1] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. 2004. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the 21st International Conference on Machine Learning (ICML), pages 104–111.

[2] Yasemin Altun, Ioannis Tsochantaridis, and Thomas Hofmann. 2003. Hidden Markov support vector machines. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 3–10.