Berg-Kirkpatrick et al., NAACL 2010: Painless Unsupervised Learning with Features
T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein. Painless Unsupervised Learning with Features. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL (NAACL-HLT), pp. 582–590, Los Angeles, June 2010.
This paper generalizes conventional HMMs to featurized HMMs by replacing the multinomial conditional probability distributions (CPDs) with miniature log-linear models over hand-designed features. Two algorithms for unsupervised training of such featurized models are proposed: Algorithm 1 is an EM procedure whose M-step optimizes the expected complete-data log-likelihood with a gradient-based method (L-BFGS) instead of count normalization; Algorithm 2 directly optimizes the log marginal likelihood of the observed data by gradient ascent.
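Concretely, each CPD becomes a softmax over a weight vector and a feature function. Below is a minimal NumPy sketch of a featurized emission distribution p(x | z) ∝ exp(w · f(x, z)); the feature tensor, its dimensions, and the random initialization are invented for illustration and are not the paper's code:

```python
import numpy as np

def emission_probs(w, features):
    """Featurized emission CPD: p(x | z) ∝ exp(w · f(x, z)).

    features: array of shape (Z, X, D) holding the feature vector f(x, z)
    for each hidden state z and word type x; w: weight vector of shape (D,).
    Returns a (Z, X) matrix whose rows each sum to 1.
    """
    scores = np.einsum('zxd,d->zx', features, w)   # score(z, x) = w · f(x, z)
    scores -= scores.max(axis=1, keepdims=True)    # subtract max for stability
    expscores = np.exp(scores)
    return expscores / expscores.sum(axis=1, keepdims=True)

# Toy example: 2 hidden states, vocabulary of 3 word types, 4 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(2, 3, 4))   # made-up feature values
w = np.zeros(4)                         # w = 0 yields uniform emission rows
print(emission_probs(w, features))
```

With only one indicator feature per (state, word) pair, this parameterization is exactly as expressive as the standard multinomial CPD; richer, shared features (e.g., word suffixes or capitalization) are what give the featurized models their advantage.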
Featurized models are applied to four unsupervised learning tasks:
- POS induction (unsupervised version of POS tagging);
- Grammar induction;
- Word alignment;
- Word segmentation.
On all four tasks, the featurized models are shown to outperform their unfeaturized counterparts by a substantial margin.
POS induction is the unsupervised version of POS tagging: the output is a set of clusters of words that the system believes to belong to the same part-of-speech. To evaluate a POS inductor, the induced clusters must be mapped to the actual POS tags. The accuracy-maximizing mapping assigns each cluster to the gold tag it co-occurs with most often, and the resulting tagging accuracy is called the "many-to-1 accuracy" (sketched below).
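A minimal sketch of this evaluation in Python (the cluster and tag sequences are made up for illustration):

```python
from collections import Counter

def many_to_one_accuracy(clusters, gold_tags):
    """Map each induced cluster to the gold tag it co-occurs with most,
    then score the induced tagging against the gold tags."""
    by_cluster = {}
    for c, t in zip(clusters, gold_tags):
        by_cluster.setdefault(c, Counter())[t] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(clusters)

# Cluster 0 maps to 'DT', cluster 1 to 'NN'; the stray 'VB' token is an error.
print(many_to_one_accuracy([0, 0, 1, 1, 1], ['DT', 'DT', 'NN', 'NN', 'VB']))  # 0.8
```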
| | POS induction |
|---|---|
| Dataset | Penn Treebank English WSJ |
| Criterion | Many-to-1 accuracy (higher is better) |
| Baseline | 63.1 ± 1.3 (HMM; mean ± standard deviation over 10 runs) |
| Proposed systems | 68.1 ± 1.7 (Featurized HMM, Algorithm 1)<br>75.5 ± 1.1 (Featurized HMM, Algorithm 2) |
| Contrastive system | 59.6 ± 6.9 (Featurized MRF) [Haghighi and Klein, NAACL 2006] |
| | Grammar induction |
|---|---|
| Dataset | English: Penn Treebank WSJ; Chinese: Penn Chinese Treebank |
| Criterion | Directed attachment accuracy, i.e. the fraction of words assigned their correct head (higher is better) |
| Baseline | English 47.8, Chinese 42.5 (DMV) |
| Proposed systems | English 48.3, Chinese 49.9 (Featurized DMV, Algorithm 1)<br>English 63.0, Chinese 53.6 (Featurized DMV, Algorithm 2) |
| Contrastive system | English 61.3, Chinese 51.9 [Cohen and Smith, NAACL 2009] |
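For reference, directed attachment accuracy is straightforward to compute from predicted and gold head indices; a minimal sketch (the head arrays are hypothetical):

```python
def attachment_accuracy(pred_heads, gold_heads):
    """Directed attachment accuracy: the fraction of words whose predicted
    head index matches the gold head index (0 denotes the root)."""
    assert len(pred_heads) == len(gold_heads)
    correct = sum(p == g for p, g in zip(pred_heads, gold_heads))
    return correct / len(gold_heads)

# "the dog barks" (1-indexed heads, 0 = root): the->dog, dog->barks, barks->root.
gold = [2, 3, 0]
pred = [2, 0, 0]   # the parser attaches "dog" to the root instead of "barks"
print(attachment_accuracy(pred, gold))  # 0.666...
```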
| | Word alignment |
|---|---|
| Dataset | NIST 2002 Chinese-English development set |
| Criterion | Alignment error rate (AER; lower is better) |
| Baselines | 38.0 (Model 1) [Brown et al., CL 1993]<br>33.8 (HMM) [Vogel et al., COLING 1996] |
| Proposed systems | 35.6 (Featurized Model 1, Algorithm 1)<br>30.0 (Featurized HMM, Algorithm 1) |
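For reference, alignment error rate compares a predicted link set A against gold sure links S and possible links P (with S ⊆ P): AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|). A minimal sketch with made-up link sets:

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where links are
    (source_index, target_index) pairs and sure ⊆ possible."""
    a, s, p = set(predicted), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Toy example: two sure links, one extra possible link.
sure = {(0, 0), (1, 2)}
possible = sure | {(2, 1)}
print(alignment_error_rate({(0, 0), (2, 1)}, sure, possible))  # 0.25
```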
| | Word segmentation |
|---|---|
| Criterion | Segment F1 score (higher is better) |
| Baseline | 76.9 ± 0.1 (Unigram; mean ± standard deviation over 10 runs) |
| Proposed systems | 84.5 ± 0.5 (Featurized Unigram, Algorithm 1)<br>88.0 ± 0.1 (Featurized Unigram, Algorithm 2) |
| Contrastive system | 87 [Johnson and Goldwater, NAACL 2009] |
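For reference, segment F1 scores predicted word tokens against gold tokens, where a token is identified by its character span. A per-utterance sketch with made-up segmentations (a corpus-level score would aggregate true positives and span counts across utterances):

```python
def segment_f1(pred_words, gold_words):
    """F1 over word tokens, each identified by its (start, end) character
    span; assumes both segmentations concatenate to the same string."""
    def spans(words):
        out, pos = set(), 0
        for w in words:
            out.add((pos, pos + len(w)))
            pos += len(w)
        return out
    pred, gold = spans(pred_words), spans(gold_words)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# "thedogs" segmented two ways: only the span of "the" matches.
print(segment_f1(["the", "dog", "s"], ["the", "dogs"]))  # 0.4
```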