Berg-Kirkpatrick et al, ACL 2010: Painless Unsupervised Learning with Features

Citation

T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero and D. Klein. Painless Unsupervised Learning with Features, Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pp. 582-590, Los Angeles, June 2010.

Online Version

PDF version

Summary

This paper generalizes conventional HMMs to featurized HMMs by replacing the multinomial conditional probability distributions (CPDs) with miniature log-linear models. It proposes two algorithms for the unsupervised training of featurized HMMs.

Featurized HMMs are applied to four unsupervised learning tasks:

POS induction (unsupervised version of POS tagging);
Grammar induction;
Word alignment;
Word segmentation.

On all four tasks, the featurized models are shown to outperform their unfeaturized counterparts by a substantial margin.

Method

The paper introduces featurized HMMs and two algorithms for their unsupervised training. For details, see the page Featurized HMMs.
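
To make the idea of a "miniature log-linear model" concrete, the following minimal sketch computes an emission distribution P(word | tag) as a softmax over summed feature weights. The feature templates, vocabulary, tag names, and weights are all hypothetical and chosen only for illustration; they are not the paper's feature set, and the training algorithms are not shown.

```python
import math
from collections import defaultdict


def features(word, tag):
    """Hypothetical feature templates (assumed for illustration only)."""
    # With only this indicator template, the model reduces to an ordinary multinomial.
    feats = [f"word={word}|tag={tag}"]
    if word[0].isupper():
        feats.append(f"capitalized|tag={tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"has_digit|tag={tag}")
    feats.append(f"suffix={word[-2:]}|tag={tag}")
    return feats


def emission_distribution(tag, vocab, weights):
    """Return P(word | tag): a softmax over the vocabulary of summed feature weights."""
    scores = {w: sum(weights[f] for f in features(w, tag)) for w in vocab}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {w: math.exp(s - max_score) for w, s in scores.items()}
    z = sum(exp_scores.values())
    return {w: e / z for w, e in exp_scores.items()}


# Toy vocabulary and weights (assumptions, not learned values).
vocab = ["London", "running", "walking", "42"]
weights = defaultdict(float)
weights["suffix=ng|tag=VERB"] = 2.0
weights["capitalized|tag=NOUN"] = 1.5

verb_dist = emission_distribution("VERB", vocab, weights)
noun_dist = emission_distribution("NOUN", vocab, weights)
```

Because "running" and "walking" share the suffix feature, they receive the same probability under the VERB tag even though neither has a word-specific weight, which is the kind of parameter sharing a plain multinomial cannot express.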

The paper also presents featurized versions of other HMM-like models, e.g. the Dependency Model with Valence (DMV) for grammar induction.

Experiments

POS Induction

POS induction is the unsupervised version of POS tagging. The output is a set of word clusters, each of which the system believes to correspond to a single part of speech. To evaluate a POS inductor, the clusters must be mapped to the actual POS tags. The best accuracy achievable over all such mappings, in which several clusters may be mapped to the same tag, is called the "many-1 accuracy".
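
The many-1 accuracy described above can be sketched as follows. The best many-to-one mapping is obtained by assigning each cluster its most frequent gold tag, so no search over mappings is needed. The function name and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(predicted_clusters, gold_tags):
    """Map each cluster to its most frequent gold tag, then score token accuracy."""
    if len(predicted_clusters) != len(gold_tags) or not gold_tags:
        raise ValueError("sequences must be non-empty and of equal length")
    counts = defaultdict(Counter)
    for cluster, tag in zip(predicted_clusters, gold_tags):
        counts[cluster][tag] += 1
    # Tokens counted as correct: those carrying the majority tag of their cluster.
    correct = sum(tag_counts.most_common(1)[0][1] for tag_counts in counts.values())
    return correct / len(gold_tags)


# Toy example: clusters 1 and 2 may both map to "NN" (hence "many-to-one").
clusters = [0, 0, 1, 1, 2, 2]
gold = ["DT", "DT", "NN", "VB", "NN", "NN"]
accuracy = many_to_one_accuracy(clusters, gold)  # 5 of 6 tokens are correct
```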

Dataset	Penn Treebank English WSJ
Criterion	Many-1 accuracy (the larger the better)
Baseline	63.1 ± 1.3 (HMM) (10 runs, mean ± standard deviation)
Performance of proposed systems	68.1 ± 1.7 (Featurized HMM, Algorithm 1); 75.5 ± 1.1 (Featurized HMM, Algorithm 2)
Performance of contrastive systems	59.6 ± 6.9 (Featurized MRF) [Haghighi and Klein, ACL 2006]

Grammar Induction

Dataset	English: Penn Treebank English WSJ; Chinese: Penn Treebank Chinese
Criterion	Accuracy (the larger the better; it is not clear how it is defined)
Baseline	English 47.8, Chinese 42.5 (DMV)
Performance of proposed systems	English 48.3, Chinese 49.9 (Featurized DMV, Algorithm 1); English 63.0, Chinese 53.6 (Featurized DMV, Algorithm 2)
Performance of contrastive systems	English 61.3, Chinese 51.9 [Cohen and Smith, ACL 2009]

Word Alignment

Dataset	NIST 2002 Chinese-English Development Set
Criterion	Alignment Error Rate (the smaller the better)
Baseline	38.0 (Model 1) [Brown et al, CL 1994]; 33.8 (HMM) [Ney and Vogel, CL 1996]
Performance of proposed systems	35.6 (Featurized Model 1, Algorithm 1); 30.0 (Featurized HMM, Algorithm 1)
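
The page does not define Alignment Error Rate. The sketch below assumes the commonly used definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the sure gold links, and P the possible gold links (with S contained in P). The function name and the toy links are illustrative assumptions.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER under the assumed standard definition; links are (source_idx, target_idx) pairs."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # every sure link also counts as possible
    if not predicted and not sure:
        raise ValueError("AER is undefined when there are no predicted or sure links")
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


# Toy example: two predicted links are sure links, one matches nothing.
predicted_links = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}
aer = alignment_error_rate(predicted_links, sure_links, possible_links)
# matched = 2 + 2 = 4, denominator = 3 + 2 = 5, so AER = 1 - 4/5
```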

Word Segmentation

Dataset	Bernstein-Ratner Corpus
Criterion	Segment F1 score (the larger the better)
Baseline	76.9 ± 0.1 (Unigram) (10 runs, mean ± standard deviation)
Performance of proposed systems	84.5 ± 0.5 (Featurized Unigram, Algorithm 1); 88.0 ± 0.1 (Featurized Unigram, Algorithm 2)
Performance of contrastive systems	87 [Johnson and Goldwater, ACL 2009]
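
The page does not spell out how the segment F1 score is computed. The sketch below assumes it is the harmonic mean of precision and recall over word tokens, where a predicted word counts as correct only if both of its boundaries match the gold segmentation. The helper names are hypothetical, and the example uses ordinary spelling for readability rather than the corpus's own transcription.

```python
def word_spans(words):
    """Convert a segmentation into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segment_f1(predicted_words, gold_words):
    """Token-level F1 between two segmentations of the same character string."""
    if "".join(predicted_words) != "".join(gold_words):
        raise ValueError("segmentations must cover the same underlying string")
    predicted, gold = word_spans(predicted_words), word_spans(gold_words)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction misses the boundary between "the" and "book".
gold_segmentation = ["you", "want", "to", "see", "the", "book"]
predicted_segmentation = ["you", "want", "to", "see", "thebook"]
f1 = segment_f1(predicted_segmentation, gold_segmentation)
# 4 correct words, precision 4/5, recall 4/6
```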
