Haghighi and Klein, ACL 2006: Prototype-Driven Learning for Sequence Models

Citation

A. Haghighi and D. Klein. Prototype-Driven Learning for Sequence Models, Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pp. 320-327, New York, June 2006.

Online Version

PDF version

Summary

This paper addresses the problem of POS tagging in both English and Chinese, and the problem of field segmentation in the domain of classified advertisements. The latter is also addressed in Grenager et al, ACL 2005.

Model: Markov Random Field

The modeling tool used in this paper is a Markov random field (MRF). An MRF is the generative counterpart of a conditional random field (CRF): an MRF defines a joint distribution over the states and observations, whereas a CRF defines a conditional distribution over the states given the observations.

For POS tagging, the states are chosen as pairs of POS tags. For field segmentation, the states are field labels, as in Grenager et al, ACL 2005.

A difficulty that arises in the training of the MRF is that the sequence length is unconstrained. The authors set a maximum length and sum over all sequences within this length.

For decoding, the authors use maximum posterior decoding (at each position choosing the label which has the highest posterior probability, obtained from the forward-backward algorithm) instead of Viterbi decoding. The former is found to be "uniformly but slightly superior" to the latter.
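As an illustration of posterior decoding on a simple first-order chain, the following minimal sketch runs forward-backward and picks the best label at each position. It is not the authors' implementation: the function name and the `init`, `trans`, and `emit` score tables are hypothetical, and the scores are plain non-negative numbers rather than the paper's feature-based potentials.

```python
def posterior_decode(init, trans, emit):
    """Pick, at each position, the label with the highest posterior marginal.

    init[s]     : non-negative score for starting in label s
    trans[s][t] : non-negative score for moving from label s to label t
    emit[i][s]  : non-negative score for label s at position i
    All three tables are hypothetical inputs and need not be normalized.
    """
    n, k = len(emit), len(init)
    # Forward pass: alpha[i][s] sums the scores of all label prefixes ending in s at i.
    alpha = [[init[s] * emit[0][s] for s in range(k)]]
    for i in range(1, n):
        alpha.append([
            emit[i][t] * sum(alpha[i - 1][s] * trans[s][t] for s in range(k))
            for t in range(k)
        ])
    # Backward pass: beta[i][s] sums the scores of all label suffixes that follow s at i.
    beta = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [
            sum(trans[s][t] * emit[i + 1][t] * beta[i + 1][t] for t in range(k))
            for s in range(k)
        ]
    # alpha * beta is proportional to the posterior marginal, so the argmax
    # does not need the normalizing constant.
    return [max(range(k), key=lambda s: alpha[i][s] * beta[i][s]) for i in range(n)]
```

Viterbi decoding would instead replace the sums in the forward pass with maxima and return the single best label sequence; the two can disagree because posterior decoding optimizes each position separately.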

Prototype-Driven Learning

As pointed out in Grenager et al, ACL 2005, purely unconstrained unsupervised learning performs poorly because training documents contain multiple levels of structure. As a solution, this paper advocates prototype-driven learning. This can be considered a form of semi-supervised learning, but whereas conventional semi-supervised learning fully labels a portion of the training documents, prototype-driven learning only provides a list of "prototype words" for each label. This requires less human effort than conventional semi-supervised learning.

Two ways to utilize the prototype words are discussed. The first is relatively simple: each prototype word is assigned its label wherever it occurs in the training data. While this increases the overall accuracy significantly, it improves the accuracy on non-prototype words only modestly. This indicates that "the prototype information is not spreading to non-prototype words."
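The first approach can be pictured as restricting which labels each training token may take. The sketch below is only an illustration under assumed names: `allowed_labels` and the `prototypes` dictionary (word to label) are hypothetical, and the paper's actual training procedure is not reproduced here.

```python
def allowed_labels(tokens, prototypes, all_labels):
    """For each token, list the labels the model may assign during training.

    prototypes is a hypothetical dict mapping each prototype word to its label.
    Prototype words are pinned to that label; every other word stays unconstrained.
    """
    return [
        [prototypes[tok]] if tok in prototypes else list(all_labels)
        for tok in tokens
    ]
```

Non-prototype words keep the full label set here, which matches the observation that fixing prototype labels alone gives them little direct help.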

In order to let non-prototype words benefit from similar prototype words, a "distributional similarity feature" is devised based on word context. Words whose context distribution is similar to that of a prototype word z activate a feature "PROTO = z", so they are "pushed toward" the label of the prototype word. This significantly boosts the accuracy both overall and for non-prototype words. The similarity feature is designed differently for each task to capture the desired level of structure: low-level for POS tagging vs high-level for field segmentation.
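A rough sketch of how such a feature could be computed is given below. It is a simplification under stated assumptions, not the paper's procedure: contexts are just the immediately adjacent words, similarity is plain cosine over raw counts, and the function names and the `threshold` value are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Count the words seen immediately left and right of each word type."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]][("L", padded[i - 1])] += 1
            vectors[padded[i]][("R", padded[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def proto_features(word, prototypes, vectors, threshold=0.5):
    """Return 'PROTO = z' for each prototype z whose contexts resemble those of word."""
    return [
        f"PROTO = {z}"
        for z in prototypes
        if cosine(vectors[word], vectors[z]) >= threshold
    ]
```

For example, in a toy corpus where "dog" and the prototype "cat" both appear between "the" and "sleeps", "dog" would activate "PROTO = cat" and so be pushed toward the label of "cat".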

Experiments

Dataset

For English POS tagging: Penn Treebank English WSJ (Test set contains 193K tokens, 8K sentences)
For Chinese POS tagging: Penn Treebank Chinese (Test set contains 60K tokens)
For field segmentation: Classified advertisements for apartment rental on Craigslist (See Grenager et al, ACL 2005)

Criterion

The criterion used is per-token accuracy.
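For concreteness, per-token accuracy can be computed as in the following minimal sketch. The function name and the list-of-label-sequences input format are assumptions made for illustration.

```python
def per_token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted label equals the gold label.

    gold and predicted are parallel lists of label sequences, one per sentence or document.
    """
    correct = total = 0
    for gold_seq, pred_seq in zip(gold, predicted):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        correct += sum(g == p for g, p in zip(gold_seq, pred_seq))
        total += len(gold_seq)
    return correct / total if total else 0.0
```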

Main Results

The following table reports the subset of results most closely related to the techniques introduced above. All numbers are per-token accuracies in percent.

	English POS Tagging (Overall)	English POS Tagging (Non-prototype words)	Chinese POS Tagging (Overall)	Field Segmentation
BASELINE	41.3	-	34.4	46.4
PROTO	68.8	47.7	39.0	53.7
PROTO + SIM	80.5	67.8	57.4	71.5

BASELINE denotes an MRF with some token and label features but without prototypes.
PROTO denotes only fixing the labels of prototype words.
PROTO + SIM denotes also incorporating the distributional similarity features.
