Charniak and Johnson 2005
Citation
"Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking", Eugene Charniak, Mark Johnson, ACL 2005
Online Version
An online version of this paper can be found here: [1]
Summary
The authors describe a method that uses discriminative reranking for parsing. The standard Charniak parser generates the 50 best parse trees for a given sentence, and a MaxEnt reranker then rescores those 50 trees. The results show a small improvement over the state of the art (at the time) while being significantly more efficient.
Getting the n-best parses
They first describe a simple extension to the dynamic programming algorithm that finds the single best parse (i.e. Viterbi): instead of storing only the best parse at each state, store the n best. The problem with this approach is that it increases the storage required significantly, especially if states are split a lot.
However, the parser they used avoids the space problem by using a "coarse-to-fine" approach. It first parses with a "crude" grammar that contains only CFG features plus parent and neighbor labels, which has few enough states to permit a simple bottom-up dynamic programming pass. Because this grammar is so coarse, the pass produces a polynomial-sized forest. Each edge is then pruned based on its marginal probability conditioned on the string w: a constituent c is removed if

p(c | w) = α(c) β(c) / P(w)

falls below a fixed threshold, where β(c) is the inside probability of c, α(c) is its outside probability, and P(w) is the total probability of the string.
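A minimal sketch of this pruning step in Python, assuming the inside and outside tables have already been computed by the usual dynamic program; the names and the threshold value here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constituent:
    label: str   # coarse nonterminal, e.g. a parent-annotated "NP"
    start: int   # index of the first word covered
    end: int     # index one past the last word covered

def prune_chart(alpha, beta, sentence_prob, threshold=1e-4):
    """Keep only constituents whose marginal probability given the
    string, alpha(c) * beta(c) / P(w), meets the threshold.

    alpha, beta   -- dicts mapping Constituent to its outside / inside
                     probability under the coarse grammar
    sentence_prob -- P(w), the inside probability of the root
    threshold     -- illustrative placeholder value
    """
    return {
        c for c, inside in beta.items()
        if alpha.get(c, 0.0) * inside / sentence_prob >= threshold
    }
```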
The fine-grained model is then applied to the pruned forest; because the candidate set is much smaller, the space requirements are manageable. During this second pass, the n-best parses are kept at each state.
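A sketch of the n-best bookkeeping at a single state, assuming candidates are (probability, backpointer) pairs; this is the naive extension described above, made affordable here by the pruned forest:

```python
import heapq

def nbest_combine(left, right, rule_prob, n):
    """Combine the n-best derivations of two child states under one
    binary rule; each input list holds (prob, tree) pairs."""
    candidates = [
        (lp * rp * rule_prob, (ltree, rtree))
        for lp, ltree in left
        for rp, rtree in right
    ]
    return heapq.nlargest(n, candidates, key=lambda c: c[0])

def nbest_for_state(incoming, n):
    """Merge the candidate lists from every way of building a state
    (one list per incoming edge) and keep the n best overall."""
    merged = (c for edge in incoming for c in edge)
    return heapq.nlargest(n, merged, key=lambda c: c[0])
```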
They report oracle f-scores (the score of the best parse in each n-best list) on section 23 of the WSJ for various values of n:
| n | 1 | 2 | 10 | 25 | 50 |
|---|---|---|---|---|---|
| f-score | 0.897 | 0.914 | 0.948 | 0.960 | 0.968 |
They claim that producing the 50 best parses is only 2-3 times slower than producing the single best parse.
Reranking the parses
After pulling the 50 best parses, they convert each parse y into a feature vector f(y) = (f_1(y), ..., f_m(y)), where each f_j maps a parse to a real number. The feature types used are as follows (a toy sketch of the conversion appears after this list):
- CoPar - Conjunct parallelism at different depths
- CoParLen - Difference in length between adjacent conjuncts in the same structures
- RightBranch - Prefer right-branching trees
- Heavy - Classifies nodes by category, bin length, end of sentence, and followed by punctuation
- Neighbors - Classifies nodes by category, bin length, and preterminal neighbors
- Rule - Local trees with context information
- NGram - Tuples of adjacent children nodes
- Head - Tuples of head-to-head dependencies
- LexFunHeads - Tuples of parts of speech of the lexical and functional heads
- WProj - Preterminals with categories of projection ancestors
- Word - Lexical items with ancestor nodes
- HeadTree - Local trees consisting of the projections of a preterminal node and its siblings
- NGramTree - Subtrees with contiguous ancestors
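To make the conversion concrete, here is a toy Python sketch of one feature type (Rule-style local trees); the tree encoding and names are invented for illustration, not taken from the paper:

```python
from collections import Counter

# Toy parse-tree encoding: a tree is either a terminal string or a
# (label, [children]) tuple.
def rule_features(tree):
    """Yield one Rule-style feature per local tree: parent -> children."""
    if isinstance(tree, str):
        return
    label, children = tree
    kids = " ".join(c if isinstance(c, str) else c[0] for c in children)
    yield f"Rule:{label}->{kids}"
    for child in children:
        yield from rule_features(child)

def feature_vector(tree):
    """Map a parse y to a sparse f(y): feature name -> real-valued count."""
    return Counter(rule_features(tree))

tree = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
print(feature_vector(tree))
# Counter({'Rule:S->NP VP': 1, 'Rule:NP->dogs': 1, 'Rule:VP->bark': 1})
```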
They use a MaxEnt estimator to learn the feature weights, with a Gaussian (i.e. quadratic) regularizer.
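A sketch of the kind of regularized objective such an estimator minimizes for one sentence, assuming (for simplicity) a single oracle-best candidate per n-best list; function and parameter names are illustrative:

```python
import numpy as np

def reranker_loss(w, feats, best_index, sigma2=1.0):
    """Regularized negative log-likelihood of the oracle-best parse
    under a log-linear (MaxEnt) reranker.

    w          -- weight vector, shape (n_features,)
    feats      -- candidate feature vectors f(y_j) as rows,
                  shape (n_candidates, n_features)
    best_index -- index of the oracle-best candidate
    sigma2     -- variance of the Gaussian prior; the quadratic
                  penalty is ||w||^2 / (2 * sigma2)
    """
    scores = feats @ w                   # w . f(y_j) for each candidate
    log_z = np.logaddexp.reduce(scores)  # log sum_j exp(score_j)
    return (log_z - scores[best_index]) + w @ w / (2.0 * sigma2)
```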
Experiment and Results
They run the reranker on their own 50-best parses, as well as on trees provided by Collins (presumably from the Collins parser). The results are:
- Reranked Collins f-score: 0.9037
- Reranked 50-best f-score: 0.9102
Conclusion
The method described does two things: it obtains the n-best parse trees efficiently, and it reranks the candidate parse trees efficiently. Both offer solid improvements.
Related Work
- "Reranking and Self-Training for Parser Adaptation" McClosky, Charniak, Johnson, COLING-ACL 2006 - Applies this method to get improvements using the Brown TreeBank
- "Effective Self-Training for Parsing", McClosky, Charniak, and Johnson, HLT-NAACL 2006 - A similar approach using unsupervised learning and bootstrapping