Difference between revisions of "Koo and Collins ACL 2010"
Line 9: | Line 9: | ||
== Summary == | == Summary == | ||
− | This [[Category::paper]] presents a higher-order [[AddressesProblem::Dependency Parsing|dependency parser]] that can evaluate substructures containing three dependencies, using both sibling-style and grandchild-style interactions. The algorithms presented require only <math> | + | This [[Category::paper]] presents a higher-order [[AddressesProblem::Dependency Parsing|dependency parser]] that can evaluate substructures containing three dependencies, using both sibling-style and grandchild-style interactions. The algorithms presented require only <math> O(n^4)</math> time and were evaluated on the [[UsesDataset::Penn Treebank]] and the [[UsesDataset::Prague Dependency Treebank]]. The implementation code was publicly released [http://groups.csail.mit.edu/nlp/dpo3/]. |
=== Background === | === Background === | ||
Line 39: | Line 39: | ||
[[File:dpo3-02.png]] | [[File:dpo3-02.png]] | ||
− | The complexity of the algorithm is <math> | + | The complexity of the algorithm is <math>O(n^4)</math> time and <math>O(n^3)</math> space. |
== Experimental results == | == Experimental results == |
Revision as of 21:53, 25 November 2011
Contents
Citation
Koo, T. and Collins, M. 2010. Efficient Third-Order Dependency Parsers. In Proceedings of ACL, pp. 1-11. Association for Computational Linguistics.
Online version
Summary
This paper presents a higher-order dependency parser that can evaluate substructures containing three dependencies, using both sibling-style and grandchild-style interactions. The algorithms presented require only time and were evaluated on the Penn Treebank and the Prague Dependency Treebank. The implementation code was publicly released [1].
Background
Dependency parsing is defined as a search for the highest-scoring analysis of a sentence :
where is the set of all trees compatible with and evaluates the event that tree is the analysis of . Directly solving the equation is unfeasible because the number of possible trees grow exponentially with the length of the sentence. A common strategy is to factor each dependency tree into small parts which can be scored individually, then:
The order of a part is defined according to the number of dependencies it contains, a terminology used in previous parsing algorithms, such as first-order factorization and second-order sibling factorization. The factorizations in this new third-order parser uses the following parts:
Description of new method
The overall method augments each span with a grandparent index. The concept of spans comes from the Eisner algorithm structures, where complete spans consist of a head-word and its descendants on one side, and incomplete spans consist of a dependency and the region between the head and modifier. Based on this, the paper introduces three parsing algorithms based on three different models.
Model 0: all grandchildren
This model factors each dependency tree into a set of grandchild parts (pairs of dependencies connected head to tail). Formally, a triple of indices where and are dependencies. Both complete and incomplete spans are augmented with grand-parent indices to produced g-spans. The next figure describes complete and incomplete g-spans and provides a graphical specification of the dynamic-programming algorithm:
The algorithm is similar to the first-order parser, but every recursive construction sets the grandparent indices of the smaller g-spans. Model 0 can be parsed by adapting standard top-down or bottom-up chart parsers. The following pseudocode sketches a bottom-up chart parser for this model:
The complexity of the algorithm is time and space.
Experimental results
Bla bla.
Related papers
Bla bla.