== Citation ==

Koo, T. and Collins, M. 2010. Efficient Third-Order Dependency Parsers. In Proceedings of ACL, pp. 1-11. Association for Computational Linguistics.

== Online version ==
== Summary ==

This [[Category::paper]] presents a higher-order [[AddressesProblem::Dependency Parsing|dependency parser]] that can evaluate substructures containing three dependencies, using both sibling-style and grandchild-style interactions. The algorithms presented require only <math>O(n^4)</math> time and were evaluated on the [[UsesDataset::Penn Treebank]] and the [[UsesDataset::Prague Dependency Treebank]]. The implementation code was publicly released [http://groups.csail.mit.edu/nlp/dpo3/].

=== Background ===

Dependency parsing is defined as a search for the highest-scoring analysis of a sentence <math>\, x</math>:

<math>y^*(x) = \arg\max_{y\in Y(x)} \textrm{Score}(x,y)</math>

where <math>\, Y(x) </math> is the set of all trees compatible with <math>\, x </math> and <math>\, \textrm{Score}(x,y) </math> evaluates the event that tree <math>\, y </math> is the analysis of <math>\, x </math>. Directly solving this equation is infeasible because the number of possible trees grows exponentially with the length of the sentence. A common strategy is to factor each dependency tree into small parts, which can be scored individually; then:

<math>\textrm{Score}(x,y) = \sum_{p \in y}\textrm{ScorePart}(x,p)</math>

The ''order'' of a part is defined as the number of dependencies it contains, following the terminology used for previous parsing algorithms. The factorizations in this new third-order parser use the following types of parts:

[[File:dpo3-01.png]]
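To make the factored scoring concrete, here is a minimal sketch (not from the paper) of how a tree, encoded as a head array, can be scored as a sum over its first-order, sibling, and grandchild parts. The scorers score_dep, score_sib and score_gch are hypothetical placeholders, and the convention that the sibling index s is the modifier closer to the head is an assumption of this sketch.

<pre>
def parts(head):
    """Enumerate first-order, sibling, and grandchild parts of a tree given as a
    head array: head[m] is the head of word m, and 0 is an artificial root."""
    n = len(head) - 1                           # head[0] is unused
    for m in range(1, n + 1):
        h = head[m]
        yield ("dep", h, m)                     # first-order part (h, m)
        if h != 0:
            yield ("gch", head[h], h, m)        # grandchild part (g, h, m)
    for h in range(0, n + 1):
        mods = [m for m in range(1, n + 1) if head[m] == h]
        left = sorted((m for m in mods if m < h), reverse=True)    # inside-out
        right = sorted(m for m in mods if m > h)
        for side in (left, right):
            for s, m in zip(side, side[1:]):    # s inner sibling, m outer
                yield ("sib", h, m, s)

def score_tree(x, head, score_dep, score_sib, score_gch):
    """Score(x, y) as a sum of part scores (hypothetical scoring functions)."""
    total = 0.0
    for p in parts(head):
        if p[0] == "dep":
            total += score_dep(x, p[1], p[2])
        elif p[0] == "sib":
            total += score_sib(x, p[1], p[2], p[3])
        else:
            total += score_gch(x, p[1], p[2], p[3])
    return total
</pre>

The third-order parts (grand-siblings and tri-siblings) introduced below extend these combinations; small sketches for them appear with Models 1 and 2.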
=== Description of new method ===

The overall method augments each ''span'' with a ''grandparent'' index. The concept of spans comes from the structures in the [[UsesMethod::Eisner algorithm]] (a first-order parser), where ''complete'' spans consist of a head-word and its descendants on one side, and ''incomplete'' spans consist of a dependency and the region between the head and modifier. Building on this, the paper introduces three parsing algorithms, one for each of three models.
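For orientation, the standard first-order Eisner recurrences over these spans can be written as follows for right-facing spans (the left-facing case is symmetric). Here <math>\, C_{h,e}</math> is the best complete span headed at <math>\, h</math> with endpoint <math>\, e</math>, <math>\, I_{h,m}</math> is the best incomplete span for the dependency <math>\, (h,m)</math>, and <math>\, \textrm{ScoreDep}(h,m)</math> is a first-order part score (a symbol introduced here for illustration); single-word spans <math>\, C_{h,h}</math> serve as the base case with score zero:

<math>I_{h,m} = \max_{h \le r < m} \left( C_{h,r} + C_{m,r+1} \right) + \textrm{ScoreDep}(h,m)</math>

<math>C_{h,e} = \max_{h < m \le e} \left( I_{h,m} + C_{m,e} \right)</math>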
==== Model 0: all grandchildren ====

This model factors each dependency tree into a set of grandchild parts (pairs of dependencies connected head-to-tail): formally, triples of indices <math>\, (g, h, m)</math> where <math>\, (g, h)</math> and <math>\, (h, m)</math> are both dependencies. Both complete and incomplete spans are augmented with grandparent indices to produce ''g-spans''. The next figure describes complete and incomplete g-spans and provides a graphical specification of the dynamic-programming algorithm:

[[File:dpo3-02.png]]
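Read off from this construction (a hedged paraphrase rather than the paper's exact figure), the right-facing recurrences are the Eisner recurrences above with a grandparent superscript threaded through, where <math>\, \textrm{ScoreG}(g,h,m)</math> denotes the grandchild-part score and the inner spans' grandparents are fixed to <math>\, h</math>:

<math>I^{g}_{h,m} = \max_{h \le r < m} \left( C^{g}_{h,r} + C^{h}_{m,r+1} \right) + \textrm{ScoreG}(g,h,m)</math>

<math>C^{g}_{h,e} = \max_{h < m \le e} \left( I^{g}_{h,m} + C^{h}_{m,e} \right)</math>

The exact point at which each part score is added follows the paper's figure; the placement above is one consistent choice.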
The Model 0 parsing algorithm is similar to the Eisner algorithm, except that every recursive construction also sets the grandparent indices of the smaller g-spans. Model 0 can be parsed by adapting standard top-down or [[UsesMethod::CYK Parsing|bottom-up chart parsing techniques]]. The following pseudocode sketches a bottom-up chart parser for this model:

[[File:dpo3-03.png]]
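As a companion to the figure, here is a rough Python sketch of such a bottom-up pass, written as an illustration under assumptions rather than a transcription of the paper's pseudocode: words are indexed 1..n with an artificial root at 0, score_g is a hypothetical grandchild-part scorer, only the best score is computed (backpointers would be needed to recover the tree), and the root's own index is reused as its dummy grandparent.

<pre>
NEG_INF = float("-inf")

def parse_model0(n, score_g):
    """Bottom-up chart pass over g-spans; returns the best tree score only."""
    N = n + 1
    # C[g][h][e]: best complete g-span headed at h with endpoint e (either side),
    # grandparent g.  I[g][h][m]: best incomplete g-span for the dependency (h, m),
    # grandparent g.  Two O(n^3) charts; each entry needs one O(n) maximization.
    C = [[[NEG_INF] * N for _ in range(N)] for _ in range(N)]
    I = [[[NEG_INF] * N for _ in range(N)] for _ in range(N)]
    for g in range(N):
        for h in range(N):
            C[g][h][h] = 0.0                       # single-word spans
    for width in range(1, N):                      # smaller spans before larger ones
        for s in range(0, N - width):
            t = s + width                          # the span covers words s..t
            for g in range(N):                     # grandparent carried on the span
                # incomplete g-spans for the two dependencies (s -> t) and (t -> s)
                I[g][s][t] = score_g(g, s, t) + max(
                    C[g][s][r] + C[s][t][r + 1] for r in range(s, t))
                I[g][t][s] = score_g(g, t, s) + max(
                    C[t][s][r] + C[g][t][r + 1] for r in range(s, t))
                # complete g-spans headed at either end of the span
                C[g][s][t] = max(I[g][s][m] + C[s][m][t] for m in range(s + 1, t + 1))
                C[g][t][s] = max(I[g][t][m] + C[t][m][s] for m in range(s, t))
    # the artificial root spans the whole sentence; index 0 doubles as its
    # dummy grandparent here (an assumption of this sketch)
    return C[0][0][n]
</pre>

For example, parse_model0(3, lambda g, h, m: 0.0) runs the pass on a three-word sentence with a trivial scorer.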
The complexity of the algorithm is <math>O(n^4)</math> time and <math>O(n^3)</math> space, because each derivation is defined by three fixed indices (those specifying the g-span) and one free index (the split point). This is still a second-order parsing algorithm, but the next models (1 and 2) are third-order parsers.

==== Model 1: all grand-siblings ====

This model decomposes each tree into a set of grand-sibling parts: 4-tuples of indices <math>\, (g, h, m, s)</math> where <math>\, (h, m, s)</math> is a sibling part and <math>\, (g, h, m)</math> and <math>\, (g, h, s)</math> are grandchild parts. These factorizations are parsed using sibling g-spans, as graphically described in the next figure:

[[File:dpo3-04.png]]
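To make the grand-sibling factorization concrete, here is a small sketch (an illustration, not the paper's code) that lists the grand-sibling parts of a tree in the head-array encoding used in the earlier sketches; first-modifier and root boundary cases are simply skipped.

<pre>
def grand_sibling_parts(head):
    """List grand-sibling parts (g, h, m, s), where (h, m, s) is a sibling part
    and g is the head of h; boundary cases are skipped in this sketch."""
    n = len(head) - 1
    out = []
    for h in range(1, n + 1):                      # the root (0) has no grandparent
        g = head[h]
        mods = [m for m in range(1, n + 1) if head[m] == h]
        left = sorted((m for m in mods if m < h), reverse=True)    # inside-out
        right = sorted(m for m in mods if m > h)
        for side in (left, right):
            for s, m in zip(side, side[1:]):       # s inner sibling, m outer
                out.append((g, h, m, s))
    return out
</pre>

For example, with head = [None, 2, 0, 2, 2] (word 2 heads words 1, 3 and 4 and attaches to the root), the only grand-sibling part is (0, 2, 4, 3).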
The parsing algorithm for Model 1 resembles the second-order sibling parser, with the addition of grandparent indices. Like Model 0, Model 1 can be parsed by adapting standard chart-parsing techniques and, despite using third-order parts, it requires only <math>O(n^4)</math> time and <math>O(n^3)</math> space.

==== Model 2: grand-siblings and tri-siblings ====

This model captures both grand-sibling parts and tri-sibling parts: 4-tuples of indices <math>\, (h, m, s, t)</math> where both <math>\, (h, m, s)</math> and <math>\, (h, s, t)</math> are sibling parts. These factorizations are parsed using a new type of structure called s-spans (sibling-augmented spans), formally defined as <math>\, I_{h,m,s}</math> where <math>\, I_{h,m}</math> is a normal incomplete span and <math>\, s</math> is an index lying in the strict interior of the range <math>\, [h,m]</math> such that <math>\, (h, m, s)</math> forms a valid sibling part. The next figure describes the Model 2 parsing algorithm, which also requires only <math>O(n^4)</math> time and <math>O(n^3)</math> space:

[[File:dpo3-05.png]]
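Analogously, a tri-sibling part groups three consecutive modifiers of the same head on one side. A small sketch in the same encoding (again illustrative only):

<pre>
def tri_sibling_parts(head):
    """List tri-sibling parts (h, m, s, t): (h, m, s) and (h, s, t) are both
    sibling parts, i.e. t, s, m are consecutive modifiers of h on one side."""
    n = len(head) - 1
    out = []
    for h in range(0, n + 1):
        mods = [m for m in range(1, n + 1) if head[m] == h]
        left = sorted((m for m in mods if m < h), reverse=True)    # inside-out
        right = sorted(m for m in mods if m > h)
        for side in (left, right):
            for t, s, m in zip(side, side[1:], side[2:]):
                out.append((h, m, s, t))
    return out
</pre>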
== Experimental results ==

Models 1 and 2 were evaluated for accuracy on the English and Czech test sets and compared to relevant results from related work. Some models (marked with †) are not directly comparable because they depend on additional resources that were not used in training.

[[File:dpo3_09.png]]
== Related papers ==

The third-order dependency parsers build on ideas from existing parsing algorithms, specifically the first-order Eisner algorithm introduced in [[RelatedPaper::Eisner COLING 1996]] and the second-order sibling parser described in [[RelatedPaper::McDonald and Pereira EACL 2006]].

Dependency parsing has been an active research subject and several techniques have been proposed, such as [[RelatedPaper::McDonald et al, ACL 2005: Non-Projective Dependency Parsing Using Spanning Tree Algorithms]], [[RelatedPaper::Smith and Eisner 2008:Dependency parsing by belief propagation]] and [[RelatedPaper::Daume ICML 2009]].