Difference between revisions of "Koo and Collins ACL 2010"

From Cohen Courses
Jump to navigationJump to search
Line 14: Line 14:
  
 
Dependency parsing is defined as a search for the highest-scoring analysis of a sentence <math>\, x</math>:
 
Dependency parsing is defined as a search for the highest-scoring analysis of a sentence <math>\, x</math>:
 
  
 
<math>y^*(x) = _{y\in Y(x)}^{argmax}\textrm{Score}(x,y)</math>
 
<math>y^*(x) = _{y\in Y(x)}^{argmax}\textrm{Score}(x,y)</math>
 
  
 
where <math>\, Y(x) </math> is the set of all trees compatible with <math>\, x </math> and <math>\, \textrm{Score}(x,y) </math> evaluates the event that tree <math>\, y </math> is the analysis of <math>\, x </math>. Directly solving the equation is unfeasible because the number of possible trees grow exponentially with the length of the sentence. A common strategy is to factor each dependency tree into small parts which can be scored individually, then:
 
where <math>\, Y(x) </math> is the set of all trees compatible with <math>\, x </math> and <math>\, \textrm{Score}(x,y) </math> evaluates the event that tree <math>\, y </math> is the analysis of <math>\, x </math>. Directly solving the equation is unfeasible because the number of possible trees grow exponentially with the length of the sentence. A common strategy is to factor each dependency tree into small parts which can be scored individually, then:
 
  
 
<math>\textrm{Score}(x,y) = \sum_{p \in y}\textrm{ScorePart}(x,p)</math>
 
<math>\textrm{Score}(x,y) = \sum_{p \in y}\textrm{ScorePart}(x,p)</math>
 
  
 
The ''order'' of a part is defined according to the number of dependencies it contains, a terminology used in previous parsing algorithms, such as first-order factorization and second-order sibling factorization. The factorizations in this new third-order parser uses the following parts:
 
The ''order'' of a part is defined according to the number of dependencies it contains, a terminology used in previous parsing algorithms, such as first-order factorization and second-order sibling factorization. The factorizations in this new third-order parser uses the following parts:
  
 
[[File:dpo3-01.png]]
 
[[File:dpo3-01.png]]
 +
 +
=== Description of new method ===
 +
 +
The overall method augments each ''span'' with a ''grandparent'' index. The concept of spans comes from the Eisner algorithm structures, where ''complete'' spans consist of a head-word and its descendants on one side, and ''incomplete'' spans consist of a dependency and the region between the head and modifier. Based on this, the paper introduces three parsing algorithms based on three different models.
 +
 +
==== Model 0: all grandchildren ===
 +
 +
This model factors each dependency tree into a set of grandchild parts (pairs of dependencies connected head to tail). Formally, a triple of indices <math>\, (g, h, m)</math> where <math>\, (g, h)</math> and <math>\, (h, m)</math> are dependencies. Both complete and incomplete spans are augmented with grand-parent indices to produced ''g-spans''. The next figure describes complete and incomplete g-spans and provides a graphical specification of the dynamic-programming algorithm:
 +
 +
[[File:dpo3-02.png]]
 +
 +
The algorithm is similar to the first-order parser, but every recursive construction sets the grandparent indices of the smaller g-spans. Model 0 can be parsed by adapting standard top-down or bottom-up chart parsers. The following pseudocode sketches a bottom-up chart parser for this model:
 +
 +
[[File:dpo3-02.png]]
 +
 +
The complexity of the algorithm is
  
 
== Experimental results ==
 
== Experimental results ==

Revision as of 21:52, 25 November 2011

Citation

Koo, T. and Collins, M. 2010. Efficient Third-Order Dependency Parsers. In Proceedings of ACL, pp. 1-11. Association for Computational Linguistics.

Online version

ACL

Summary

This paper presents a higher-order dependency parser that can evaluate substructures containing three dependencies, using both sibling-style and grandchild-style interactions. The algorithms presented require only time and were evaluated on the Penn Treebank and the Prague Dependency Treebank. The implementation code was publicly released [1].

Background

Dependency parsing is defined as a search for the highest-scoring analysis of a sentence :

where is the set of all trees compatible with and evaluates the event that tree is the analysis of . Directly solving the equation is unfeasible because the number of possible trees grow exponentially with the length of the sentence. A common strategy is to factor each dependency tree into small parts which can be scored individually, then:

The order of a part is defined according to the number of dependencies it contains, a terminology used in previous parsing algorithms, such as first-order factorization and second-order sibling factorization. The factorizations in this new third-order parser uses the following parts:

Dpo3-01.png

Description of new method

The overall method augments each span with a grandparent index. The concept of spans comes from the Eisner algorithm structures, where complete spans consist of a head-word and its descendants on one side, and incomplete spans consist of a dependency and the region between the head and modifier. Based on this, the paper introduces three parsing algorithms based on three different models.

= Model 0: all grandchildren

This model factors each dependency tree into a set of grandchild parts (pairs of dependencies connected head to tail). Formally, a triple of indices where and are dependencies. Both complete and incomplete spans are augmented with grand-parent indices to produced g-spans. The next figure describes complete and incomplete g-spans and provides a graphical specification of the dynamic-programming algorithm:

Dpo3-02.png

The algorithm is similar to the first-order parser, but every recursive construction sets the grandparent indices of the smaller g-spans. Model 0 can be parsed by adapting standard top-down or bottom-up chart parsers. The following pseudocode sketches a bottom-up chart parser for this model:

Dpo3-02.png

The complexity of the algorithm is

Experimental results

Bla bla.

Related papers

Bla bla.