Eisner algorithm

From Cohen Courses
Revision as of 02:27, 26 November 2011 by Aanavas

This method is a widely-used dynamic-programming algorithm for Dependency Parsing, and the basis for many subsequent papers. It was introduced in Eisner (COLING 1996), Three New Probabilistic Models for Dependency Parsing: An Exploration, and is also described in Eisner (2000), Bilexical Grammars and Their Cubic-Time Parsing Algorithms.

Definition

For several tasks in Natural Language Processing, dependency grammars have proven very useful, especially because efficient parsing algorithms have been developed for them. Most of these algorithms are based on the idea that searching for the highest-scoring analysis tree of a sentence becomes feasible by factoring dependency trees into sets of parts that have limited interactions. By exploiting the constraints derived from such factorizations, summations and maximizations over the set of possible trees can be performed efficiently.

The Eisner algorithm is based on a "first-order" factorization, which decomposes a dependency tree into its individual dependencies (the order of a part is defined as the number of dependencies it contains). Two dynamic-programming structures are used: complete spans, which consist of a head-word and its descendants on one side, and incomplete spans, which consist of a dependency and the region between the head and modifier.
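Under a first-order factorization, the score of a tree is simply the sum of the scores of its arcs. A minimal sketch, with hypothetical arc scores and a toy two-word sentence (the function name and numbers are illustrative, not from the original paper):

```python
# First-order factorization: a tree's score is the sum of the scores
# of its individual head -> modifier dependencies.
def tree_score(arcs, arc_score):
    """arcs: list of (head, modifier) index pairs; arc_score: dict of arc scores."""
    return sum(arc_score[(h, m)] for h, m in arcs)

# Toy sentence with an artificial root at index 0; scores are assumed values.
arc_score = {(0, 1): 2.0, (1, 2): 3.0, (0, 2): 1.0, (2, 1): 0.0}
tree = [(0, 1), (1, 2)]          # root -> word1, word1 -> word2
print(tree_score(tree, arc_score))  # 5.0
```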

  • Complete span: $C_{h,e}$, where $h$ and $e$ are the indices of the head-word and the endpoint of the span.
  • Incomplete span: $I_{h,m}$, where $h$ and $m$ are the indices of the head and modifier of a dependency.

In other words, a complete span represents a "half-constituent" with $h$ as the head, and an incomplete span represents a partial half-constituent, since it can still be extended by adding more modifiers to $h$.

Each kind of span is created by recursively combining two smaller, adjacent spans.
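The recursive combination of adjacent spans yields the familiar O(n³) dynamic program. A minimal Python sketch, assuming an arc-score matrix `score[h][m]` with an artificial root at index 0; it follows the span recurrences described above but returns only the best tree score (backpointers for recovering the tree are omitted):

```python
# Sketch of first-order Eisner parsing. Direction d: 0 = head on the
# right end of the span, 1 = head on the left end.
def eisner(score):
    n = len(score)            # number of tokens, including root at index 0
    NEG = float("-inf")
    # complete[i][j][d] / incomplete[i][j][d]: best score of that span type
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Incomplete spans: join two complete half-constituents and
            # add the dependency between i and j on top.
            best = max(complete[i][k][1] + complete[k + 1][j][0]
                       for k in range(i, j))
            incomplete[i][j][0] = best + score[j][i]   # arc j -> i
            incomplete[i][j][1] = best + score[i][j]   # arc i -> j
            # Complete spans: extend an incomplete span with a finished
            # half-constituent headed at the modifier.
            complete[i][j][0] = max(complete[i][k][0] + incomplete[k][j][0]
                                    for k in range(i, j))
            complete[i][j][1] = max(incomplete[i][k][1] + complete[k][j][1]
                                    for k in range(i + 1, j + 1))
    return complete[0][n - 1][1]  # best projective tree rooted at token 0
```

Each span of width w is built from two adjacent sub-spans meeting at a split point k, which is where the cubic running time comes from: O(n²) spans, each with O(n) split points.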

Relevant Papers