Eisner algorithm

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

This method is a widely-used dynamic-programming algorithm, and the basis for many papers, that addresses the problem of Dependency Parsing. It was introduced in Eisner COLING 1996 Three New Probabilistic Models for Dependency Parsing: An Exploration and also described in Eisner 2000 Bilexical Grammars and Their Cubic-Time Parsing Algorithms.

Background

For several tasks in the domain of Natural Language Processing, dependency grammars have proven to be very useful, specially because of the development of efficient parsing algorithms. Dependency grammars differ from phrase structure grammars in that they lack phrasal nodes: dependency structures are one-to-one relations in the sense that a word depends only on another word (the head), as shown in the next figure.

Most parsing algorithms are based on the idea that searching for the highest-scoring analysis tree of a sentence is more feasible by factoring those dependency trees into sets of parts that have limited interactions. By exploiting the constraints derived from the factorizations, summations and maximizations over the set of possible trees can be performed efficiently.

Definition

The Eisner algorithm is based on a "first-order" factorization, which decomposes a dependency tree into its individual dependencies (the order of a part is defined as the number of the dependencies it contains). Two dynamic-programming structures are used: complete spans which consist of a head-word and its descendants on one side and incomplete spans which consist of a dependency and the region between the head and modifier.

• Complete span: $\,C_{h,e}$ where $\,h$ and $\,e$ are the indices of the head-word and the endpoint of the span.
• Incomplete span: $\,I_{h,m}$ where $\,h$ and $\,m$ are the index of the head and modifier of a dependency.

In other words, a complete span represents a "half-constituent" with $\,h$ as the head, and and incomplete span represents a partial half-constituent, since it can be extended by adding more modifiers to $\,m$ . Each kind of span is created by combining, recursively, two smaller spans that are adjacent, as shown in the next figure following the arrows upward. The parsing algorithm builds successively larger spans in a dynamic programming table or chart. The minimal spans used to seed the chart are linked or unlinked word bigrams, such as: $\,[The\rightarrow plan]$ or $\,[tax\,ROOT]$ in the example.

An incomplete span is created from a pair of complete spans, indicating the division of the range $\,[h,m]$ into constituents headed by $\,h$ and $\,m$ . A complete span is constructed by completing an incomplete span with the other half of the constituent of $\,m$ . The next figure shows the constructions and their points of concatenation, $\,m$ in (a) and $\,r$ in (b), which are also called split points, free indices that must be enumerated to find the optimal construction. Complete spans are depicted as triangles and incomplete spans as trapezoids.

Pseudocode

Therefore, in order to parse a sentence, the algorithm needs to find optimal constructions for all the complete and incomplete spans defined on the sentence. The Eisner algorithm finds them by adapting standard chart-parsing techniques to the recursive derivations. The following pseudocode describes the adapted algorithm, which runs in cubic time and requires quadratic space because each derivation is defined by two fixed indices (the span boundaries) and a third free index (the split point).