McDonald et al, ACL 2005: Non-Projective Dependency Parsing Using Spanning Tree Algorithms

== Citation ==

R. McDonald, F. Pereira, K. Ribarov, J. Hajič. Non-projective dependency parsing using spanning tree algorithms, Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pp. 523-530, Vancouver, October 2005.

== Online Version ==

PDF version

== Summary ==

This paper addresses the problem of non-projective dependency parsing.

Given a sentence <math>\mathbf{x} = x_1 \cdots x_n</math>, we can construct a directed graph <math>G_{\mathbf{x}} = (V_{\mathbf{x}}, E_{\mathbf{x}})</math>. The vertex set <math>V_{\mathbf{x}} = \{x_0, x_1, \ldots, x_n\}</math> contains one vertex for each word in the sentence, plus a dummy vertex <math>x_0</math> for the "root". The edge set <math>E_{\mathbf{x}}</math> contains all directed edges of the form <math>(i, j)</math>, where <math>0 \le i \le n</math>, <math>1 \le j \le n</math>, and <math>i \ne j</math>. Each edge has a score of the form <math>s(i, j) = \mathbf{w} \cdot \mathbf{f}(i, j)</math>, where <math>\mathbf{f}(i, j)</math> is a feature vector depending on the words <math>x_i</math> and <math>x_j</math>, and <math>\mathbf{w}</math> is a weight vector.
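As a concrete illustration, the edge scores could be computed as in the sketch below. This is only a sketch in Python: the feature function <code>feat</code> and the weight dictionary <code>w</code> are hypothetical stand-ins, not the feature representation actually used in the paper.

<pre>
# Sketch: building the scored digraph G_x for one sentence.
# `words[0]` is the dummy root token; `feat(words, i, j)` (a hypothetical
# feature function) returns a dict of feature values for the edge i -> j,
# and `w` is a dict mapping feature names to learned weights.
def build_edge_scores(words, w, feat):
    """Return {(i, j): s(i, j)} for every head i and dependent j (i != j, j >= 1)."""
    scores = {}
    for i in range(len(words)):           # candidate head, including the root (index 0)
        for j in range(1, len(words)):    # dependent; the root never receives an edge
            if i == j:
                continue
            f = feat(words, i, j)
            # s(i, j) = w . f(i, j), computed as a sparse dot product
            scores[(i, j)] = sum(w.get(name, 0.0) * value for name, value in f.items())
    return scores
</pre>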

A dependency parse tree <math>\mathbf{y}</math> is a subgraph of <math>G_{\mathbf{x}}</math> which covers all the vertices in <math>V_{\mathbf{x}}</math>, and in which each vertex has exactly one predecessor (except for the root vertex, which has no predecessor). A projective dependency parse tree has the additional constraint that each of its subtrees covers a contiguous region of the sentence. In either case, the score of a dependency tree is factored as the sum of the scores of its edges:

<math>s(\mathbf{x}, \mathbf{y}) = \sum_{(i, j) \in \mathbf{y}} s(i, j) = \sum_{(i, j) \in \mathbf{y}} \mathbf{w} \cdot \mathbf{f}(i, j)</math>

The (projective) decoding problem is to find the max-scoring (projective) dependency parse tree given a sentence <math>\mathbf{x}</math>, assuming that the weight vector <math>\mathbf{w}</math> is known. The learning problem is to find an optimal weight vector <math>\mathbf{w}</math>, such that the sum of the scores of the dependency parse trees in a training corpus is maximized.
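In the notation defined above, non-projective decoding amounts to searching over all spanning trees of <math>G_{\mathbf{x}}</math>:

<math>\mathbf{y}^* = \arg\max_{\mathbf{y}} s(\mathbf{x}, \mathbf{y}) = \arg\max_{\mathbf{y}} \sum_{(i, j) \in \mathbf{y}} \mathbf{w} \cdot \mathbf{f}(i, j)</math>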

=== The Decoding Problem ===

The non-projective decoding problem is equivalent to finding the '''maximum spanning tree''' in a directed graph (also called the '''maximum arborescence'''). This can be solved using the [[UsesMethod::Chu-Liu-Edmonds algorithm]] ([http://en.wikipedia.org/wiki/Edmonds'_algorithm Wikipedia]).

[[File:Chu-Liu-Edmonds algorithm.png]]

A naive implementation of the Chu-Liu-Edmonds algorithm has a time complexity of <math>O(|V|^3)</math>. In 1977, Robert Tarjan implemented the algorithm with <math>O(|E| \log |V|)</math> complexity for sparse graphs and <math>O(|V|^2)</math> complexity for dense graphs, the latter of which is used by this paper. In 1986, Gabow, Galil, Spencer, and Tarjan gave an even faster implementation with a complexity of <math>O(|E| + |V| \log |V|)</math>.
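For illustration, below is a minimal, naive sketch of Chu-Liu-Edmonds for the maximum arborescence rooted at vertex 0 (roughly the cubic-time version, not the <math>O(|V|^2)</math> implementation used in the paper). It assumes an edge-score dictionary like the one sketched above, with every non-root vertex having at least one incoming edge; the function and helper names are illustrative.

<pre>
# Naive sketch of the Chu-Liu-Edmonds maximum-arborescence algorithm.
# `score` maps (head, dep) -> float; vertices are integers and `root` is
# the dummy root. Every non-root vertex needs at least one incoming edge.

def chu_liu_edmonds(score, nodes, root=0):
    """Return {dependent: head} for the maximum spanning arborescence."""
    # Step 1: greedily keep the single best incoming edge of every non-root vertex.
    best_in = {}
    for v in nodes:
        if v == root:
            continue
        heads = [u for u in nodes if u != v and (u, v) in score]
        best_in[v] = max(heads, key=lambda u: score[(u, v)])

    # Step 2: if the chosen edges are acyclic, they already form the answer.
    cycle = _find_cycle(best_in)
    if cycle is None:
        return best_in

    # Step 3: contract the cycle into a fresh vertex `c`, rescoring edges that
    # enter the cycle by how much they improve on the cycle edge they replace.
    c = max(nodes) + 1
    cyc = set(cycle)
    new_nodes = [v for v in nodes if v not in cyc] + [c]
    new_score, origin = {}, {}
    for (u, v), s in score.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:                       # edge entering the cycle
            key, adj = (u, c), s - score[(best_in[v], v)]
        elif u in cyc:                     # edge leaving the cycle
            key, adj = (c, v), s
        else:                              # edge untouched by the contraction
            key, adj = (u, v), s
        if key not in new_score or adj > new_score[key]:
            new_score[key] = adj
            origin[key] = (u, v)

    # Step 4: solve the smaller problem recursively, then expand the cycle.
    contracted = chu_liu_edmonds(new_score, new_nodes, root)
    parent, broken = {}, None
    for v, u in contracted.items():
        head, dep = origin[(u, v)]
        if v == c:                         # this edge breaks the cycle open
            broken = dep
        parent[dep] = head
    for v in cyc:                          # keep the remaining cycle edges
        if v != broken:
            parent[v] = best_in[v]
    return parent


def _find_cycle(parent):
    """Return the vertices of one cycle in a {dep: head} map, or None."""
    for start in parent:
        seen, v = {start}, start
        while v in parent:
            v = parent[v]
            if v == start:                 # walked back to where we began
                cycle, u = [start], parent[start]
                while u != start:
                    cycle.append(u)
                    u = parent[u]
                return cycle
            if v in seen:
                break
            seen.add(v)
    return None
</pre>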

A major advantage of the maximum spanning tree solution over previous approaches is its uniformity and simplicity. Previous algorithms for non-projective dependency parsing were modifications of the [[UsesMethod::Eisner algorithm]] (a dynamic programming algorithm with <math>O(|V|^3)</math> complexity) and often involved approximations. In contrast, the maximum spanning tree solution searches the entire space of dependency parse trees, and it reveals that non-projective dependency parsing is actually easier than projective dependency parsing.

=== The Learning Problem ===

An online large-margin learning algorithm, called [[UsesMethod::MIRA]], is used to train the weight vector <math>\mathbf{w}</math>. The algorithm passes through the training corpus multiple times, and for each training example <math>(\mathbf{x}_t, \mathbf{y}_t)</math> it updates the weight vector so that the scores of <math>\mathbf{y}_t</math> and of any other parse tree <math>\mathbf{y}'</math> are separated by at least the loss <math>L(\mathbf{y}_t, \mathbf{y}')</math>. The loss is defined as the number of vertices that have different parents in the two trees. The weight vectors after each update are averaged to yield the final weight vector.
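Roughly, each MIRA step can be viewed as making the smallest change to the current weights that enforces these margin constraints (written here in the notation defined above): the update computes <math>\mathbf{w}^{(i+1)} = \arg\min_{\mathbf{w}^*} \left\| \mathbf{w}^* - \mathbf{w}^{(i)} \right\|</math> subject to <math>s(\mathbf{x}_t, \mathbf{y}_t) - s(\mathbf{x}_t, \mathbf{y}') \ge L(\mathbf{y}_t, \mathbf{y}')</math> for candidate parse trees <math>\mathbf{y}'</math> of <math>\mathbf{x}_t</math>. Since enforcing this for every possible tree is intractable, the constraint set is restricted in practice (for instance to the current highest-scoring tree), which is why the decoding algorithm above is called inside the learner.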

[[File:MIRA.png]]


== Experiments ==