Difference between revisions of "Headden et al. NAACL 09"

From Cohen Courses

Revision as of 18:42, 29 November 2011

Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing, by W. P. Headden III, M. Johnson, D. McClosky. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009.

This paper is available online [1].

Summary

This paper improves unsupervised dependency parsing by introducing basic valence frames and lexical information, and it applies smoothing to make use of this additional information. The resulting model improves on previous work in unsupervised (dependency) grammar induction by 10 percentage points.

Brief description of the method

The paper builds upon the Dependency Model with Valence (DMV) of Klein and Manning (2004). The DMV is a generative model in which the head of the sentence is generated first, and then each head recursively generates its left and right dependents. In each direction, the arguments of a head are generated one at a time by repeatedly deciding whether to generate a new argument or to stop.
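
The generative story above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. The tag set, the probability tables `ROOT`, `STOP` and `CHOOSE`, and the `max_depth` cap are hypothetical values chosen so that the example runs. The stop decision here is conditioned on whether the head has already generated an argument in that direction.

```python
import random

# Hypothetical toy parameters, chosen only for illustration (not from the paper).
ROOT = {"VBD": 1.0}
# Stop probability keyed by (head tag, direction, no argument generated yet?).
# Any key that is missing is treated as "always stop".
STOP = {
    ("VBD", "L", True): 0.2, ("VBD", "L", False): 0.9,
    ("VBD", "R", True): 0.3, ("VBD", "R", False): 0.9,
    ("NN", "L", True): 0.4, ("NN", "L", False): 0.8,
}
# Argument distribution keyed by (head tag, direction).
CHOOSE = {
    ("VBD", "L"): {"NN": 1.0},
    ("VBD", "R"): {"NN": 1.0},
    ("NN", "L"): {"DT": 0.5, "JJ": 0.5},
}


def sample(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate(head, rng, depth=0, max_depth=5):
    """Recursively generate a (left_deps, head, right_deps) tree."""
    deps = {"L": [], "R": []}
    for direction in ("L", "R"):
        first = True  # True until the head has an argument on this side
        while True:
            p_stop = STOP.get((head, direction, first), 1.0)
            if depth >= max_depth or rng.random() < p_stop:
                break
            arg = sample(CHOOSE[(head, direction)], rng)
            deps[direction].append(generate(arg, rng, depth + 1, max_depth))
            first = False
    return (deps["L"], head, deps["R"])


def yield_tags(tree):
    """Flatten a tree into its left-to-right tag sequence."""
    left, head, right = tree
    tags = []
    for child in reversed(left):  # left arguments were generated nearest-first
        tags.extend(yield_tags(child))
    tags.append(head)
    for child in right:
        tags.extend(yield_tags(child))
    return tags


rng = random.Random(0)
tree = generate(sample(ROOT, rng), rng)
tags = yield_tags(tree)
```

Each call to `generate` makes the stop-or-continue decision separately for the left and right side of a head, which is the valence part of the model in this sketch.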

The dependency models used in the paper are framed as split-head bilexical CFGs (Eisner and Satta, 1999), which admit a fast parsing algorithm for computing the expectations required by Variational Bayes.

Enriching contexts with argument order

DMV models the distribution over all arguments of a head identically, without considering the order in which they are generated. The model used in the paper, the Extended Valence Grammar (EVG), distinguishes the distribution over the argument nearest to the head from the distribution over subsequent arguments. For instance, in the phrase "the big dog", we would expect the distribution for the nearest argument "big" to differ from that of the further argument "the". The accompanying figure shows how this is captured with different nonterminals for nearest and further arguments.
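
To make the contrast concrete, the sketch below scores the left arguments of a noun head under a single DMV-style argument distribution and under an EVG-style pair of distributions. All numbers and the names `DMV_ARG`, `EVG_ARG`, `dmv_arg_prob` and `evg_arg_prob` are hypothetical, and stop probabilities are left out to keep the example short.

```python
# Hypothetical probabilities for left arguments of a noun head (NN).
DMV_ARG = {("NN", "L"): {"DT": 0.5, "JJ": 0.5}}
EVG_ARG = {
    ("NN", "L", "nearest"): {"JJ": 0.8, "DT": 0.2},
    ("NN", "L", "further"): {"DT": 0.9, "JJ": 0.1},
}


def dmv_arg_prob(head, direction, args):
    """Probability of the arguments (listed nearest-first) with one shared distribution."""
    p = 1.0
    for arg in args:
        p *= DMV_ARG[(head, direction)][arg]
    return p


def evg_arg_prob(head, direction, args):
    """Same product, but the nearest argument uses its own distribution."""
    p = 1.0
    for i, arg in enumerate(args):
        slot = "nearest" if i == 0 else "further"
        p *= EVG_ARG[(head, direction, slot)][arg]
    return p


# "the big dog": nearest-first the left arguments of "dog" are JJ, then DT.
good_order = ["JJ", "DT"]
bad_order = ["DT", "JJ"]  # would correspond to "big the dog"

dmv_good = dmv_arg_prob("NN", "L", good_order)
dmv_bad = dmv_arg_prob("NN", "L", bad_order)
evg_good = evg_arg_prob("NN", "L", good_order)
evg_bad = evg_arg_prob("NN", "L", bad_order)
```

With these made-up numbers the DMV-style score cannot tell the two orders apart, while the EVG-style score prefers the adjective next to the noun.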

Lexicalization

Lexical information is incorporated into EVG, giving L-EVG, by extending the EVG CFG so that nonterminals can be annotated with both the word and the POS tag of the head.

Smoothing

EVG smooths its parameters by linear interpolation. The authors represent linear interpolation in their PCFG with tied rule probabilities. The smoothing weights are controlled by setting the Dirichlet hyperparameters of the tied PCFG. Setting a larger hyperparameter for the backoff distribution's "rule" favors the backoff distribution when few examples have been seen; after seeing a sufficiently large number of examples, the model will start to ignore it.
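
The effect of the hyperparameters on the interpolation weight can be illustrated with a small calculation. This is a simplified sketch under stated assumptions: it uses the mean of a two-way Dirichlet rather than the Variational Bayes update used in the paper, it assumes the observed counts are attributed to the context-specific distribution, and the hyperparameter values 1.0 and 10.0 and the function names are hypothetical.

```python
def interpolate(p_specific, p_backoff, w_backoff):
    """Linearly interpolate a context-specific and a backoff probability."""
    return (1.0 - w_backoff) * p_specific + w_backoff * p_backoff


def mean_backoff_weight(beta_specific, beta_backoff, n_specific=0.0, n_backoff=0.0):
    """Mean weight on the backoff choice given pseudo-counts and observed counts."""
    total = beta_specific + beta_backoff + n_specific + n_backoff
    return (beta_backoff + n_backoff) / total


# Larger pseudo-count on the backoff choice (assumed values).
BETA_SPECIFIC, BETA_BACKOFF = 1.0, 10.0

w_few = mean_backoff_weight(BETA_SPECIFIC, BETA_BACKOFF, n_specific=2.0)
w_many = mean_backoff_weight(BETA_SPECIFIC, BETA_BACKOFF, n_specific=500.0)

p_few = interpolate(0.9, 0.1, w_few)    # dominated by the backoff estimate
p_many = interpolate(0.9, 0.1, w_many)  # close to the context-specific estimate
```

With only a couple of observations the backoff estimate carries most of the weight, and with many observations the weight on the backoff shrinks toward zero.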

Experimental Result

The authors trained on the standard Penn Treebank WSJ corpus.

Related Papers

Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. D. Klein and C. Manning (2004). In ACL 2004.