Headden et al. NAACL 09


Revision as of 18:04, 29 November 2011

Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing, by W. P. Headden III, M. Johnson, D. McClosky. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009.

This paper is available online [1].

Summary

This paper improves on unsupervised dependency parsing by introducing basic valence frames and lexical information. Smoothing is applied so that the model can take advantage of this additional information. The resulting model improves on previous work in unsupervised (dependency) grammar induction by 10 percentage points.

Brief description of the method

The paper builds upon the Dependency Model with Valence by Klein and Manning (2004). The DMV is a generative model in which the head of a sentence is generated and then each head recursively generates its left and right dependents. The arguments of the head in a certain direction are generated repeatedly by deciding whether to generate a new argument or to stop.
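The generative story above can be illustrated with a minimal sketch. The POS tags, the probability tables `ROOT`, `ARG` and `STOP`, and the function names are all hypothetical toy choices made for this example; they are not parameters or code from the paper. The stop decision here is conditioned on the head, the direction, and whether an argument has already been generated in that direction.

```python
import random

# Hypothetical toy parameters over POS tags, chosen only for illustration.
ROOT = {"VERB": 1.0}
# P(argument tag | head tag, direction)
ARG = {
    ("VERB", "left"): {"NOUN": 1.0},
    ("VERB", "right"): {"NOUN": 1.0},
    ("NOUN", "left"): {"DET": 0.6, "ADJ": 0.4},
}
# P(stop | head tag, direction, adjacent). adjacent=True means no argument
# has been generated yet in that direction. Missing entries default to 1.0,
# so heads without an entry never take arguments.
STOP = {
    ("VERB", "left", True): 0.2, ("VERB", "left", False): 0.9,
    ("VERB", "right", True): 0.4, ("VERB", "right", False): 0.9,
    ("NOUN", "left", True): 0.3, ("NOUN", "left", False): 0.7,
}


def sample(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    tags = list(dist)
    return rng.choices(tags, weights=[dist[t] for t in tags])[0]


def generate(head, rng):
    """Recursively generate the left and right dependents of `head`.

    Returns (head, left_args, right_args); each argument list is ordered
    from the argument nearest the head outward.
    """
    deps = {}
    for direction in ("left", "right"):
        args, adjacent = [], True
        # Keep generating arguments until the stop decision fires.
        while rng.random() >= STOP.get((head, direction, adjacent), 1.0):
            arg = sample(ARG[(head, direction)], rng)
            args.append(generate(arg, rng))
            adjacent = False
        deps[direction] = args
    return (head, deps["left"], deps["right"])


def surface_tags(tree):
    """Flatten a generated tree into its left-to-right tag sequence."""
    head, left, right = tree
    tags = []
    for dep in reversed(left):  # left arguments were stored nearest-first
        tags.extend(surface_tags(dep))
    tags.append(head)
    for dep in right:
        tags.extend(surface_tags(dep))
    return tags


rng = random.Random(0)
tree = generate(sample(ROOT, rng), rng)
print(surface_tags(tree))
```

The sketch only samples from the model; learning the parameters from unannotated text is a separate step that is not shown here.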

The dependency models used in the paper are framed as split-head bilexical CFGs (Eisner and Satta, 1999), which admit a fast parsing algorithm for computing the expectations required by Variational Bayes.

Enriching contexts with argument order

DMV uses the same distribution for every argument of a head, regardless of the order in which the arguments are generated. The model used in the paper, EVG, distinguishes the distribution over the argument nearest to the head from the distribution over subsequent arguments. For instance, in the phrase "the big dog", we would expect the distribution for the nearest argument "big" to differ from that for the further argument "the". In the figure below, we see that this is captured using different nonterminals for nearest and further arguments.
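As a rough numerical illustration of the difference, the sketch below scores the left arguments of "dog" in "the big dog" under a single shared argument distribution (DMV-style) and under separate nearest/further distributions (EVG-style). The probability values, table names and function names are hypothetical, and stop probabilities are left out to keep the comparison simple.

```python
import math

# Hypothetical argument distributions for a NOUN head looking left.
# DMV-style: one distribution shared by all argument positions.
DMV_ARG = {("NOUN", "left"): {"ADJ": 0.5, "DET": 0.5}}
# EVG-style: the nearest argument and further arguments are modeled separately.
EVG_ARG = {
    ("NOUN", "left", "nearest"): {"ADJ": 0.8, "DET": 0.2},
    ("NOUN", "left", "further"): {"ADJ": 0.1, "DET": 0.9},
}


def dmv_arg_prob(head, direction, args):
    """Probability of the argument tags when position is ignored."""
    return math.prod(DMV_ARG[(head, direction)][a] for a in args)


def evg_arg_prob(head, direction, args):
    """Probability of the argument tags, listed nearest-first."""
    prob = 1.0
    for i, arg in enumerate(args):
        position = "nearest" if i == 0 else "further"
        prob *= EVG_ARG[(head, direction, position)][arg]
    return prob


# "the big dog": the nearest left argument of "dog" is ADJ, the further is DET.
natural_order = ["ADJ", "DET"]
swapped_order = ["DET", "ADJ"]  # would correspond to "big the dog"

print(dmv_arg_prob("NOUN", "left", natural_order),
      dmv_arg_prob("NOUN", "left", swapped_order))
print(evg_arg_prob("NOUN", "left", natural_order),
      evg_arg_prob("NOUN", "left", swapped_order))
```

With these toy numbers the shared distribution cannot tell the two orders apart, while the position-aware distributions prefer the natural order.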

Evg tree.png

Lexicalization

Lexical information is incorporated into EVG (L-EVG) by extending the EVG CFG to allow nonterminals to be annotated with both the word and the POS tag of the head.

Smoothing

EVG smooths its parameters by linear interpolation. The authors represent linear interpolation in their PCFG using tied rule probabilities. The smoothing weights are controlled by setting the Dirichlet hyperparameters of the tied PCFG. Setting a larger hyperparameter for the backoff distribution's "rule" implies that the model will start to ignore the backoff distribution only after seeing a sufficiently large number of examples.

The authors' method of combining linear interpolation with a Bayesian prior results in an augmented PCFG that is essentially still a PCFG, making it amenable to standard estimation techniques.
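The interaction between interpolation and the prior can be sketched as follows. This is a simplified illustration, not the paper's estimation procedure: it uses the posterior mean of a two-outcome Dirichlet (Beta) prior as the mixing weight, whereas the paper estimates its parameters with Variational Bayes. The function names, the hyperparameter values and the counts are all assumptions made for the example.

```python
def interpolate(lam, p_specific, p_backoff):
    """Linear interpolation of a specific estimate with a backoff estimate."""
    return lam * p_specific + (1.0 - lam) * p_backoff


def specific_weight(n_specific, n_backoff,
                    alpha_specific=1.0, alpha_backoff=10.0):
    """Posterior-mean weight on the specific distribution.

    n_specific and n_backoff are (expected) counts of choosing each of the
    two tied "rules"; the alphas are hypothetical Dirichlet hyperparameters,
    with the larger one placed on the backoff choice.
    """
    total = alpha_specific + alpha_backoff + n_specific + n_backoff
    return (alpha_specific + n_specific) / total


# With no evidence the prior keeps most of the weight on the backoff
# distribution; as counts for the specific choice grow, the weight shifts.
for n in (0, 10, 100, 1000):
    lam = specific_weight(n_specific=n, n_backoff=0)
    print(n, round(lam, 3), round(interpolate(lam, 0.8, 0.2), 3))
```

Because the mixing weight is just another multinomial choice with its own prior, the interpolated model can still be written as a PCFG with extra rules, which is the property the summary above refers to.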

Experimental Results

The authors trained on the standard Penn Treebank WSJ corpus. Evaluating against the gold standard dependencies in section 23, they obtained the following results.

Using just smoothing, there was a large improvement over the baseline DMV model.

The model was also able to learn the most likely argument types for different valences and directions.

Evg results.png

Likely args.png

Related Papers

Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. D Klein and C Manning (2004). In ACL 2004

Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction. S Cohen and N Smith. In ACL 2009.

Class meeting for 10-710 on 10-13-2011 discusses dependency parsing