Klein et al, ACL 2002

== Citation ==

Dan Klein and Christopher D. Manning. 2002. A generative constituent-context model for improved grammar induction. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02). Association for Computational Linguistics, Stroudsburg, PA, USA, 128-135.

== Online version ==

http://acl.ldc.upenn.edu/P/P02/P02-1017.pdf

== Summary ==

In this [[Category::paper]] the authors present a generative distributional model for the unsupervised induction of natural language syntax which explicitly models constituent yields and contexts. Parameter search with EM produces higher-quality analyses than previously proposed unsupervised systems.

In this paper the authors improve on their previous work, which presented a conditional model over trees and gave the best published results for unsupervised parsing of the ATIS corpus (Klein and Manning, 2001b). That work suffered from several drawbacks, primarily stemming from the conditional model used for induction. In this work several improvements are proposed. First, they construct a generative model which utilizes the same features. They then extend the model to allow multiple constituent types and multiple prior distributions over trees. The new model gives a 13% reduction in parsing error in experiments on WSJ sentences, including a positive qualitative shift in error types. Additionally, it produces much more stable results, does not require heavy smoothing, and exhibits a reliable correspondence between the maximized objective and parsing accuracy. It is also much faster, as it does not require a fitting phase at each iteration.
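
To make the model concrete, the Python sketch below illustrates the scoring step of a constituent-context model: every span of a POS-tagged sentence emits its yield and its context, conditioned on whether the span is a constituent, and the product over all spans gives the sentence likelihood up to the prior over trees. This is a minimal, hypothetical sketch, not the authors' implementation; the helper names and toy numbers are invented here, and the EM re-estimation loop and the prior over binary bracketings are omitted.

<pre>
from collections import defaultdict

def spans(n):
    """All spans (i, j) with 0 <= i < j <= n over a length-n sentence."""
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]

def yield_and_context(tags, i, j):
    """A span's yield is its tag subsequence; its context is the
    tag pair surrounding it (sentence boundaries as <s>, </s>)."""
    yld = tuple(tags[i:j])
    ctx = (tags[i - 1] if i > 0 else '<s>',
           tags[j] if j < len(tags) else '</s>')
    return yld, ctx

def score_bracketing(tags, constituents, p_yield, p_context):
    """P(yields, contexts | bracketing), up to the tree prior.

    `constituents` is the set of spans the bracketing marks as
    constituents; every other span is a distituent.  p_yield[c] and
    p_context[c] are multinomials keyed by c in {True, False}."""
    prob = 1.0
    for i, j in spans(len(tags)):
        c = (i, j) in constituents
        yld, ctx = yield_and_context(tags, i, j)
        prob *= p_yield[c][yld] * p_context[c][ctx]
    return prob

# Toy usage with invented numbers: a single learned preference that
# ('DT', 'NN') is a likely constituent yield.  In the paper these
# multinomials are instead re-estimated with EM from posterior counts.
smooth = 1e-3
p_yield = {True: defaultdict(lambda: smooth), False: defaultdict(lambda: smooth)}
p_context = {True: defaultdict(lambda: smooth), False: defaultdict(lambda: smooth)}
p_yield[True][('DT', 'NN')] = 0.5

tags = ['DT', 'NN', 'VBD']
left  = {(0, 1), (1, 2), (2, 3), (0, 2), (0, 3)}   # ((DT NN) VBD)
right = {(0, 1), (1, 2), (2, 3), (1, 3), (0, 3)}   # (DT (NN VBD))
assert score_bracketing(tags, left, p_yield, p_context) > \
       score_bracketing(tags, right, p_yield, p_context)
</pre>

With that one preference in place, the scorer favors the noun-phrase-first bracketing over the alternative, which mirrors the kind of distinction the EM parameter search drives the full model toward.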