Dreyer and Eisner, EMNLP 2006

Better Informed Training of Latent Syntactic Features

This paper can be found at: [1]

Citation

Markus Dreyer and Jason Eisner. Better informed training of latent syntactic features. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 317–326, Sydney, Australia, 2006. DOI: 10.3115/1610075.1610120

Summary

The authors wanted to improve parsing accuracy by introducing hidden features. It would seem that linguistic features (such as number, gender, etc.) should help constrain parse trees, so the authors wanted to include such features, but the treebank did not annotate them. They therefore used a modified EM algorithm with simulated annealing to induce the features and then construct rules that take advantage of them. The main contribution is an attempt to improve on the work of Matsuzaki et al. by reducing the number of degrees of freedom to be learned, so that the syntactic features can take on a greater range of values; the new method allows less freedom in learning transfer probabilities. The end result was no improvement over the previous work.
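
As a rough illustration of what EM with annealing means in general (the paper's actual procedure operates over latent annotations of treebank nonterminals, and its exact schedule is not described here), the toy sketch below runs deterministic-annealing EM on a two-component Gaussian mixture: the E-step responsibilities are flattened by a temperature that cools toward 1, which helps EM escape poor local optima early on. Everything in it, from the data to the cooling schedule, is made up for illustration.

<pre>
# Toy deterministic-annealing EM on a two-component 1-D Gaussian mixture.
# Illustration only; NOT the paper's training procedure.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

means = np.array([-0.5, 0.5])
sigmas = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

for T in (8.0, 4.0, 2.0, 1.0):                 # cooling schedule
    for _ in range(20):
        # E-step: responsibilities flattened by temperature T
        log_lik = (-0.5 * ((data[:, None] - means) / sigmas) ** 2
                   - np.log(sigmas) + np.log(weights))
        resp = np.exp(log_lik / T)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: standard weighted updates for means, variances, weights
        nk = resp.sum(axis=0)
        means = (resp * data[:, None]).sum(axis=0) / nk
        sigmas = np.sqrt((resp * (data[:, None] - means) ** 2).sum(axis=0) / nk)
        weights = nk / len(data)

print(means)   # should end up near the true component means (about -2 and 3)
</pre>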

Previous work

As they say in the paper, "treebanks never contain enough information". A lot of parsing work had been done on splitting nonterminals so that the important nonterminals could be trained better, but the splitting was mostly done in an ad-hoc fashion until Matsuzaki et al. (2005) introduced PCFG-LA (probabilistic context-free grammars with latent annotations). They wanted to incorporate features that propagate only in "linguistically-motivated ways"; by restricting propagation in this way, the authors avoided having to learn far too many weights. PCFG-LA works the same way as a PCFG, but there are many more rules. If we have a regular rule in the PCFG, <math>A \rightarrow B\ C</math>, in PCFG-LA it would be represented as <math>A[\alpha] \rightarrow B[\beta]\ C[\gamma]</math>, where <math>\alpha, \beta</math>, and <math>\gamma</math> are the feature values (integers).
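
To see why the unrestricted PCFG-LA parameterization blows up, here is a small sketch (hypothetical, not from the paper) that expands a single binary rule into all of its annotated variants; with <math>L</math> possible feature values per nonterminal, one rule becomes <math>L^3</math> rules, each with its own weight.

<pre>
from itertools import product

def annotate_rule(parent, left, right, num_values):
    """Expand one binary PCFG rule A -> B C into every PCFG-LA variant
    A[a] -> B[b] C[c], where a, b, c range over the latent feature values."""
    return [(f"{parent}[{a}]", f"{left}[{b}]", f"{right}[{c}]")
            for a, b, c in product(range(num_values), repeat=3)]

# One unannotated rule with 8 possible feature values per nonterminal already
# yields 8^3 = 512 annotated rules, each needing its own weight.
rules = annotate_rule("NP", "DT", "NN", num_values=8)
print(len(rules))   # 512
print(rules[0])     # ('NP[0]', 'DT[0]', 'NN[0]')
</pre>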

Improvements in Paper

The main idea in the paper was to reduce the number of weights that needed to be learned. This is unequivocally a good thing, since it means that either the model is smaller or you can train more weights for the same price. The authors decided that there were too many transfer probabilities. Instead of calculating a weight for every combination of a value of <math>\alpha</math> with values of <math>\beta</math> and <math>\gamma</math>, the probabilities they calculated were <math>P(\text{pass to head} \mid \text{rule})</math>, <math>P(\text{pass to both} \mid \text{rule})</math>, and <math>P_{ann}(\text{feature value} \mid \text{nonterminal})</math>. <math>P_{ann}</math> is the probability of annotating the nonterminal with some feature value given that it did not inherit the feature value from its parent. This greatly reduces the number of probabilities to calculate, since many of the <math>\alpha, \beta, \gamma</math> weights are grouped together.
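
To make the bookkeeping concrete, below is a minimal sketch of how such a reduced parameter set could be combined into the probability that a parent annotated with <math>\alpha</math> produces children annotated with <math>\beta</math> and <math>\gamma</math>. The binary rule shape, the choice of the left child as head, the implicit "pass to neither" pattern that absorbs the leftover probability mass, and all names here are assumptions made for illustration, not the paper's actual equations.

<pre>
def child_annotation_prob(rule_params, p_ann, a, b, c, left='B', right='C'):
    """Probability that a parent annotated with value a yields children
    annotated with b and c, for a hypothetical binary rule whose head child
    is the left child.

    rule_params: {'pass_to_head': float, 'pass_to_both': float}; whatever
                 mass remains is treated here as "pass to neither".
    p_ann:       {nonterminal: {feature_value: prob}} for nonterminals that
                 did not inherit their value from the parent."""
    p_head = rule_params['pass_to_head']
    p_both = rule_params['pass_to_both']
    p_none = 1.0 - p_head - p_both

    prob = 0.0
    if b == a and c == a:                 # parent's value flows to both children
        prob += p_both
    if b == a:                            # value flows to the head child only;
        prob += p_head * p_ann[right][c]  # the other child is annotated fresh
    # neither child inherits; both draw fresh values from their distributions
    prob += p_none * p_ann[left][b] * p_ann[right][c]
    return prob

rule = {'pass_to_head': 0.6, 'pass_to_both': 0.3}
ann = {'B': {0: 0.7, 1: 0.3}, 'C': {0: 0.4, 1: 0.6}}
print(child_annotation_prob(rule, ann, a=0, b=0, c=1))  # 0.6*0.6 + 0.1*0.7*0.6 = 0.402
</pre>

The point is the count of free parameters: a couple of pass probabilities per rule plus one small annotation distribution per nonterminal, instead of a separate weight for every <math>(\alpha, \beta, \gamma)</math> combination of every rule.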

Experimental Results

Matsuzaki et al, ACL 2005