Bayesian grammar induction for language modeling, Chen 1995

== Summary ==
  
The [[Category::Paper]] addresses the problem of [[AddressesProblem::Parsing]] through unsupervised learning of probabilistic context-free grammars. It views grammar induction as a [[UsesMethod::Searching | search]] problem whose search space ranges over all possible grammars. The objective function that is optimized is the posterior probability of the grammar given the data: the search seeks <math> \arg \max_G p(G | O) = \arg \max_G p(O|G) p(G)</math>, where <math>G</math> is the grammar and <math>O</math> is the training data; the two maximizations coincide because <math>p(O)</math> does not depend on <math>G</math>. The algorithm prefers smaller grammars by using a universal a priori probability over grammars, <math>p(G) = 2^{-l(G)}</math>, where <math>l(G)</math> is the length of the description of the grammar in bits.
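
A small numeric sketch, not taken from the paper, may make the objective concrete. The likelihoods and description lengths below are invented for illustration; the point is only that the universal prior <math>p(G) = 2^{-l(G)}</math> turns each bit of grammar description length into a fixed log-probability penalty, so a larger grammar must improve the data likelihood enough to pay for its extra rules.

<syntaxhighlight lang="python">
import math

# Illustrative only: compare two hypothetical grammars under the objective
# log p(G | O) = log p(O | G) + log p(G) + const, with the universal prior
# p(G) = 2^{-l(G)}, i.e. log p(G) = -l(G) * log(2).

def log_posterior(log_likelihood, description_length_bits):
    """log p(O|G) + log p(G), dropping the constant -log p(O)."""
    return log_likelihood - description_length_bits * math.log(2)

# (log p(O|G) in nats, l(G) in bits) -- both numbers are made up.
small_grammar = (-1205.0, 40)    # slightly worse fit, short description
large_grammar = (-1200.0, 400)   # better fit, but many more rules to encode

print(log_posterior(*small_grammar))   # approx. -1232.7
print(log_posterior(*large_grammar))   # approx. -1477.3
# The description-length penalty outweighs the likelihood gain, so the
# smaller grammar has the higher posterior.
</syntaxhighlight>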
 
== Method ==
 
The algorithm starts with a grammar powerful enough to generate any string, which guarantees that the grammar covers the training data. It then takes a [[UsesMethod::Hill Climbing]] approach, adding a rule whenever doing so increases the value of the objective function. This is repeated until no rule that increases the objective function can be found.
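
In outline, the search is a greedy loop like the sketch below. This is a generic hill-climbing skeleton, assuming hypothetical <code>candidate_rules</code>, <code>objective</code>, and <code>with_rule</code> helpers; the paper's actual candidate moves and scoring are more involved and are not reproduced here.

<syntaxhighlight lang="python">
# A generic greedy hill-climbing skeleton for rule addition (illustrative only).
# `grammar` is assumed to expose a hypothetical with_rule() method returning a
# new grammar; candidate_rules() and objective() stand in for the paper's own
# move generator and posterior computation.

def hill_climb(grammar, data, candidate_rules, objective):
    """Add rules greedily while each addition improves the objective."""
    current_score = objective(grammar, data)
    while True:
        best_rule, best_score = None, current_score
        for rule in candidate_rules(grammar, data):
            score = objective(grammar.with_rule(rule), data)
            if score > best_score:
                best_rule, best_score = rule, score
        if best_rule is None:
            # No rule increases the objective: the search stops here.
            return grammar
        grammar = grammar.with_rule(best_rule)
        current_score = best_score
</syntaxhighlight>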
  
  
  
 


== Citation ==

Bayesian grammar induction for language modeling. Stanley F. Chen. In Proceedings of the 33rd Annual Meeting of the ACL, 1995.

== Online version ==

The paper can be found here.

Under construction by [[User:Dkulkarn]]