Bayesian grammar induction for language modeling, Chen 1995

== Summary ==
  
The [[Category::Paper]] addresses the problem of [[AddressesProblem::Parsing]] through unsupervised learning of probabilistic context-free grammars. It views grammar induction as a [[UsesMethod::Searching | search]] problem whose search space ranges over all possible grammars. The objective function that is optimized is the posterior probability of the grammar given the data: the search seeks <math> \arg \max_G p(G | O) = \arg \max_G p(O|G) p(G)</math>, where <math>G</math> is the grammar and <math>O</math> is the training data; the two maximizations coincide because <math>p(O)</math> does not depend on <math>G</math>. The algorithm prefers smaller grammars by using a universal a priori probability over grammars, <math>p(G) = 2^{-l(G)}</math>, where <math>l(G)</math> is the length of the description of the grammar in bits.
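
A small numeric sketch, not taken from the paper, may make the objective concrete. The likelihoods and description lengths below are invented for illustration; the point is only that the universal prior <math>p(G) = 2^{-l(G)}</math> turns each bit of grammar description length into a fixed log-probability penalty, so a larger grammar must improve the data likelihood enough to pay for its extra rules.

<syntaxhighlight lang="python">
import math

# Illustrative only: compare two hypothetical grammars under the objective
# log p(G | O) = log p(O | G) + log p(G) + const, with the universal prior
# p(G) = 2^{-l(G)}, i.e. log p(G) = -l(G) * log(2).

def log_posterior(log_likelihood, description_length_bits):
    """log p(O|G) + log p(G), dropping the constant -log p(O)."""
    return log_likelihood - description_length_bits * math.log(2)

# (log p(O|G) in nats, l(G) in bits) -- both numbers are made up.
small_grammar = (-1205.0, 40)    # slightly worse fit, short description
large_grammar = (-1200.0, 400)   # better fit, but many more rules to encode

print(log_posterior(*small_grammar))   # approx. -1232.7
print(log_posterior(*large_grammar))   # approx. -1477.3
# The description-length penalty outweighs the likelihood gain, so the
# smaller grammar has the higher posterior.
</syntaxhighlight>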
 
== Method ==
 
The algorithm starts with a grammar powerful enough to generate any string, which guarantees that the grammar covers the training data. It then takes a [[UsesMethod::Hill Climbing]] approach, adding a rule whenever doing so increases the value of the objective function. This is repeated until no rule that increases the objective function can be found.
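
In outline, the search is a greedy loop like the sketch below. This is a generic hill-climbing skeleton, assuming hypothetical <code>candidate_rules</code>, <code>objective</code>, and <code>with_rule</code> helpers; the paper's actual candidate moves and scoring are more involved and are not reproduced here.

<syntaxhighlight lang="python">
# A generic greedy hill-climbing skeleton for rule addition (illustrative only).
# `grammar` is assumed to expose a hypothetical with_rule() method returning a
# new grammar; candidate_rules() and objective() stand in for the paper's own
# move generator and posterior computation.

def hill_climb(grammar, data, candidate_rules, objective):
    """Add rules greedily while each addition improves the objective."""
    current_score = objective(grammar, data)
    while True:
        best_rule, best_score = None, current_score
        for rule in candidate_rules(grammar, data):
            score = objective(grammar.with_rule(rule), data)
            if score > best_score:
                best_rule, best_score = rule, score
        if best_rule is None:
            # No rule increases the objective: the search stops here.
            return grammar
        grammar = grammar.with_rule(best_rule)
        current_score = best_score
</syntaxhighlight>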
  
  
  
 


== Citation ==

Bayesian grammar induction for language modeling. Stanley F. Chen. In Proceedings of the 33rd Annual Meeting of the ACL, 1995.

== Online version ==

The paper can be found here.

Under construction by [[User:Dkulkarn]]