From Cohen Courses
Jump to navigationJump to search

PCFGs are a generative model used in NLP for parsing modeling and syntax analysis. Using this model, data is generated in a branching process where each non-terminal state produces a sequence of states that are visited in a recursive manner. PCFGs are used in a variety of NLP tasks including: grammar checking, lexical learning and machine translation.

Short definition of CFGs

  • Let be an alphabet of observable symbols, called terminals.
  • Let be a set of variable symbols, called non-terminals, including a special start symbols, .
  • The grammar also defines a set of derivation rules, , for transforming terminal symbols into sequences of terminals and non-terminals. The rules are of the form , where and .

Description of PCFGs

PCFGs extend CFGs by defining a multinomial distribution over the set of derivation rules over each terminal symbol.

  • Let be the set of derivation rules for the non-terminal (where is on the left-hand side of the rule).
  • Let be the multinomial distribution defined over such that

The Generative Process

The process begins with the special start symbol . At every consecutive step, if there are any non-terminal symbols left in the generated sequence, a derivation rule is sampled from the distribution of rules for that non-terminal and it is replaced by the the right-hand of the rule which contains some combination of terminal and non-terminal symbols.

The process ends when there are no more non-terminal symbols in the generated sequence.

This generates a sequence of terminals from . The probability of generating a sequence given a certain derivation tree is simply the product of the probabilities used in that tree.


Decoding PCFGs is done with variants of the Earley and CYK algorithms.

Relation to HMMs

HMMs are a special type of PCFGs. We can rewrite an HMM as a PCFG as follows:

  • Let , the set of non-terminals, be the set of states of the HMM, including start () and end () states.
  • Let be the transition probability from state to state , and let be the emission probability of symbol in state .

We can now define the following derivation rules:

  • , where , </math>h, h' \in H</math>
  • , for </math>h \in H</math>

Related Work