PCFGs

PCFGs are a generative model used in NLP for parsing modeling and syntax analysis. Using this model, data is generated in a branching process where each non-terminal state produces a sequence of states that are visited in a recursive manner. PCFGs are used in a variety of NLP tasks including: grammar checking, lexical learning and machine translation.

Short definition of CFGs

Let $\Sigma$ be an alphabet of observable symbols, called terminals.
Let $N$ be a set of variable symbols, called non-terminals, including a special start symbols, $S\in N$ .
The grammar also defines a set of derivation rules, $R$ , for transforming terminal symbols into sequences of terminals and non-terminals. The rules are of the form $X\rightarrow \alpha$ , where $X\in N$ and $\alpha \in \{N\cup \Sigma \}^{*}$ .

Description of PCFGs

PCFGs extend CFGs by defining a multinomial distribution over the set of derivation rules over each terminal symbol.

Let $R_{X}$ be the set of derivation rules for the non-terminal $X$ (where $X$ is on the left-hand side of the rule).
Let $\theta _{X}$ $\theta _{X}$ be the multinomial distribution defined over $R_{X}$ $R_{X}$ such that
- $\forall r=X\rightarrow \alpha \ \in R_{X},\ \theta _{r}\geq 0$
- $\sum _{r\in R_{X}}\theta _{r}=1$

The Generative Process

The process begins with the special start symbol $S\in N$ . At every consecutive step, if there are any non-terminal symbols left in the generated sequence, a derivation rule is sampled from the distribution of rules for that non-terminal and it is replaced by the the right-hand of the rule which contains some combination of terminal and non-terminal symbols.

The process ends when there are no more non-terminal symbols in the generated sequence.

This generates a sequence of terminals from $\Sigma ^{*}$ . The probability of generating a sequence given a certain derivation tree is simply the product of the probabilities used in that tree.

Decoding

Decoding PCFGs is done with variants of the Earley and CYK algorithms.

Relation to HMMs

HMMs are a special type of PCFGs. We can rewrite an HMM as a PCFG as follows:

Let $N$ , the set of non-terminals, be the set of states $H$ of the HMM, including start ( $S\in H$ ) and end ( $E\in H$ ) states.
Let $\pi _{h,h'}$ be the transition probability from state $h$ to state $h'$ , and let $\tau _{h,x}$ be the emission probability of symbol $x$ in state $h$ .

We can now define the following derivation rules:

$h\rightarrow \tau _{h,x}\cdot \pi _{h,h'}\cdot x\cdot h'$ , where $x\in \Sigma$ , </math>h, h' \in H</math>
$S\rightarrow \pi _{S,h}\cdot h$ , for </math>h \in H</math>
$E\rightarrow 1\cdot \epsilon$