# PCFGs

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

PCFGs are a generative model used in NLP for parsing modeling and syntax analysis. Using this model, data is generated in a branching process where each non-terminal state produces a sequence of states that are visited in a recursive manner. PCFGs are used in a variety of NLP tasks including: grammar checking, lexical learning and machine translation.

## Short definition of CFGs

• Let ${\displaystyle \Sigma }$ be an alphabet of observable symbols, called terminals.
• Let ${\displaystyle N}$ be a set of variable symbols, called non-terminals, including a special start symbols, ${\displaystyle S\in N}$.
• The grammar also defines a set of derivation rules, ${\displaystyle R}$, for transforming terminal symbols into sequences of terminals and non-terminals. The rules are of the form ${\displaystyle X\rightarrow \alpha }$, where ${\displaystyle X\in N}$ and ${\displaystyle \alpha \in \{N\cup \Sigma \}^{*}}$.

## Description of PCFGs

PCFGs extend CFGs by defining a multinomial distribution over the set of derivation rules over each terminal symbol.

• Let ${\displaystyle R_{X}}$ be the set of derivation rules for the non-terminal ${\displaystyle X}$ (where ${\displaystyle X}$ is on the left-hand side of the rule).
• Let ${\displaystyle \theta _{X}}$ be the multinomial distribution defined over ${\displaystyle R_{X}}$ such that
• ${\displaystyle \forall r=X\rightarrow \alpha \ \in R_{X},\ \theta _{r}\geq 0}$
• ${\displaystyle \sum _{r\in R_{X}}\theta _{r}=1}$

## The Generative Process

The process begins with the special start symbol ${\displaystyle S\in N}$. At every consecutive step, if there are any non-terminal symbols left in the generated sequence, a derivation rule is sampled from the distribution of rules for that non-terminal and it is replaced by the the right-hand of the rule which contains some combination of terminal and non-terminal symbols.

The process ends when there are no more non-terminal symbols in the generated sequence.

This generates a sequence of terminals from ${\displaystyle \Sigma ^{*}}$. The probability of generating a sequence given a certain derivation tree is simply the product of the probabilities used in that tree.

## Decoding

Decoding PCFGs is done with variants of the Earley and CYK algorithms.

## Relation to HMMs

HMMs are a special type of PCFGs. We can rewrite an HMM as a PCFG as follows:

• Let ${\displaystyle N}$, the set of non-terminals, be the set of states ${\displaystyle H}$ of the HMM, including start (${\displaystyle S\in H}$) and end (${\displaystyle E\in H}$) states.
• Let ${\displaystyle \pi _{h,h'}}$ be the transition probability from state ${\displaystyle h}$ to state ${\displaystyle h'}$, and let ${\displaystyle \tau _{h,x}}$ be the emission probability of symbol ${\displaystyle x}$ in state ${\displaystyle h}$.

We can now define the following derivation rules:

• ${\displaystyle h\rightarrow \tau _{h,x}\cdot \pi _{h,h'}\cdot x\cdot h'}$, where ${\displaystyle x\in \Sigma }$, [/itex]h, h' \in H[/itex]
• ${\displaystyle S\rightarrow \pi _{S,h}\cdot h}$, for [/itex]h \in H[/itex]
• ${\displaystyle E\rightarrow 1\cdot \epsilon }$