# Inside-outside algorithm

This is a Method page for the Inside-outside algorithm.

## Background

The inside-outside algorithm is a method for estimating the production probabilities of a probabilistic context-free grammar (PCFG). It was first introduced by Baker (1979). The inside-outside algorithm is in fact a generalization of the forward-backward algorithm (for hidden Markov models) to PCFGs.

It is mainly used for unsupervised parsing. For references to parsing-related papers, refer to the Class meeting for 10-710.

## Algorithm

The algorithm is a dynamic programming algorithm, often used with chart parsers, that estimates expected production counts. Here, we assume the grammar ${\displaystyle G}$ is in Chomsky normal form.

The algorithm works by computing two probabilities for each nonterminal ${\displaystyle A}$ and span ${\displaystyle (i,j)}$.

### Inside probabilities

The inside probability is defined as ${\displaystyle \alpha (A,i,j)=P(A{\overset {*}{\Rightarrow }}w_{i}...w_{j}|G)}$, which is the probability of nonterminal ${\displaystyle A}$ generating the word sequence ${\displaystyle w_{i}}$ to ${\displaystyle w_{j}}$.

The inside probability can be calculated recursively with the following recurrence relation:

${\displaystyle \alpha (A,i,j)=\sum _{B,C}\sum _{i\leq k<j}p(A\rightarrow BC)\alpha (B,i,k)\alpha (C,k+1,j)}$

Intuitively, this can be seen as summing over all possible ways of building a tree rooted at ${\displaystyle A}$ that generates the word span ${\displaystyle i,j}$.

For the base case, it is simply ${\displaystyle \alpha (A,i,i)=p(A\rightarrow w_{i})}$.
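As a sketch, the base case and recurrence above can be implemented by filling in spans of increasing length. The rule encoding below (`unary` and `binary` dictionaries mapping rules to probabilities) is an assumed representation, not one prescribed by the algorithm:

```python
from collections import defaultdict

def inside(words, unary, binary, nonterminals):
    """Inside probabilities alpha[(A, i, j)] for a CNF PCFG.

    unary  maps (A, word) -> p(A -> word)
    binary maps (A, B, C) -> p(A -> B C)
    Spans are 1-indexed and inclusive, matching the text.
    """
    n = len(words)
    alpha = defaultdict(float)
    # Base case: alpha(A, i, i) = p(A -> w_i)
    for i in range(1, n + 1):
        for A in nonterminals:
            alpha[(A, i, i)] = unary.get((A, words[i - 1]), 0.0)
    # Recurrence: sum over rules and split points, shorter spans first
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for (A, B, C), p in binary.items():
                for k in range(i, j):  # B spans i..k, C spans k+1..j
                    alpha[(A, i, j)] += p * alpha[(B, i, k)] * alpha[(C, k + 1, j)]
    return alpha
```

Note that ${\displaystyle \alpha (S,1,n)}$ is then the total probability of the sentence under the grammar.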

### Outside probabilities

The outside probability is defined as ${\displaystyle \beta (A,i,j)=P(S{\overset {*}{\Rightarrow }}w_{1},...,w_{i-1},A,w_{j+1},...,w_{n})}$, which is the probability of generating a parse tree spanning the entire sentence that uses nonterminal ${\displaystyle A}$ to span ${\displaystyle i,j}$.

The recurrence relation is thus:

${\displaystyle \beta (A,i,j)=\sum _{B,C}\sum _{1\leq k<i}p(B\rightarrow CA)\alpha (C,k,i-1)\beta (B,k,j)+\sum _{B,C}\sum _{j<k\leq n}p(B\rightarrow AC)\alpha (C,j+1,k)\beta (B,i,k)}$

with the base case ${\displaystyle \beta (S,1,n)=1}$.

The first term considers all ways of generating trees in which ${\displaystyle A}$ is used as a right subtree, and the second term, as a left subtree.
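A corresponding sketch for the outside pass, assuming the same rule encoding as before and inside probabilities `alpha` already computed (the function and parameter names are illustrative):

```python
from collections import defaultdict

def outside(n, binary, nonterminals, alpha, start="S"):
    """Outside probabilities beta[(A, i, j)] given inside probabilities alpha.

    binary maps (A, B, C) -> p(A -> B C); spans are 1-indexed, inclusive.
    """
    beta = defaultdict(float)
    beta[(start, 1, n)] = 1.0  # base case: the start symbol spans the whole sentence
    # Work from the widest spans down to the narrowest
    for length in range(n - 1, 0, -1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for A in nonterminals:
                total = 0.0
                for (B, L, R), p in binary.items():
                    if R == A:  # A as a right subtree: B -> L A, L spans k..i-1
                        for k in range(1, i):
                            total += p * alpha[(L, k, i - 1)] * beta[(B, k, j)]
                    if L == A:  # A as a left subtree: B -> A R, R spans j+1..k
                        for k in range(j + 1, n + 1):
                            total += p * alpha[(R, j + 1, k)] * beta[(B, i, k)]
                beta[(A, i, j)] = total
    return beta
```

Spans are processed from widest to narrowest because the recurrence for ${\displaystyle \beta (A,i,j)}$ only refers to outside probabilities of strictly wider spans.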

### Dynamic programming: Putting them together

In a standard EM framework, we want to compute, for each production rule, the expected number of times it is used in a given sentence. This can be computed by summing the counts of using the production over all possible spans (and split points):

${\displaystyle count(A\rightarrow BC)={\dfrac {p(A\rightarrow BC)}{\alpha (S,1,n)}}\sum _{1\leq i\leq j<k\leq n}\beta (A,i,k)\alpha (B,i,j)\alpha (C,j+1,k)}$

The new estimated production probabilities are thus ${\displaystyle p^{new}(A\rightarrow BC)={\dfrac {count(A\rightarrow BC)}{\sum _{X,Y}count(A\rightarrow XY)}}}$.
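The expected-count formula and the re-estimation step can be sketched as follows, again assuming the rule encoding used above. As in the formula, only binary rules are re-estimated here; a full EM implementation would also re-estimate the lexical (unary) rules:

```python
from collections import defaultdict

def expected_counts(n, binary, alpha, beta, start="S"):
    """Expected number of uses of each binary rule A -> B C in one sentence."""
    Z = alpha[(start, 1, n)]  # sentence probability alpha(S, 1, n)
    counts = defaultdict(float)
    for (A, B, C), p in binary.items():
        total = 0.0
        # Sum over spans i..k with split point j: B spans i..j, C spans j+1..k
        for i in range(1, n + 1):
            for k in range(i + 1, n + 1):
                for j in range(i, k):
                    total += beta[(A, i, k)] * alpha[(B, i, j)] * alpha[(C, j + 1, k)]
        counts[(A, B, C)] = p * total / Z
    return counts

def reestimate(counts):
    """p_new(A -> B C) = count(A -> B C) / sum over X, Y of count(A -> X Y)."""
    lhs_totals = defaultdict(float)
    for (A, B, C), c in counts.items():
        lhs_totals[A] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

In practice the counts would be accumulated over all sentences in the corpus before renormalizing.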

## Complexity

For a given sentence of length ${\displaystyle n}$ and grammar ${\displaystyle G}$, the inside-outside algorithm runs in ${\displaystyle O(|G|n^{3})}$ time, where ${\displaystyle |G|}$ is the number of productions.

## References

Trainable grammars for speech recognition. J. Baker (1979). The original paper introducing the inside-outside algorithm.

Notes on the Inside-Outside Algorithm. Jason Eisner. Provides a good walkthrough and explanation of using the inside-outside algorithm with a PCFG.

A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D. Lafferty (2000). Shows the inside-outside algorithm as a special case of EM and proves its convergence.