Inside Outside algorithm

From Cohen Courses
Jump to: navigation, search

This is a Method page for the Inside-outside algorithm.



The inside-outside algorithm is a way of estimating probabilities in a PCFG. It is first introduced Baker (1979). The inside outside algorithm is in fact a generalization of the forward-backward algorithm (for hidden Markov models) to PCFGs.

It is mainly used for unsupervised parsing. For references to parsing related papers, refer to Class meeting for 10-710


The algorithm is a dynamic programming algorithm that is often used with chart parsers to estimate expected production counts. Here, we assume the grammar G is of Chomsky Normal Form.

The algorithm works by computing 2 probabilities for each nonterminal A and span i, j.

Inside probabilities

The inside probability is defined as \alpha(A, i, j)=P(A\overset{*}{\Rightarrow} w_i...w_j|G, \mathbf{w}), which is the probability of a nonterminal A generating the word sequence w_i to w_j.

The inside probability can be calculated recursively with the following recurrence relation:

\alpha(A,i,j)=\sum_{B,C}\sum_{i\leq k\leq j}p(A\rightarrow B C) \alpha(B,i,k) \alpha(C,k+1,j)

Intuitively, this can be seen as computing the sum over all possible ways of building trees rooted by A and generating the word span i,j.

For the base case, it is simply \alpha(A,i,i)=p(A\rightarrow w_i).

Outside counts

The outside probability is defined as \beta(A,i,j)=P(S\overset{*}{\Rightarrow} w_1, ..., w_{i-1}, A, w_{j+1}, ..., w_n), which is the probability of generating a parse tree spanning the entire sentence that uses nonterminal A to span i,j.

The reccurrence relation is thus:

\beta(A,i,j)=\sum_{B,C}\sum_{1\leq k<i}p(B\rightarrow CA)\alpha(C,k,i-1)\beta(B,k,j) + \sum_{B,C}\sum_{j< k\leq n}p(B\rightarrow AC)\alpha(C,j+1,k)\beta(B,i,k)

The first term is basically considering all ways of generating trees where A is used as a right subtree, and vis a vis for the second term.

Dynamic programming: Putting them together

In a standard EM framework, we would want to compute for each production rule, the expected number of times it is used for a given sentence, which we can compute by summing over the counts of using the production for all possible spans (and separation points)

count(A\rightarrow BC)=\dfrac{p(A\rightarrow BC)}{\alpha(S,1,n)}\sum_{1\leq i \leq j \leq k \leq n} \beta(A,i,k)\alpha(B,i,j)\alpha(C,j+1,k)

The new estimated production probabilities would thus be p^{new}(A\rightarrow BC)=\dfrac{count(A\rightarrow BC)}{\sum_{X,Y}count(A\rightarrow XY)}


For a given sentence of length n and grammar G, the inside outside algorithm is O(Gn^3)


Trainable grammars for speech recognition. J Baker (1979). The original paper introducing the inside-outside algorithm

Notes on Inside-Outside algorithm. Jason Eisner. provides a good walkthrough and explanation on using the inside-outside algorithm with a PCFG

A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D Lafferty (2000). shows the the inside-outside algorithm as a special case of EM, and proves its convergence.