Difference between revisions of "Inside Outside algorithm"

From Cohen Courses
Jump to navigationJump to search
m
m
Line 21: Line 21:
 
<math>\alpha(A,i,j)=\sum_{B,C}\sum_{i\leq k\leq j}p(A\rightarrow B C) \alpha(B,i,k) \alpha(C,k+1,j)</math>
 
<math>\alpha(A,i,j)=\sum_{B,C}\sum_{i\leq k\leq j}p(A\rightarrow B C) \alpha(B,i,k) \alpha(C,k+1,j)</math>
  
 +
Intuitively, this can be seen as computing the sum over all possible ways of building trees rooted by <math>A</math> and generating the word span <math>i,j</math>.
 +
 +
For the base case, it is simply <math>\alpha(A,i,i)=p(A\rightarrow w_i)</math>.
  
 
=== Outside counts ===
 
=== Outside counts ===
 +
 +
The outside probability is defined as
 +
 +
<math>\beta(A,i,j)=\sum_{B,C}\sum{1\leq k<i}p(B\rightarrow CA}\alpha(C,k,i-1)\beta(B,k,j) + \sum_{B,C}\sum{j< k\leq n}p(B\rightarrow AC}\alpha(C,j+1,k)\beta(B,i,k)</math>
 +
 +
=== Putting them together ===

Revision as of 12:49, 29 November 2011

This is a Method page for the Inside-outside algorithm.

Background

The inside-outside algorithm is a way of estimating probabilities in a PCFG. It is first introduced [| Baker, 1979]. The inside outside algorithm is in fact a generalization of the forward-backward algorithm (for hidden Markov models) to PCFGs.

It is often used as part of the EM algorithm for computing expectations.

Algorithm

The algorithm is a dynamic programming algorithm that is often used with chart parsers to estimate expected production counts. Here, we assume the grammar is of Chomsky Normal Form.

The algorithm works by computing 2 probabilities for each nonterminal and span .

Inside probabilities

The inside probability is defined as , which is the probability of a nonterminal generating the word sequence to .

The inside probability can be calculated recursively with the following recurrence relation:

Intuitively, this can be seen as computing the sum over all possible ways of building trees rooted by and generating the word span .

For the base case, it is simply .

Outside counts

The outside probability is defined as

Failed to parse (syntax error): {\displaystyle \beta(A,i,j)=\sum_{B,C}\sum{1\leq k<i}p(B\rightarrow CA}\alpha(C,k,i-1)\beta(B,k,j) + \sum_{B,C}\sum{j< k\leq n}p(B\rightarrow AC}\alpha(C,j+1,k)\beta(B,i,k)}

Putting them together