Difference between revisions of "Inside Outside algorithm"

From Cohen Courses
Jump to navigationJump to search
m
m
 
(7 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
== Background ==
 
== Background ==
  
The inside-outside algorithm is a way of estimating probabilities in a PCFG. It is first introduced [[http://asadl.org/jasa/resource/1/jasman/v65/iS1/pS132_s1 | Baker, 1979]].  The inside outside algorithm is in fact a generalization of the forward-backward algorithm (for hidden Markov models) to PCFGs.
+
The inside-outside algorithm is a way of estimating probabilities in a [[PCFG]]. It is first introduced [http://asadl.org/jasa/resource/1/jasman/v65/iS1/pS132_s1 Baker (1979)].  The inside outside algorithm is in fact a generalization of the [[forward-backward]] algorithm (for [[hidden Markov models]]) to PCFGs.
 +
 
 +
It is mainly used for unsupervised [[AddressesProblem::parsing]]. For references to parsing related papers, refer to [[RelatedPaper::Class_Meeting_for_10-710_10-11-2011 | Class meeting for 10-710]]
  
 
== Algorithm ==
 
== Algorithm ==
Line 38: Line 40:
  
 
<math>count(A\rightarrow BC)=\dfrac{p(A\rightarrow BC)}{\alpha(S,1,n)}\sum_{1\leq i \leq j \leq k \leq n} \beta(A,i,k)\alpha(B,i,j)\alpha(C,j+1,k)</math>
 
<math>count(A\rightarrow BC)=\dfrac{p(A\rightarrow BC)}{\alpha(S,1,n)}\sum_{1\leq i \leq j \leq k \leq n} \beta(A,i,k)\alpha(B,i,j)\alpha(C,j+1,k)</math>
 +
 +
The new estimated production probabilities would thus be <math>p^{new}(A\rightarrow BC)=\dfrac{count(A\rightarrow BC)}{\sum_{X,Y}count(A\rightarrow XY)}</math>
  
 
== Complexity ==
 
== Complexity ==
Line 49: Line 53:
 
[http://www.cs.jhu.edu/~jason/465/iobasics.pdf Notes on Inside-Outside algorithm. Jason Eisner.] provides a good walkthrough and explanation on using the inside-outside algorithm with a PCFG
 
[http://www.cs.jhu.edu/~jason/465/iobasics.pdf Notes on Inside-Outside algorithm. Jason Eisner.] provides a good walkthrough and explanation on using the inside-outside algorithm with a PCFG
  
[http://domino.watson.ibm.com/library/cyberdig.nsf/papers/C13E80CEF63EB770852576B6004852B4/$File/rc21636.pdf A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D Lafferty (2000).] shows the the inside-outside algorithm as a special case of EM.
+
[http://domino.watson.ibm.com/library/cyberdig.nsf/papers/C13E80CEF63EB770852576B6004852B4/$File/rc21636.pdf A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D Lafferty (2000).] shows the the inside-outside algorithm as a special case of EM, and proves its convergence.

Latest revision as of 12:27, 29 November 2011

This is a Method page for the Inside-outside algorithm.

Background

The inside-outside algorithm is a way of estimating probabilities in a PCFG. It is first introduced Baker (1979). The inside outside algorithm is in fact a generalization of the forward-backward algorithm (for hidden Markov models) to PCFGs.

It is mainly used for unsupervised parsing. For references to parsing related papers, refer to Class meeting for 10-710

Algorithm

The algorithm is a dynamic programming algorithm that is often used with chart parsers to estimate expected production counts. Here, we assume the grammar is of Chomsky Normal Form.

The algorithm works by computing 2 probabilities for each nonterminal and span .

Inside probabilities

The inside probability is defined as , which is the probability of a nonterminal generating the word sequence to .

The inside probability can be calculated recursively with the following recurrence relation:

Intuitively, this can be seen as computing the sum over all possible ways of building trees rooted by and generating the word span .

For the base case, it is simply .

Outside counts

The outside probability is defined as , which is the probability of generating a parse tree spanning the entire sentence that uses nonterminal to span .

The reccurrence relation is thus:

The first term is basically considering all ways of generating trees where is used as a right subtree, and vis a vis for the second term.

Dynamic programming: Putting them together

In a standard EM framework, we would want to compute for each production rule, the expected number of times it is used for a given sentence, which we can compute by summing over the counts of using the production for all possible spans (and separation points)

The new estimated production probabilities would thus be

Complexity

For a given sentence of length and grammar , the inside outside algorithm is

References

Trainable grammars for speech recognition. J Baker (1979). The original paper introducing the inside-outside algorithm

Notes on Inside-Outside algorithm. Jason Eisner. provides a good walkthrough and explanation on using the inside-outside algorithm with a PCFG

A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D Lafferty (2000). shows the the inside-outside algorithm as a special case of EM, and proves its convergence.