This is a [[Category::Method]] page for the Inside-outside algorithm.

== Background ==

The inside-outside algorithm is a way of estimating rule probabilities in a [[PCFG]]. It was first introduced in [http://asadl.org/jasa/resource/1/jasman/v65/iS1/pS132_s1 Baker (1979)]. The inside-outside algorithm is in fact a generalization of the [[forward-backward]] algorithm (for [[hidden Markov models]]) to PCFGs.

It is mainly used for unsupervised [[AddressesProblem::parsing]]. For references to parsing-related papers, refer to [[RelatedPaper::Class_Meeting_for_10-710_10-11-2011 | Class meeting for 10-710]].
== Algorithm ==

The inside-outside algorithm is a dynamic programming algorithm, often used with chart parsers, for estimating expected production counts. Here we assume the grammar <math>G</math> is in Chomsky normal form, so every production has the form <math>A\rightarrow BC</math> or <math>A\rightarrow w</math>.

The algorithm works by computing two probabilities for each nonterminal <math>A</math> and span <math>(i, j)</math>.
=== Inside probabilities ===

The inside probability is defined as <math>\alpha(A, i, j)=P(A\overset{*}{\Rightarrow} w_i \dots w_j \mid G)</math>, which is the probability of nonterminal <math>A</math> generating the word sequence <math>w_i</math> through <math>w_j</math>.

The inside probability can be calculated recursively with the following recurrence relation:

<math>\alpha(A,i,j)=\sum_{B,C}\sum_{i\leq k< j}p(A\rightarrow B C)\, \alpha(B,i,k)\, \alpha(C,k+1,j)</math>

Intuitively, this sums over all possible ways of building a tree rooted at <math>A</math> that generates the word span <math>(i,j)</math>: every binary rule <math>A\rightarrow BC</math> and every split point <math>k</math> at which <math>B</math> generates <math>w_i \dots w_k</math> and <math>C</math> generates <math>w_{k+1} \dots w_j</math>.

The base case is simply <math>\alpha(A,i,i)=p(A\rightarrow w_i)</math>.
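To make the recurrence concrete, here is a minimal Python sketch of the inside pass. The grammar representation is an assumption made for this example: <code>binary_rules</code> maps <code>(A, B, C)</code> to <math>p(A\rightarrow BC)</math>, and <code>lexical_rules</code> maps <code>(A, w)</code> to <math>p(A\rightarrow w)</math>.

<pre>
from collections import defaultdict

def inside(words, binary_rules, lexical_rules):
    # Compute inside probabilities alpha[(A, i, j)] for a grammar in CNF.
    # Spans are 1-indexed and inclusive, matching the notation above.
    n = len(words)
    alpha = defaultdict(float)
    # Base case: alpha(A, i, i) = p(A -> w_i)
    for i in range(1, n + 1):
        for (A, w), p in lexical_rules.items():
            if w == words[i - 1]:
                alpha[(A, i, i)] += p
    # Recurrence: build wider spans from narrower ones, summing over
    # rules A -> B C and split points k with i <= k < j
    for width in range(1, n):
        for i in range(1, n - width + 1):
            j = i + width
            for (A, B, C), p in binary_rules.items():
                for k in range(i, j):
                    alpha[(A, i, j)] += p * alpha[(B, i, k)] * alpha[(C, k + 1, j)]
    return alpha
</pre>

After this pass, <code>alpha[(S, 1, n)]</code> (for start symbol <code>S</code>) is the total probability of the sentence under the grammar.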
=== Outside probabilities ===

The outside probability is defined as <math>\beta(A,i,j)=P(S\overset{*}{\Rightarrow} w_1 \dots w_{i-1}\, A\, w_{j+1} \dots w_n \mid G)</math>, which is the probability of generating a parse tree spanning the entire sentence that uses nonterminal <math>A</math> to span <math>(i,j)</math>.

The recurrence relation is thus:

<math>\beta(A,i,j)=\sum_{B,C}\sum_{1\leq k<i}p(B\rightarrow CA)\,\alpha(C,k,i-1)\,\beta(B,k,j) + \sum_{B,C}\sum_{j< k\leq n}p(B\rightarrow AC)\,\alpha(C,j+1,k)\,\beta(B,i,k)</math>

The first term considers all ways of generating trees in which <math>A</math> is used as a right child (of some rule <math>B\rightarrow CA</math>), and the second term, symmetrically, all ways in which <math>A</math> is used as a left child. The base case is <math>\beta(S,1,n)=1</math> for the start symbol <math>S</math>.
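Continuing the sketch above (with the same assumed rule representation), the outside pass fills its table from wide spans down to narrow ones, since <math>\beta(A,i,j)</math> only depends on outside probabilities of strictly larger spans:

<pre>
def outside(words, binary_rules, alpha, start='S'):
    # Compute outside probabilities beta[(A, i, j)] given the inside table.
    n = len(words)
    beta = defaultdict(float)
    beta[(start, 1, n)] = 1.0  # base case: the root spans the whole sentence
    for width in range(n - 1, -1, -1):  # widest spans first
        for i in range(1, n - width + 1):
            j = i + width
            for (B, C, A), p in binary_rules.items():
                # First term: A is the right child of B -> C A; C spans (k, i-1)
                for k in range(1, i):
                    beta[(A, i, j)] += p * alpha[(C, k, i - 1)] * beta[(B, k, j)]
            for (B, A, C), p in binary_rules.items():
                # Second term: A is the left child of B -> A C; C spans (j+1, k)
                for k in range(j + 1, n + 1):
                    beta[(A, i, j)] += p * alpha[(C, j + 1, k)] * beta[(B, i, k)]
    return beta
</pre>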
=== Dynamic programming: Putting them together ===

In a standard EM framework, we want to compute, for each production rule, the expected number of times it is used in a given sentence. This can be computed by summing the probability of using the production over all possible spans (and split points):

<math>count(A\rightarrow BC)=\dfrac{p(A\rightarrow BC)}{\alpha(S,1,n)}\sum_{1\leq i \leq j < k \leq n} \beta(A,i,k)\,\alpha(B,i,j)\,\alpha(C,j+1,k)</math>

Here <math>\alpha(S,1,n)</math> is the total probability of the sentence. The new estimated production probabilities are then <math>p^{new}(A\rightarrow BC)=\dfrac{count(A\rightarrow BC)}{\sum_{X,Y}count(A\rightarrow XY)}</math>
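Under the same assumptions as the sketches above (and assuming the sentence has non-zero probability under the grammar), the count accumulation and re-estimation step for binary rules might look like:

<pre>
def reestimate_binary(words, binary_rules, alpha, beta, start='S'):
    # One EM re-estimation of binary-rule probabilities from a single sentence.
    n = len(words)
    Z = alpha[(start, 1, n)]  # total sentence probability; assumed > 0
    counts = defaultdict(float)
    for (A, B, C), p in binary_rules.items():
        # Sum over spans (i, k) and split points j with 1 <= i <= j < k <= n
        for i in range(1, n + 1):
            for k in range(i + 1, n + 1):
                for j in range(i, k):
                    counts[(A, B, C)] += (p * beta[(A, i, k)]
                                          * alpha[(B, i, j)]
                                          * alpha[(C, j + 1, k)]) / Z
    # Normalize per left-hand-side nonterminal, matching the update above
    totals = defaultdict(float)
    for (A, _, _), c in counts.items():
        totals[A] += c
    return {(A, B, C): c / totals[A]
            for (A, B, C), c in counts.items() if totals[A] > 0}
</pre>

In a full EM run, the lexical-rule counts would be accumulated analogously, and counts would be summed over all sentences in the corpus before normalizing.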
== Complexity ==

For a sentence of length <math>n</math>, the inside-outside algorithm runs in <math>O(|G|\,n^3)</math> time, where <math>|G|</math> is the number of rules in the grammar <math>G</math>.
== References ==

[http://asadl.org/jasa/resource/1/jasman/v65/iS1/pS132_s1 Trainable grammars for speech recognition. J. Baker (1979).] The original paper introducing the inside-outside algorithm.

[http://www.cs.jhu.edu/~jason/465/iobasics.pdf Notes on the Inside-Outside algorithm. Jason Eisner.] Provides a good walkthrough and explanation of using the inside-outside algorithm with a PCFG.

[http://domino.watson.ibm.com/library/cyberdig.nsf/papers/C13E80CEF63EB770852576B6004852B4/$File/rc21636.pdf A Derivation of the Inside-Outside Algorithm from the EM Algorithm. John D. Lafferty (2000).] Shows that the inside-outside algorithm is a special case of EM, and proves its convergence.