Revision as of 10:44, 29 September 2011
Summary
This is a dynamic programming algorithm, used in Hidden Markov Models to efficiently compute the state posteriors over all the hidden state variables.
These values are then used in Posterior Decoding, which simply chooses the state with the highest posterior marginal for each position in the sequence.
The forward-backward algorithm runs in time linear in the length of the sequence, whereas a brute-force algorithm that checks all possible state sequences would take time exponential in the length of the sequence.
Problem description
Posterior decoding is one of several ways to find the best hidden state sequence <math>\hat{y}</math>. It consists of picking the highest state posterior for each position <math>i</math> in the sequence:
<math>
\hat{y}_i = \underset{y_i\in{S}}{\operatorname{argmax}}\ \gamma_i(y_i)
</math>
where <math>\gamma_i(y_i)</math> is the state posterior for position <math>i</math>. The state posterior is given by:
<math>
\gamma_i(s_l) = \sum_{y:y_i=s_l}P(y|x)
</math>
where the sum is over the sequence posteriors <math>P(y|x)</math> of all possible state sequences <math>y</math> in which position <math>i</math> is the state <math>s_l</math>.
The sequence posterior for a given sequence <math>y</math> is defined as:
<math>
P(y|x) = \frac{1}{Z}\prod_{i=1}^{N}\phi_i(y_i)\prod_{i=1}^{N-1}\phi_i(y_i,y_{i+1})
</math>
It is calculated as the product of all node potentials <math>\phi_i(y_i)</math> of the nodes in the sequence, corresponding to the state observation parameters for the state <math>y_i</math>, and the transition potentials <math>\phi_i(y_i,y_{i+1})</math> of the transitions in the sequence, corresponding to the transition parameters. These potentials are estimated during the training process of Hidden Markov Models. As an example, the sequence in red in the following figure would be calculated as the product of its four node potentials and its three transition potentials.
Thus, to calculate <math>\gamma_2(r)</math>, for instance, we would need to sum the sequence posteriors of the sequences r r r r, s r r r, r r s r, s r s r, r r r s, s r r s, r r s s and s r s s. A brute-force algorithm would compute each sequence posterior separately. This means that the number of computations needed to calculate a single state posterior would be in the order of <math>|S|^N</math>.
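The brute-force computation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the states r/s, the sequence length, and all numeric potential values are invented for the example.

```python
from itertools import product

# Hypothetical example: states "r"/"s", sequence length 4; all numeric
# potential values below are invented purely for illustration.
states = ["r", "s"]
N = 4
phi_node = [{"r": 0.6, "s": 0.4} for _ in range(N)]        # node potentials phi_i(y_i)
phi_trans = [{(a, b): (0.7 if a == b else 0.3)             # transition potentials phi_i(y_i, y_{i+1})
              for a in states for b in states} for _ in range(N - 1)]

def seq_score(y):
    """Unnormalized sequence posterior: product of node and transition potentials."""
    score = phi_node[0][y[0]]
    for i in range(1, N):
        score *= phi_trans[i - 1][(y[i - 1], y[i])] * phi_node[i][y[i]]
    return score

# Brute force: enumerate all |S|**N sequences, which is exponential in N.
Z = sum(seq_score(y) for y in product(states, repeat=N))
gamma_2_r = sum(seq_score(y) for y in product(states, repeat=N) if y[1] == "r") / Z
```

Even for this toy model the loop visits <math>2^4 = 16</math> sequences; the forward-backward algorithm below avoids this blow-up.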
Forward-backward
The Forward-backward algorithm is a dynamic programming algorithm that can compute <math>\gamma_i(y_i)</math> for all states and positions in linear time, by exploiting the fact that each state depends only on the previous state.
It iterates through each node and transition once to build the forward probabilities <math>\alpha_i(y_i)</math> and another time to build the backward probabilities <math>\beta_i(y_i)</math>.
The forward probability <math>\alpha_i(s_l)</math> represents the probability that in position <math>i</math> we are in state <math>s_l</math> and that we have observed the input up to that position. The forward probability is defined by the following recurrence:
<math>
\alpha_1(s_l) = \phi_1(s_l)
</math>
<math>
\alpha_i(s_l) = \sum_{s_m\in{S}}\phi_{i-1}(s_m,s_l)\phi_i(s_l)\alpha_{i-1}(s_m)
</math>
where <math>S</math> is the set of possible states for each position.
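The recurrence translates directly into a forward pass. This is a sketch using the same kind of hypothetical potentials as the brute-force example (the states and numbers are invented for illustration):

```python
# Forward pass sketch; states and potential values are hypothetical.
states = ["r", "s"]
N = 4
phi_node = [{"r": 0.6, "s": 0.4} for _ in range(N)]
phi_trans = [{(a, b): (0.7 if a == b else 0.3)
              for a in states for b in states} for _ in range(N - 1)]

# alpha[i][s]: total unnormalized weight of all partial sequences
# that end in state s at position i.
alpha = [dict() for _ in range(N)]
for s in states:
    alpha[0][s] = phi_node[0][s]                 # base case: alpha_1(s) = phi_1(s)
for i in range(1, N):
    for s in states:
        alpha[i][s] = phi_node[i][s] * sum(
            phi_trans[i - 1][(t, s)] * alpha[i - 1][t] for t in states
        )

# Summing the forward probabilities at the last position gives Z.
Z = sum(alpha[N - 1][s] for s in states)
```

Each position only looks back at the previous position, so the pass costs <math>O(N|S|^2)</math> instead of <math>O(|S|^N)</math>.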
We can see in the example that the forward probability <math>\alpha_2(r)</math> corresponds to the sum of the sequence posteriors of the partial sequences that end in r at position 2 (r r and s r).
The forward probabilities at the last position can be used to calculate <math>Z</math>, the sum of the unnormalized sequence posteriors over all possible state sequences:
<math>
Z = \sum_{s_l\in{S}}\alpha_N(s_l)
</math>
where <math>N</math> is the last position of the sequence.
To calculate the state posteriors and transition posteriors, we also need to calculate the backward probabilities, whose recurrence operates in the inverse direction:
<math>
\beta_N(s_l) = 1
</math>
<math>
\beta_i(s_l) = \sum_{s_m\in{S}}\phi_i(s_l,s_m)\phi_{i+1}(s_m)\beta_{i+1}(s_m)
</math>
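Putting both passes together gives the state posteriors needed for posterior decoding. The sketch below is again built on hypothetical states and potential values invented for illustration; it combines the two passes as <math>\gamma_i(s_l) = \alpha_i(s_l)\beta_i(s_l)/Z</math>:

```python
# End-to-end forward-backward sketch; states and potentials are hypothetical.
states = ["r", "s"]
N = 4
phi_node = [{"r": 0.6, "s": 0.4} for _ in range(N)]
phi_trans = [{(a, b): (0.7 if a == b else 0.3)
              for a in states for b in states} for _ in range(N - 1)]

# Forward: alpha_1(s) = phi_1(s); alpha_i(s) = phi_i(s) * sum_t phi_{i-1}(t,s) * alpha_{i-1}(t)
alpha = [dict() for _ in range(N)]
for s in states:
    alpha[0][s] = phi_node[0][s]
for i in range(1, N):
    for s in states:
        alpha[i][s] = phi_node[i][s] * sum(
            phi_trans[i - 1][(t, s)] * alpha[i - 1][t] for t in states
        )

# Backward: beta_N(s) = 1; beta_i(s) = sum_t phi_i(s,t) * phi_{i+1}(t) * beta_{i+1}(t)
beta = [dict() for _ in range(N)]
for s in states:
    beta[N - 1][s] = 1.0
for i in range(N - 2, -1, -1):
    for s in states:
        beta[i][s] = sum(
            phi_trans[i][(s, t)] * phi_node[i + 1][t] * beta[i + 1][t] for t in states
        )

Z = sum(alpha[N - 1][s] for s in states)
# State posteriors: at every position they form a proper distribution over states.
gamma = [{s: alpha[i][s] * beta[i][s] / Z for s in states} for i in range(N)]
```

Posterior decoding then simply picks <math>\operatorname{argmax}_s \gamma_i(s)</math> at each position.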