Forward-Backward

Summary

This is a dynamic programming algorithm, used in Hidden Markov Models to efficiently compute the state posteriors over all the hidden state variables.

These values are then used in Posterior Decoding, which simply chooses the state with the highest posterior marginal for each position in the sequence.

The forward-backward algorithm runs in time linear in the length of the sequence, whereas a brute-force algorithm that checks all possible state sequences would take time exponential in the length of the sequence.

Problem formulation

Posterior Decoding is one of several ways to find the best hidden state sequence ${\hat {y}}$ . It consists in picking, independently for each position $i$ in the sequence, the state with the highest state posterior:

${\hat {y}}_{i}=\arg \max _{y_{i}}\gamma _{i}(y_{i})$

where $\gamma _{i}$ denotes the state posterior for position $i$ . The state posterior is given by:

$\gamma _{i}(s_{l})=P_{\theta }(y_{i}=s_{l}|{\bar {x}})$

where $P_{\theta }(y_{i}=s_{l}|{\bar {x}})$ is the sum of the sequence posteriors of all possible state sequences in which position $i$ is in state $s_{l}$ .

The sequence posterior $P_{\theta }({\bar {y}}|{\bar {x}})$ for a given sequence ${\bar {y}}$ is defined as:

$P_{\theta }({\bar {y}}|{\bar {x}})={\frac {P_{\theta }({\bar {x}},{\bar {y}})}{\sum _{\bar {y}}P_{\theta }({\bar {x}},{\bar {y}})}}$

$P_{\theta }({\bar {x}},{\bar {y}})$ is calculated as the product of all node potentials $\phi (s_{l})$ of the nodes in the sequence, corresponding to the state observation parameters for the state $s_{l}$ , and the transition potentials $\phi (s_{l},s_{m})$ of the transitions in the sequence, corresponding to the transition parameters. These potentials are estimated during the training process of Hidden Markov Models. As an example, the sequence in red in the following figure would be calculated as $\phi _{1}(r)\phi _{1}(r,s)\phi _{2}(s)\phi _{2}(s,s)\phi _{3}(s)\phi _{3}(s,s)\phi _{4}(s)$ .
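The product of potentials can be sketched in a few lines of Python. This is only an illustration: the nested-list layout (`node[i][s]` for $\phi _{i+1}(s)$ and `trans[i][s][t]` for $\phi _{i+1}(s,t)$, with zero-based indices), the function name `joint_score` and the numeric potentials are all assumptions made for the example, not values from a trained model.

```python
def joint_score(node, trans, y):
    """Unnormalized score of one state sequence y: the product of the
    node potentials and transition potentials along the sequence."""
    score = node[0][y[0]]
    for i in range(1, len(y)):
        score *= trans[i - 1][y[i - 1]][y[i]] * node[i][y[i]]
    return score

# Hypothetical toy potentials: 2 states (0 = r, 1 = s), 4 positions.
node = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
trans = [[[0.7, 0.3], [0.4, 0.6]]] * 3  # same transition table at every position

# Score of the sequence r s s s
score_rsss = joint_score(node, trans, [0, 1, 1, 1])
print(score_rsss)
```

Dividing such a score by the sum of the scores of all sequences gives the sequence posterior.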

Thus, to calculate $\gamma _{2}(r)$ , for instance, we would need to sum the sequence posteriors of all sequences that have r in position 2: r r r r, s r r r, r r s r, s r s r, r r r s, s r r s, r r s s and s r s s. A brute-force algorithm would compute each of these sequence posteriors separately. The number of computations needed to calculate a single state posterior would then be in the order of $S^{N}$ , where S denotes the number of possible states and N is the length of the sequence. Under this approach, inference would be intractable for real-life problems, because the number of possible sequences grows exponentially with N.
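For comparison, the brute-force computation might look like the sketch below. The data layout (`node[i][s]`, `trans[i][s][t]`, zero-based indices), the function name and the toy potentials are assumptions chosen for illustration; the point is only that the loop visits all $S^{N}$ sequences.

```python
from itertools import product

def brute_force_state_posterior(node, trans, i, s):
    """State posterior gamma_i(s) by enumerating all S**N state sequences."""
    n_pos, n_states = len(node), len(node[0])
    total = 0.0     # sum of scores over every sequence (the normalizer)
    matching = 0.0  # sum of scores over sequences with y[i] == s
    for y in product(range(n_states), repeat=n_pos):
        score = node[0][y[0]]
        for k in range(1, n_pos):
            score *= trans[k - 1][y[k - 1]][y[k]] * node[k][y[k]]
        total += score
        if y[i] == s:
            matching += score
    return matching / total

# Hypothetical toy potentials: 2 states (0 = r, 1 = s), 4 positions.
node = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6], [0.1, 0.9]]
trans = [[[0.7, 0.3], [0.4, 0.6]]] * 3

# gamma_2(r): position 2 is index 1, state r is index 0.
gamma_2_r = brute_force_state_posterior(node, trans, 1, 0)
print(gamma_2_r)
```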

Forward-backward

The Forward-backward algorithm is a dynamic programming algorithm that can compute $\gamma$ for all states and positions in time linear in the sequence length, by exploiting the fact that each state depends only on the previous state.

It iterates through each node and transition once to build the forward probabilities $\alpha$ and another time to build the backward probabilities $\beta$ .

The forward probability represents the probability that in position $i$ , we are in state $y_{i}=s_{l}$ and that we have observed ${\bar {x}}$ up to that position. The forward probability is defined by the following recurrence:

$\alpha _{1}(s_{l})=\phi _{1}(s_{l})$

$\alpha _{i}(s_{l})=\phi _{i}(s_{l})\sum _{s_{m}\in {S}}\phi _{i-1}(s_{m},s_{l})\alpha _{i-1}(s_{m})$
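The recurrence translates almost directly into code. The sketch below assumes potentials stored as nested lists, `node[i][s]` and `trans[i][m][l]`, with zero-based positions, so `trans[i - 1]` plays the role of $\phi _{i-1}$ ; the names and the toy numbers are hypothetical.

```python
def forward(node, trans):
    """Forward probabilities alpha[i][l] for every position i and state l."""
    n_pos, n_states = len(node), len(node[0])
    alpha = [list(node[0])]  # base case: alpha_1(s_l) = phi_1(s_l)
    for i in range(1, n_pos):
        alpha.append([
            node[i][l] * sum(trans[i - 1][m][l] * alpha[i - 1][m]
                             for m in range(n_states))
            for l in range(n_states)
        ])
    return alpha

# Hypothetical toy potentials: 2 states (0 = r, 1 = s), 3 positions.
node = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
trans = [[[0.7, 0.3], [0.4, 0.6]]] * 2

alpha = forward(node, trans)
print(alpha)
```

Each position costs on the order of $S^{2}$ multiplications, since every state sums over every possible previous state.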

We can see in the example that the forward probability $\alpha _{2}(s)$ corresponds to the sum of the scores of the partial sequences that end in s in position 2 (s s and r s).
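Assuming the example has exactly the two states r and s, expanding the recurrence for $\alpha _{2}(s)$ gives one term per partial sequence:

$\alpha _{2}(s)=\phi _{2}(s)\left[\phi _{1}(r,s)\alpha _{1}(r)+\phi _{1}(s,s)\alpha _{1}(s)\right]=\phi _{1}(r)\phi _{1}(r,s)\phi _{2}(s)+\phi _{1}(s)\phi _{1}(s,s)\phi _{2}(s)$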

The forward probabilities can be used to calculate the sum over all possible sequences $\sum _{\bar {y}}P_{\theta }({\bar {x}},{\bar {y}})$ , which is the normalizer of the sequence posterior:

$\sum _{\bar {y}}P_{\theta }({\bar {x}},{\bar {y}})=\sum _{s_{k}\in S}\alpha _{N}(s_{k})$

To calculate the state posteriors and transition posteriors, we also need to calculate the backward probabilities, which are computed in the opposite direction, from the end of the sequence to the start:

$\beta _{N}(s_{l})=1$

$\beta _{i}(s_{l})=\sum _{s_{m}\in {S}}\phi _{i}(s_{l},s_{m})\phi _{i+1}(s_{m})\beta _{i+1}(s_{m})$
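A matching sketch of the backward pass, under the same assumed layout (`node[i][s]`, `trans[i][l][m]`, zero-based positions) and with hypothetical toy potentials:

```python
def backward(node, trans):
    """Backward probabilities beta[i][l] for every position i and state l."""
    n_pos, n_states = len(node), len(node[0])
    # Base case: beta_N(s_l) = 1; earlier rows are overwritten below.
    beta = [[1.0] * n_states for _ in range(n_pos)]
    for i in range(n_pos - 2, -1, -1):  # from position N-1 down to 1
        for l in range(n_states):
            beta[i][l] = sum(
                trans[i][l][m] * node[i + 1][m] * beta[i + 1][m]
                for m in range(n_states)
            )
    return beta

# Hypothetical toy potentials: 2 states (0 = r, 1 = s), 3 positions.
node = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
trans = [[[0.7, 0.3], [0.4, 0.6]]] * 2

beta = backward(node, trans)
print(beta)
```

Note that `node[i][l]`, the potential of the current position, never enters `beta[i][l]`; only the potentials of later positions do.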

Note that the backward probability is defined in a slightly different way: the backward probability for a given position does not include the state potential for that position, as we can see in the example above.

With the forward and backward probabilities we can find the state posterior for any state by simply calculating:

$\gamma _{i}(s_{l})=P_{\theta }(y_{i}=s_{l}|{\bar {x}})={\frac {\alpha _{i}(s_{l})\beta _{i}(s_{l})}{\sum _{s_{k}\in S}\alpha _{N}(s_{k})}}$
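As a small illustration, the sketch below combines forward and backward tables that are already available. The `alpha` and `beta` values are hand-computed for a hypothetical two-state, two-position toy model (node potentials 0.5, 0.5 and 0.2, 0.8; transition table 0.7, 0.3 / 0.4, 0.6); the function name is likewise an assumption.

```python
def state_posteriors(alpha, beta):
    """gamma[i][l] = alpha[i][l] * beta[i][l] / Z, with Z = sum_k alpha[N-1][k]."""
    z = sum(alpha[-1])
    return [[a * b / z for a, b in zip(alpha_i, beta_i)]
            for alpha_i, beta_i in zip(alpha, beta)]

# Hand-computed tables for the hypothetical toy model described above.
alpha = [[0.5, 0.5], [0.11, 0.36]]
beta = [[0.38, 0.56], [1.0, 1.0]]

gamma = state_posteriors(alpha, beta)
for i, row in enumerate(gamma, start=1):
    print(i, row)
```

A useful sanity check is that the posteriors at each position sum to one, because $\sum _{s_{l}}\alpha _{i}(s_{l})\beta _{i}(s_{l})$ equals the normalizer at every position $i$ .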

We can also calculate the transition posteriors by computing:

$\zeta _{i}(s_{l},s_{m})=P_{\theta }(y_{i}=s_{l},y_{i+1}=s_{m}|{\bar {x}})={\frac {\alpha _{i}(s_{l})\phi _{i}(s_{l},s_{m})\phi _{i+1}(s_{m})\beta _{i+1}(s_{m})}{\sum _{s_{k}\in S}\alpha _{N}(s_{k})}}$
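The transition posteriors can be sketched the same way. The layout (`node[i][s]`, `trans[i][l][m]`, zero-based positions), the function name and the numbers are assumptions; `alpha` and `beta` are hand-computed for the hypothetical two-position toy model defined in the code.

```python
def transition_posteriors(node, trans, alpha, beta):
    """zeta[i][l][m]: posterior of being in state l at position i
    and in state m at position i + 1."""
    n_states = len(node[0])
    z = sum(alpha[-1])  # normalizer: sum of the final forward probabilities
    return [
        [[alpha[i][l] * trans[i][l][m] * node[i + 1][m] * beta[i + 1][m] / z
          for m in range(n_states)]
         for l in range(n_states)]
        for i in range(len(node) - 1)
    ]

# Hypothetical toy model: 2 states, 2 positions, with hand-computed tables.
node = [[0.5, 0.5], [0.2, 0.8]]
trans = [[[0.7, 0.3], [0.4, 0.6]]]
alpha = [[0.5, 0.5], [0.11, 0.36]]
beta = [[0.38, 0.56], [1.0, 1.0]]

zeta = transition_posteriors(node, trans, alpha, beta)
print(zeta[0])
```

Summing `zeta[i][l][m]` over `m` recovers the state posterior of state `l` at position `i`, which gives a quick consistency check.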

To summarize, the inference using the forward-backward algorithm is done by:

1 - Calculate the forward and backward probabilities for all positions and hidden states
2 - Calculate the state posteriors for all positions and hidden states
3 - For each position, choose the state with the highest state posterior
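The three steps can be put together as in the following minimal sketch. It is not a reference implementation: the nested-list layout (`node[i][s]`, `trans[i][l][m]`, zero-based positions), the function name `posterior_decode` and the toy potentials are assumptions, and it multiplies raw potentials, so very long sequences would need rescaling or log-space arithmetic to avoid underflow.

```python
def posterior_decode(node, trans):
    """Run forward-backward and pick the best state at each position."""
    n_pos, n_states = len(node), len(node[0])
    states = range(n_states)

    # Step 1a: forward probabilities.
    alpha = [list(node[0])]
    for i in range(1, n_pos):
        alpha.append([
            node[i][l] * sum(trans[i - 1][m][l] * alpha[i - 1][m] for m in states)
            for l in states
        ])

    # Step 1b: backward probabilities.
    beta = [[1.0] * n_states for _ in range(n_pos)]
    for i in range(n_pos - 2, -1, -1):
        beta[i] = [
            sum(trans[i][l][m] * node[i + 1][m] * beta[i + 1][m] for m in states)
            for l in states
        ]

    # Step 2: state posteriors.
    z = sum(alpha[-1])
    gamma = [[alpha[i][l] * beta[i][l] / z for l in states] for i in range(n_pos)]

    # Step 3: highest posterior at each position.
    decoded = [max(states, key=lambda l: gamma[i][l]) for i in range(n_pos)]
    return decoded, gamma

# Hypothetical toy potentials: 2 states (0 = r, 1 = s), 3 positions.
node = [[0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
trans = [[[0.7, 0.3], [0.4, 0.6]]] * 2

decoded, gamma = posterior_decode(node, trans)
print(decoded)
```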

The most expensive task in this algorithm is calculating the forward and backward probabilities, which requires iterating through all edges between nodes. However, the number of edges in a Hidden Markov Model is in the order of $S^{2}N$ , which is tractable and grows linearly with the sequence length N.

Related Concepts

The Inside-outside algorithm is a generalization of the Forward-backward algorithm.

The Forward-backward algorithm is used for Posterior Decoding in a Hidden Markov Model, where the sequence with the highest state posteriors is chosen. An alternative to this is Viterbi Decoding, where the sequence with the highest sequence posterior is chosen instead. In this case, the Viterbi algorithm is used.

The Forward-backward algorithm is also used in the Baum-Welch algorithm, which is used to find unknown parameters in a Hidden Markov Model.
