Forward-Backward
Summary
The forward-backward algorithm is a dynamic programming algorithm used in Hidden Markov Models to efficiently compute the state posteriors of all the hidden state variables.
These values are then used in Posterior Decoding, which simply chooses the state with the highest posterior marginal for each position in the sequence.
The forward-backward algorithm runs in time linear in the length of the sequence, whereas a brute-force algorithm that checks all possible state sequences takes time exponential in the length of the sequence.
Problem formulation
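Given an observation sequence $x = x_1, \dots, x_N$, posterior decoding is one of several ways to find the best hidden state sequence $\hat{y}$. It consists of picking, for each position in the sequence, the state with the highest state posterior:
$$\hat{y}_i = \arg\max_{y_i} \gamma_i(y_i)$$
where $\gamma_i(y_i)$ is the state posterior for position i. The state posterior is given by:
$$\gamma_i(y_i) = P(Y_i = y_i \mid X = x) = \sum_{y' \,:\, y'_i = y_i} P(Y = y' \mid X = x)$$
where $P(Y = y' \mid X = x)$ is the sequence posterior of a state sequence $y'$, and the sum runs over all possible state sequences whose state at position i is $y_i$.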
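The sequence posterior for a given sequence y is defined as:
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}$$
$P(X = x, Y = y)$, the score of the sequence, is calculated as the product of the node potentials of all nodes in the sequence, which correspond to the observation parameters of each state $y_i$, and the transition potentials of all transitions in the sequence, which correspond to the transition parameters:
$$P(X = x, Y = y) = P(y_1)\,P(x_1 \mid y_1) \prod_{i=2}^{N} P(y_i \mid y_{i-1})\,P(x_i \mid y_i)$$
where $P(y_1)$ is the initial state probability. These potentials are estimated when the Hidden Markov Model is trained. As an example, the score of the sequence shown in red in the following figure would be calculated as the product of the node and transition potentials along its path.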
Thus, to calculate $\gamma_2(r)$, for instance, we would need to sum the sequence posteriors of the sequences r r r r, s r r r, r r s r, s r s r, r r r s, s r r s, r r s s and s r s s, that is, of every length-4 sequence whose state at position 2 is r. A brute-force algorithm would compute each of these sequence posteriors separately, so the number of computations needed for a single state posterior would be on the order of $S^N$, where S is the number of possible states and N is the length of the sequence. Under this approach, inference would be intractable for real-life problems, because the number of possible sequences, $S^N$, grows exponentially with N.
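A minimal brute-force sketch in Python makes the exponential cost concrete. Everything about the model is a hypothetical assumption: a made-up toy HMM whose two states are labelled 0 (r) and 1 (s) to match the example, integer observation symbols, and an initial-state probability folded into the first node potential (no final-state probability). `sequence_score` and `brute_force_posteriors` are illustrative names, not part of any library.

```python
from itertools import product

# Hypothetical toy HMM: state 0 = "r", state 1 = "s"; observation symbols 0, 1, 2.
INIT = [0.6, 0.4]                          # INIT[a] = P(y_1 = a), assumed initial probabilities
TRANS = [[0.7, 0.3], [0.4, 0.6]]           # TRANS[a][b] = P(y_{i+1} = b | y_i = a)
EMIT = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]  # EMIT[a][o] = P(x_i = o | y_i = a)


def sequence_score(x, y):
    """P(X = x, Y = y): product of the node and transition potentials along y."""
    score = INIT[y[0]] * EMIT[y[0]][x[0]]
    for i in range(1, len(x)):
        score *= TRANS[y[i - 1]][y[i]] * EMIT[y[i]][x[i]]
    return score


def brute_force_posteriors(x):
    """Return (gamma, P(X = x)) by enumerating all S**N state sequences."""
    num_states = len(INIT)
    gamma = [[0.0] * num_states for _ in x]
    total = 0.0  # P(X = x): sum of the scores of every state sequence
    for y in product(range(num_states), repeat=len(x)):
        score = sequence_score(x, y)
        total += score
        # Each sequence contributes to the state it visits at every position.
        for i, a in enumerate(y):
            gamma[i][a] += score
    return [[g / total for g in row] for row in gamma], total


x = [0, 1, 2, 0]
gamma, p_x = brute_force_posteriors(x)
print(gamma[1][0])  # gamma_2(r): position 2 is index 1, state r is index 0
```

Every extra position multiplies the number of loop iterations by S, which is exactly the exponential growth described above.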
Forward-backward
The forward-backward algorithm is a dynamic programming algorithm that computes $\gamma$ for all states and positions in time linear in the sequence length, by exploiting the fact that each hidden state depends only on the previous state. It is a specialized version of the Inside-outside algorithm for Hidden Markov Models.
It iterates through each node and transition once to build the forward probabilities $\alpha$ and a second time to build the backward probabilities $\beta$.
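The forward probability $\alpha_i(y_i)$ represents the probability that at position i we are in state $y_i$ and that we have observed $x_1, \dots, x_i$ up to that position. It is defined by the following recurrence:
$$\alpha_1(y_1) = P(y_1)\,P(x_1 \mid y_1)$$
$$\alpha_i(y_i) = P(x_i \mid y_i) \sum_{y_{i-1}} P(y_i \mid y_{i-1})\,\alpha_{i-1}(y_{i-1}), \qquad i = 2, \dots, N$$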
We can see in the example that the forward probability $\alpha_2(s)$ is the sum of the scores of the partial sequences that end in s at position 2 (s s and r s).
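The forward probabilities at the last position can be used to calculate $P(X = x)$, the sum of the scores of all possible state sequences, which is the denominator of the sequence posterior:
$$P(X = x) = \sum_{y_N} \alpha_N(y_N)$$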
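The forward recurrence and the computation of $P(X = x)$ can be sketched in Python as follows. The toy HMM is hypothetical (made-up probabilities, states 0 = r and 1 = s, integer observation symbols, an assumed initial distribution and no final-state probability), and because Python lists are 0-based, `alpha[i]` holds $\alpha_{i+1}$. The last lines check $\alpha_2(s)$ against the scores of the partial sequences r s and s s.

```python
# Hypothetical toy HMM: state 0 = "r", state 1 = "s"; observation symbols 0, 1, 2.
INIT = [0.6, 0.4]                          # INIT[a] = P(y_1 = a), assumed initial probabilities
TRANS = [[0.7, 0.3], [0.4, 0.6]]           # TRANS[a][b] = P(y_{i+1} = b | y_i = a)
EMIT = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]  # EMIT[a][o] = P(x_i = o | y_i = a)


def forward(x):
    """Return alpha, where alpha[i][a] = P(x_1..x_{i+1}, Y_{i+1} = a) (0-based i)."""
    states = range(len(INIT))
    # Base case: alpha_1(a) = P(y_1 = a) * P(x_1 | a)
    alpha = [[INIT[a] * EMIT[a][x[0]] for a in states]]
    for obs in x[1:]:
        prev = alpha[-1]
        # alpha_i(b) = P(x_i | b) * sum_a P(b | a) * alpha_{i-1}(a)
        alpha.append([EMIT[b][obs] * sum(prev[a] * TRANS[a][b] for a in states)
                      for b in states])
    return alpha


x = [0, 1, 2, 0]
alpha = forward(x)
p_x = sum(alpha[-1])  # P(X = x) = sum over y_N of alpha_N(y_N)

# alpha_2(s) sums the scores of the partial sequences "r s" and "s s".
score_rs = INIT[0] * EMIT[0][x[0]] * TRANS[0][1] * EMIT[1][x[1]]
score_ss = INIT[1] * EMIT[1][x[0]] * TRANS[1][1] * EMIT[1][x[1]]
print(alpha[1][1], score_rs + score_ss)
```

Each step reuses the previous column of forward probabilities, so the work per position is $S^2$ multiplications instead of a sum over every prefix.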
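To calculate the state posteriors and transition posteriors, we also need the backward probabilities $\beta_i(y_i) = P(x_{i+1}, \dots, x_N \mid Y_i = y_i)$, which are computed in the reverse direction:
$$\beta_N(y_N) = 1$$
$$\beta_i(y_i) = \sum_{y_{i+1}} P(y_{i+1} \mid y_i)\,P(x_{i+1} \mid y_{i+1})\,\beta_{i+1}(y_{i+1}), \qquad i = N-1, \dots, 1$$
Note that the backward probability is defined slightly differently from the forward probability: $\beta_i(y_i)$ does not include the node potential of position i itself, as can be seen in the example above, whereas $\alpha_i(y_i)$ does. This way, the node potential of position i is counted exactly once when the two are multiplied.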
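The backward recurrence can be sketched in Python in the same way. The toy HMM is hypothetical (made-up probabilities, states 0 = r and 1 = s, integer observation symbols, no final-state probability), and `beta[i]` holds $\beta_{i+1}$ because Python lists are 0-based. The last lines recover $P(X = x)$ from $\beta_1$; since $\beta_1$ excludes the node potential of position 1, that potential, $P(y_1)\,P(x_1 \mid y_1)$, has to be multiplied back in.

```python
# Hypothetical toy HMM: state 0 = "r", state 1 = "s"; observation symbols 0, 1, 2.
INIT = [0.6, 0.4]                          # INIT[a] = P(y_1 = a), assumed initial probabilities
TRANS = [[0.7, 0.3], [0.4, 0.6]]           # TRANS[a][b] = P(y_{i+1} = b | y_i = a)
EMIT = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]  # EMIT[a][o] = P(x_i = o | y_i = a)


def backward(x):
    """Return beta, where beta[i][a] = P(x_{i+2}..x_N | Y_{i+1} = a) (0-based i)."""
    states = range(len(INIT))
    beta = [[1.0 for _ in states]]  # Base case: beta_N(a) = 1
    # Walk right to left; obs plays the role of x_{i+1} when computing beta_i.
    for obs in reversed(x[1:]):
        nxt = beta[-1]
        # beta_i(a) = sum_b P(b | a) * P(x_{i+1} | b) * beta_{i+1}(b)
        beta.append([sum(TRANS[a][b] * EMIT[b][obs] * nxt[b] for b in states)
                     for a in states])
    beta.reverse()  # built from position N down to 1; flip into position order
    return beta


x = [0, 1, 2, 0]
beta = backward(x)
# P(X = x) = sum_a P(y_1 = a) * P(x_1 | a) * beta_1(a)
p_x = sum(INIT[a] * EMIT[a][x[0]] * beta[0][a] for a in range(len(INIT)))
print(p_x)
```

Getting the same $P(X = x)$ from the backward pass as from $\sum_{y_N} \alpha_N(y_N)$ is a cheap consistency check for an implementation.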
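With the forward and backward probabilities, we can find the state posterior of any state at any position by simply calculating:
$$\gamma_i(y_i) = \frac{\alpha_i(y_i)\,\beta_i(y_i)}{P(X = x)}$$
We can also calculate the transition posteriors by computing:
$$\xi_i(y_i, y_{i+1}) = P(Y_i = y_i, Y_{i+1} = y_{i+1} \mid X = x) = \frac{\alpha_i(y_i)\,P(y_{i+1} \mid y_i)\,P(x_{i+1} \mid y_{i+1})\,\beta_{i+1}(y_{i+1})}{P(X = x)}$$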
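To see why the product of the forward and backward probabilities yields the state posterior, expand their definitions. The second step uses the HMM assumption that, given $Y_i$, the observations after position i are independent of the observations up to position i:

```latex
\alpha_i(y_i)\,\beta_i(y_i)
  = P(x_1, \dots, x_i,\, Y_i = y_i)\; P(x_{i+1}, \dots, x_N \mid Y_i = y_i)
  = P(X = x,\, Y_i = y_i)
```

Dividing by $P(X = x)$ therefore gives $P(Y_i = y_i \mid X = x)$, and summing the numerator over $y_i$ gives $P(X = x)$ at every position i, which is a convenient sanity check. The transition posterior follows the same pattern, with the transition and node potentials between positions i and i+1 inserted between $\alpha_i$ and $\beta_{i+1}$.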
To summarize, inference with the forward-backward algorithm is done by:
- 1 - Calculating the forward and backward probabilities for all positions and hidden states
- 2 - Calculating the state posteriors for all positions and hidden states
- 3 - Choosing, for each position, the state with the highest state posterior
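The three steps can be combined into a short, self-contained Python sketch of posterior decoding. Everything about the model is a hypothetical assumption: a made-up toy HMM with states 0 = r and 1 = s, integer observation symbols, an initial distribution and no final-state probability. The function names are illustrative only.

```python
# Hypothetical toy HMM: state 0 = "r", state 1 = "s"; observation symbols 0, 1, 2.
STATES = ["r", "s"]
INIT = [0.6, 0.4]                          # INIT[a] = P(y_1 = a), assumed initial probabilities
TRANS = [[0.7, 0.3], [0.4, 0.6]]           # TRANS[a][b] = P(y_{i+1} = b | y_i = a)
EMIT = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]  # EMIT[a][o] = P(x_i = o | y_i = a)
IDS = range(len(STATES))


def forward(x):
    """alpha[i][a] for 0-based position i (alpha_{i+1}(a) in the text)."""
    alpha = [[INIT[a] * EMIT[a][x[0]] for a in IDS]]
    for obs in x[1:]:
        prev = alpha[-1]
        alpha.append([EMIT[b][obs] * sum(prev[a] * TRANS[a][b] for a in IDS) for b in IDS])
    return alpha


def backward(x):
    """beta[i][a] for 0-based position i (beta_{i+1}(a) in the text)."""
    beta = [[1.0 for _ in IDS]]
    for obs in reversed(x[1:]):
        nxt = beta[-1]
        beta.append([sum(TRANS[a][b] * EMIT[b][obs] * nxt[b] for b in IDS) for a in IDS])
    beta.reverse()
    return beta


def posterior_decode(x):
    # Step 1: forward and backward probabilities for all positions and states.
    alpha, beta = forward(x), backward(x)
    p_x = sum(alpha[-1])  # P(X = x)
    # Step 2: state posteriors gamma_i(a) = alpha_i(a) * beta_i(a) / P(X = x).
    gamma = [[al * be / p_x for al, be in zip(a_row, b_row)]
             for a_row, b_row in zip(alpha, beta)]
    # Step 3: pick the most probable state independently at each position.
    path = [max(IDS, key=lambda a: row[a]) for row in gamma]
    return path, gamma


path, gamma = posterior_decode([0, 1, 2, 0])
print([STATES[a] for a in path])  # prints ['s', 'r', 'r', 's']
```

Because step 3 decides each position independently, the decoded sequence is not guaranteed to be the single most probable sequence, and in general it can even contain a transition to which the model assigns zero probability.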
The most expensive part of this algorithm is computing the forward and backward probabilities, which requires iterating over all edges between the nodes of adjacent positions. However, the number of edges in a Hidden Markov Model is on the order of $N S^2$, which is tractable and grows linearly with the sequence length N.
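For a sense of scale, take hypothetical values of S = 10 states and N = 50 positions. Each pair of adjacent positions is connected by $S^2$ edges, so one pass touches $(N-1)S^2$ edges, compared with the $S^N$ sequences a brute-force algorithm would enumerate:

```latex
S^N = 10^{50}
\qquad\text{vs.}\qquad
(N-1)\,S^2 = 49 \cdot 10^2 = 4900
```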