Hidden Markov Models (HMMs) are statistical models that are used for representing stochastic Markov processes with hidden states.
The output of the model is observable and dependent on the hidden states. The process is governed by transition probabilities which determine the probability of moving between states, and emission probabilities which determine the output of each state. Therefore, the observed output sequence can provide information as to a probable sequence of hidden states that were used in the generation process.
Let be a random variable that represents the state at time , and let be a random variable that represents the emission at time . Define as the start state and as the stop state.
If the HMM represents a first-order Markov process then
Alternatively, the HMM could represent a Markov process of order , where
A (first-order) HMM defines a distribution over a sequence of states, , and output tokens, .
This distribution can be redefined using transition and emission probabilities. Let be a transition probability, the probability of moving from state to state in the model
And let be an emission probability, the probability that state emits the token
The training task is defined as finding the emission and transition probabilities of an HMM given some sample set of observations. This is usually done using maximum likelihood estimation algorithms. The Baum-Welch algorithm is a generalized Expectation Maximization algorithm that uses Forward-Backward for recovering the parameters of an HMM.
Several interesting inference task can be considered for HMMs.
Evaluating the probability of an observed sequence
Given an observed sequence and an HMM model, we can evaluate the probability that the sequence was generated by the model. We can evaluate this, for example, for several different models, and select the one with the highest probability as the recognition model. This probability can be calculated using dynamic programming with the forward algorithm.
Recovering the most probable state sequence
Given an observed sequence, we can find the sequence of states that maximized the joint probability of the observed sequence and the state sequence for that model. This is done with the Viterbi algorithm.