From Cohen Courses
Jump to navigationJump to search

Hidden Markov Models (HMMs) are statistical models that are used for representing stochastic Markov processes with hidden states.

The output of the model is observable and dependent on the hidden states. The process is governed by transition probabilities which determine the probability of moving between states, and emission probabilities which determine the output of each state. Therefore, the observed output sequence can provide information as to a probable sequence of hidden states that were used in the generation process.


Let be a random variable that represents the state at time , and let be a random variable that represents the emission at time . Define as the start state and as the stop state.

If the HMM represents a first-order Markov process then

Alternatively, the HMM could represent a Markov process of order , where

A (first-order) HMM defines a distribution over a sequence of states, , and output tokens, .

This distribution can be redefined using transition and emission probabilities. Let be a transition probability, the probability of moving from state to state in the model

And let be an emission probability, the probability that state emits the token



The training task is defined as finding the emission and transition probabilities of an HMM given some sample set of observations. This is usually done using maximum likelihood estimation algorithms. The Baum-Welch algorithm is a generalized Expectation Maximization algorithm that uses Forward-Backward for recovering the parameters of an HMM.


Several interesting inference task can be considered for HMMs.

Evaluating the probability of an observed sequence

Given an observed sequence and an HMM model, we can evaluate the probability that the sequence was generated by the model. We can evaluate this, for example, for several different models, and select the one with the highest probability as the recognition model. This probability can be calculated using dynamic programming with the forward algorithm.

Recovering the most probable state sequence

Given an observed sequence, we can find the sequence of states that maximized the joint probability of the observed sequence and the state sequence for that model. This is done with the Viterbi algorithm.

Relevant Papers