HMM

Hidden Markov Models (HMMs) are statistical models used to represent stochastic Markov processes whose states are hidden.

The output of the model is observable and dependent on the hidden states. The process is governed by transition probabilities which determine the probability of moving between states, and emission probabilities which determine the output of each state. Therefore, the observed output sequence can provide information as to a probable sequence of hidden states that were used in the generation process.

Description

Let ${\displaystyle H_{t}}$ be a random variable that represents the state at time ${\displaystyle t}$, and let ${\displaystyle X_{t}}$ be a random variable that represents the emission at time ${\displaystyle t}$. Define ${\displaystyle H_{0}}$ as the start state and ${\displaystyle H_{T+1}}$ as the stop state.

If the HMM represents a first-order Markov process then

${\displaystyle p(H_{t}=h_{t}|H_{t-1}=h_{t-1},...,H_{0}=h_{0})=p(H_{t}=h_{t}|H_{t-1}=h_{t-1})}$

Alternatively, the HMM could represent a Markov process of order ${\displaystyle m}$, where

${\displaystyle p(H_{t}=h_{t}|H_{t-1}=h_{t-1},...,H_{0}=h_{0})=p(H_{t}=h_{t}|H_{t-1}=h_{t-1},...,H_{t-m}=h_{t-m})}$

A (first-order) HMM defines a distribution over a sequence of states, ${\displaystyle H}$, and output tokens, ${\displaystyle X}$.

${\displaystyle p(H=h,X=x)=p(H=h)p(X=x|H=h)=\left(\prod _{t=1}^{T+1}p(H_{t}=h_{t}|H_{t-1}=h_{t-1})\right)\left(\prod _{t=1}^{T}p(X_{t}=x_{t}|H_{t}=h_{t})\right)}$

This distribution can be redefined using transition and emission probabilities. Let ${\displaystyle \pi _{h,h'}}$ be a transition probability, the probability of moving from state ${\displaystyle h}$ to state ${\displaystyle h'}$ in the model

${\displaystyle \pi _{h,h'}=p(H_{t}=h'|H_{t-1}=h)}$

And let ${\displaystyle \tau _{h,x}}$ be an emission probability, the probability that state ${\displaystyle h}$ emits the token ${\displaystyle x}$

${\displaystyle \tau _{h,x}=p(X_{t}=x|H_{t}=h)}$

Then

${\displaystyle p(H=h,X=x)=\left(\prod _{t=1}^{T+1}p(H_{t}=h_{t}|H_{t-1}=h_{t-1})\right)\left(\prod _{t=1}^{T}p(X_{t}=x_{t}|H_{t}=h_{t})\right)=\left(\prod _{t=1}^{T+1}\pi _{h_{t-1},h_{t}}\right)\left(\prod _{t=1}^{T}\tau _{h_{t},x_{t}}\right)}$
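This factorization can be computed directly as a product of transition and emission terms. The following is a minimal sketch in NumPy; the two-state weather model and its probabilities are illustrative assumptions, not from the text, and the explicit stop-state factor ${\displaystyle \pi _{h_{T},h_{T+1}}}$ is omitted for simplicity.

```python
import numpy as np

# Hypothetical 2-state HMM (states, tokens, and probabilities are
# assumptions for illustration only).
states = ["Rainy", "Sunny"]          # hidden states h
tokens = ["walk", "shop", "clean"]   # observable tokens x

start = np.array([0.6, 0.4])          # pi_{start, h}: start-state transitions
trans = np.array([[0.7, 0.3],         # pi_{h, h'}: transition probabilities
                  [0.4, 0.6]])
emit = np.array([[0.1, 0.4, 0.5],     # tau_{h, x}: emission probabilities
                 [0.6, 0.3, 0.1]])

def joint_probability(h, x):
    """p(H=h, X=x) as the product of transition and emission terms.

    The stop-state transition at t = T+1 is omitted here.
    """
    p = start[h[0]] * emit[h[0], x[0]]
    for t in range(1, len(x)):
        p *= trans[h[t - 1], h[t]] * emit[h[t], x[t]]
    return p
```

For example, `joint_probability([0, 1], [0, 2])` multiplies the start transition into state 0, its emission of token 0, the transition from state 0 to state 1, and that state's emission of token 2.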

Training

The training task is to estimate the transition and emission probabilities of an HMM from a sample set of observation sequences. This is usually done with maximum likelihood estimation. The Baum-Welch algorithm is an expectation-maximization (EM) algorithm that uses the forward-backward algorithm to compute the expected state and transition counts from which the parameters are re-estimated.
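One EM iteration can be sketched as follows: the E-step runs the forward and backward recursions to obtain the posterior state probabilities, and the M-step re-estimates the parameters from the resulting expected counts. This is a minimal sketch for a single observation sequence without numerical scaling; the example parameters are illustrative assumptions.

```python
import numpy as np

def baum_welch_step(x, start, trans, emit):
    """One Baum-Welch (EM) iteration for a single sequence x of token indices.

    Minimal sketch: no scaling, so it underflows on long sequences.
    """
    T, N = len(x), len(start)
    # E-step: forward probabilities alpha[t, h] = p(x_1..x_t, H_t = h)
    alpha = np.zeros((T, N))
    alpha[0] = start * emit[:, x[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, x[t]]
    # ... and backward probabilities beta[t, h] = p(x_{t+1}..x_T | H_t = h)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, x[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                    # p(X = x)
    gamma = alpha * beta / evidence               # p(H_t = h | x)
    # xi[t, h, h'] = p(H_t = h, H_{t+1} = h' | x)
    xi = (alpha[:-1, :, None] * trans[None] *
          (emit[:, x[1:]].T * beta[1:])[:, None, :]) / evidence
    # M-step: re-estimate parameters from expected counts
    new_start = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emit = np.zeros_like(emit)
    for k in range(emit.shape[1]):
        new_emit[:, k] = gamma[[t for t in range(T) if x[t] == k]].sum(axis=0)
    new_emit /= gamma.sum(axis=0)[:, None]
    return new_start, new_trans, new_emit

# Hypothetical example parameters (illustrative assumptions).
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
new_start, new_trans, new_emit = baum_welch_step([0, 1, 2, 0], start, trans, emit)
```

Iterating this update until convergence yields a local maximum of the likelihood; a practical implementation would also work in log space or rescale `alpha` and `beta` to avoid underflow.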

Inference

Several interesting inference tasks can be considered for HMMs.

Evaluating the probability of an observed sequence

Given an observed sequence and an HMM, we can evaluate the probability that the model generated the sequence. For example, in recognition tasks we can evaluate this probability under several candidate models and select the model that assigns the sequence the highest probability. The probability can be computed efficiently with dynamic programming using the forward algorithm.
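The forward algorithm can be sketched as follows: it maintains, for each state, the total probability of all state sequences that emit the prefix observed so far and end in that state, then sums over the final states. The parameters below are illustrative assumptions, matching no particular model from the text.

```python
import numpy as np

# Hypothetical parameters (illustrative assumptions).
start = np.array([0.6, 0.4])                     # start-state transitions
trans = np.array([[0.7, 0.3], [0.4, 0.6]])       # pi_{h, h'}
emit = np.array([[0.1, 0.4, 0.5],                # tau_{h, x}
                 [0.6, 0.3, 0.1]])

def forward_probability(x):
    """p(X = x): sum over all hidden state sequences, by dynamic programming.

    alpha[h] after step t holds p(x_1..x_t, H_t = h).
    """
    alpha = start * emit[:, x[0]]
    for t in range(1, len(x)):
        alpha = (alpha @ trans) * emit[:, x[t]]
    return alpha.sum()
```

The recursion costs O(T N^2) for T observations and N states, instead of the O(N^T) cost of summing over all state sequences explicitly.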

Recovering the most probable state sequence

Given an observed sequence, we can find the state sequence that maximizes the joint probability of the observed sequence and the state sequence under the model. This is done with the Viterbi algorithm.
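The Viterbi algorithm has the same dynamic-programming structure as the forward algorithm, but replaces the sum over predecessor states with a maximum and keeps backpointers to recover the best path. A minimal sketch, with illustrative example parameters:

```python
import numpy as np

# Hypothetical parameters (illustrative assumptions).
start = np.array([0.6, 0.4])                     # start-state transitions
trans = np.array([[0.7, 0.3], [0.4, 0.6]])       # pi_{h, h'}
emit = np.array([[0.1, 0.4, 0.5],                # tau_{h, x}
                 [0.6, 0.3, 0.1]])

def viterbi(x):
    """Most probable state sequence: argmax_h p(H = h, X = x).

    delta[h] after step t holds the best score of any path ending in h.
    """
    delta = start * emit[:, x[0]]
    back = []                                    # backpointers per step
    for t in range(1, len(x)):
        scores = delta[:, None] * trans          # scores[h, h'] = delta[h] * pi_{h,h'}
        back.append(scores.argmax(axis=0))       # best predecessor of each h'
        delta = scores.max(axis=0) * emit[:, x[t]]
    # Trace the backpointers from the best final state.
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())
```

A practical implementation would work with log probabilities so that long sequences do not underflow, turning the products into sums.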