From Cohen Courses
Hidden Markov Models (HMMs) are statistical models used to represent [http://en.wikipedia.org/wiki/Markov_process Markov processes] with hidden states.
  
The output of the model is observable and dependent on the hidden states. The process is governed by '''transition probabilities''' which determine the probability of moving between states, and '''emission probabilities''' which determine the output of each state. Therefore, the observed output sequence can provide information as to a probable sequence of hidden states that were used in the generation process.
  
== Description ==
Let <math>H_t</math> be a random variable that represents the state at time <math>t</math>, and let <math>X_t</math> be a random variable that represents the emission at time <math>t</math>. Define <math>H_0</math> as the start state and <math>H_{T+1}</math> as the stop state.
  
[http://en.wikipedia.org/wiki/Hidden_Markov_model External Link]
If the HMM represents a first-order Markov process, then

:<math> p(H_t = h_t | H_{t-1} = h_{t-1}, ..., H_0 = h_0) = p(H_t = h_t | H_{t-1} = h_{t-1}) </math>

Alternatively, the HMM could represent a Markov process of order <math>m</math>, where

:<math> p(H_t = h_t | H_{t-1} = h_{t-1}, ..., H_0 = h_0) = p(H_t = h_t | H_{t-1} = h_{t-1}, ..., H_{t-m} = h_{t-m}) </math>

A (first-order) HMM defines a distribution over a sequence of states, <math>H</math>, and output tokens, <math>X</math>.
:<math> p(H=h, X=x) = p(H=h)p(X=x|H=h) = \left( \prod_{t=1}^{T+1} p(H_t = h_t | H_{t-1} = h_{t-1}) \right)\left( \prod_{t=1}^{T} p(X_t = x_t | H_t = h_t) \right) </math>

This distribution can be rewritten using ''transition'' and ''emission'' probabilities.

Let <math>\pi_{h,h'}</math> be a '''transition probability''', the probability of moving from state <math>h</math> to state <math>h'</math> in the model:

:<math>\pi_{h,h'} = p(H_t = h' | H_{t-1} = h)</math>

And let <math>\tau_{h,x}</math> be an '''emission probability''', the probability that state <math>h</math> emits the token <math>x</math>:

:<math>\tau_{h,x} = p(X_t = x | H_t = h) </math>

Then

:<math> p(H=h, X=x) = \left( \prod_{t=1}^{T+1} p(H_t = h_t | H_{t-1} = h_{t-1}) \right)\left( \prod_{t=1}^{T} p(X_t = x_t | H_t = h_t) \right) = \left( \prod_{t=1}^{T+1} \pi_{h_{t-1},h_t} \right)\left( \prod_{t=1}^{T} \tau_{h_t,x_t} \right)</math>
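As a concrete check of this factorization, the joint probability is just a product of table lookups. A minimal Python sketch; the states, tokens, and all probability values below are hypothetical illustration data, not part of any particular model:

```python
# Joint probability p(H=h, X=x) of a state sequence and an output sequence
# under a first-order HMM, following the factorization above.
# All states, tokens, and probabilities here are made-up illustration values.

START, STOP = "<s>", "</s>"

# Transition probabilities pi[h][h']
pi = {
    START:   {"Rainy": 0.6, "Sunny": 0.4},
    "Rainy": {"Rainy": 0.6, "Sunny": 0.2, STOP: 0.2},
    "Sunny": {"Rainy": 0.3, "Sunny": 0.5, STOP: 0.2},
}

# Emission probabilities tau[h][x]
tau = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

def joint_probability(states, tokens):
    """p(H=h, X=x) = prod_t pi[h_{t-1}, h_t] * prod_t tau[h_t, x_t],
    with h_0 = START and h_{T+1} = STOP."""
    assert len(states) == len(tokens)
    p = 1.0
    prev = START
    for h, x in zip(states, tokens):
        p *= pi[prev][h] * tau[h][x]
        prev = h
    return p * pi[prev][STOP]  # transition into the stop state

p = joint_probability(["Rainy", "Rainy", "Sunny"], ["clean", "shop", "walk"])
# p ~= 0.001728
```

Scoring a different state sequence for the same tokens only changes which table entries are multiplied together.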
== Training ==
The training task is to find the emission and transition probabilities of an HMM given a sample set of observed sequences. This is usually done with maximum likelihood estimation. The [http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm Baum-Welch algorithm] is a generalized [[UsesMethod::Expectation_Maximization | Expectation Maximization]] algorithm that uses the [[UsesMethod::Forward-Backward]] algorithm to recover the parameters of an HMM.
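When the training sequences come fully labeled with their states, maximum likelihood estimation reduces to normalized counts; Baum-Welch uses the same ratios but with expected counts computed by Forward-Backward. A minimal sketch of the fully observed case, with hypothetical training data (emission probabilities are estimated analogously from state-token counts):

```python
from collections import Counter, defaultdict

# Maximum likelihood estimation of transition probabilities from fully
# observed state sequences: pi[h][h'] = count(h -> h') / count(h -> *).
# Baum-Welch uses the same ratios with expected counts instead of
# observed counts. The training sequences below are made-up data.

START, STOP = "<s>", "</s>"

def mle_transitions(state_sequences):
    counts = defaultdict(Counter)
    for seq in state_sequences:
        padded = [START] + list(seq) + [STOP]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    pi = {}
    for h, nxt in counts.items():
        total = sum(nxt.values())
        pi[h] = {h2: c / total for h2, c in nxt.items()}
    return pi

pi = mle_transitions([["A", "A", "B"], ["A", "B", "B"]])
# State A is left three times, twice into B, so pi["A"]["B"] == 2/3.
```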
== Inference ==
Several interesting inference tasks can be considered for HMMs.
=== Evaluating the probability of an observed sequence ===
Given an observed sequence and an HMM, we can evaluate the probability that the sequence was generated by that model. We can, for example, evaluate this probability under several different models and select the one with the highest probability as the recognition model. This probability is calculated efficiently using dynamic programming with the [http://en.wikipedia.org/wiki/Forward_algorithm forward algorithm].
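A minimal sketch of the forward algorithm, which sums over all state sequences rather than picking one; the model tables below are hypothetical illustration data:

```python
# Forward algorithm: total probability that the model generated the
# observed token sequence, summing over all possible state sequences.
# alpha[h] holds p(x_1..x_t, H_t = h) at each step t.
# The model below is made-up illustration data.

START, STOP = "<s>", "</s>"
STATES = ["Rainy", "Sunny"]

pi = {  # transition probabilities pi[h][h']
    START:   {"Rainy": 0.6, "Sunny": 0.4},
    "Rainy": {"Rainy": 0.6, "Sunny": 0.2, STOP: 0.2},
    "Sunny": {"Rainy": 0.3, "Sunny": 0.5, STOP: 0.2},
}
tau = {  # emission probabilities tau[h][x]
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

def forward(tokens):
    # Initialization: start state transitions into the first hidden state.
    alpha = {h: pi[START][h] * tau[h][tokens[0]] for h in STATES}
    # Recursion: sum over predecessors, then emit the next token.
    for x in tokens[1:]:
        alpha = {h: sum(alpha[g] * pi[g][h] for g in STATES) * tau[h][x]
                 for h in STATES}
    # Termination: transition into the stop state.
    return sum(alpha[h] * pi[h][STOP] for h in STATES)

p = forward(["clean", "shop", "walk"])
```

Note that this sum runs in time linear in the sequence length, whereas naively enumerating all state sequences would be exponential.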
=== Recovering the most probable state sequence ===
Given an observed sequence, we can find the sequence of states that maximizes the joint probability of the observed sequence and the state sequence under the model. This is done with the [http://en.wikipedia.org/wiki/Viterbi_algorithm Viterbi] algorithm.
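A minimal sketch of Viterbi decoding, again with hypothetical model tables: it replaces the sum in the forward algorithm with a max, and keeps back-pointers so the best state sequence can be recovered:

```python
# Viterbi algorithm: the single state sequence that maximizes the joint
# probability p(H=h, X=x) of states and observed tokens.
# delta[h] holds the best probability of any path ending in h at step t.
# The model below is made-up illustration data.

START, STOP = "<s>", "</s>"
STATES = ["Rainy", "Sunny"]

pi = {  # transition probabilities pi[h][h']
    START:   {"Rainy": 0.6, "Sunny": 0.4},
    "Rainy": {"Rainy": 0.6, "Sunny": 0.2, STOP: 0.2},
    "Sunny": {"Rainy": 0.3, "Sunny": 0.5, STOP: 0.2},
}
tau = {  # emission probabilities tau[h][x]
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

def viterbi(tokens):
    delta = {h: pi[START][h] * tau[h][tokens[0]] for h in STATES}
    backptr = []  # backptr[t][h] = best predecessor of state h at step t+1
    for x in tokens[1:]:
        step, new_delta = {}, {}
        for h in STATES:
            best = max(STATES, key=lambda g: delta[g] * pi[g][h])
            step[h] = best
            new_delta[h] = delta[best] * pi[best][h] * tau[h][x]
        backptr.append(step)
        delta = new_delta
    # Best final state, including the transition into the stop state.
    last = max(STATES, key=lambda h: delta[h] * pi[h][STOP])
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    return list(reversed(path))

best_path = viterbi(["clean", "shop", "walk"])
# best_path == ["Rainy", "Rainy", "Sunny"]
```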
  
 
== Relevant Papers ==
[[Category::method]]

Latest revision as of 15:20, 4 October 2011
