Difference between revisions of "Pair HMM"
Line 1: | Line 1: | ||
− | |||
− | |||
Pair HMMs are generative models that extend traditional [[HMM | HMMs]] by having two observation sequences instead of one. The pair HMM emits edit operations between pairs of observations, and by doing so, generates an alignment of the two sequences through the hidden states. | Pair HMMs are generative models that extend traditional [[HMM | HMMs]] by having two observation sequences instead of one. The pair HMM emits edit operations between pairs of observations, and by doing so, generates an alignment of the two sequences through the hidden states. | ||
Line 44: | Line 42: | ||
== Multi-State Pair HMM == | == Multi-State Pair HMM == | ||
The single-state definition can be extended to multiple states (for example, distincs states may be defined for Deletions, Insertions and Substitutions). The dynamic programming algorithms for inference and decoding used in the single-state case, are easily extended to the multi-state case by iterating over a third dimension (representing the states). | The single-state definition can be extended to multiple states (for example, distincs states may be defined for Deletions, Insertions and Substitutions). The dynamic programming algorithms for inference and decoding used in the single-state case, are easily extended to the multi-state case by iterating over a third dimension (representing the states). | ||
+ | |||
+ | == Related Papers == | ||
+ | {{#ask: [[UsesMethod::Pair HMM]] | ||
+ | | ?AddressesProblem | ||
+ | | ?UsesDataset | ||
+ | }} |
Revision as of 08:37, 2 November 2011
Pair HMMs are generative models that extend traditional HMMs by having two observation sequences instead of one. The pair HMM emits edit operations between pairs of observations, and by doing so, generates an alignment of the two sequences through the hidden states.
Pair HMMs models have been used for estimating string similarity, based on the string edit distance between the two observation sequences. They are also widely used for biological sequence analysis.
Contents
Single-State Pair HMM
A simple pair HMM can be defined over a single state which emits edit operations between two observation sequences, , as well as an end symbol.
- Let be a set of edit operations, including deletions , insertions and substitutions , and an end symbol .
- Let be the emission probability over the edit operation
- Let be a sequence of edit operations over and , and is the hidden property of the edit operation , representing the amount of symbols that have been consumed of each of the observation strings so far.
String Edit Distance based on a Pair HMM
Given a pair of input strings and a pair HMM, we can define two types of edit distances:
- The Viterbi distance is defined as the most likely transduction between the two strings and it is given by
- The Stochastic distance is defined as the aggregation of all possible transductions between the two strings and is given by
Inference
Inference on pair HMMs is done with versions of the Forward and Backward algorithms which track the possible edit operations over positions on both observable strings.
And similarly:
Multi-State Pair HMM
The single-state definition can be extended to multiple states (for example, distincs states may be defined for Deletions, Insertions and Substitutions). The dynamic programming algorithms for inference and decoding used in the single-state case, are easily extended to the multi-state case by iterating over a third dimension (representing the states).