Pair HMM

From Cohen Courses
Revision as of 13:53, 27 October 2011 by Dmovshov (talk | contribs)
Jump to navigationJump to search

Stub method page for pair HMMs, to be filled by Dana Movshovitz-Attias

Pair HMMs are generative models that extend traditional HMMs by having two observation sequences instead of one. The pair HMM emits edit operations between pairs of observations, and by doing so, generates an alignment of the two sequences through the hidden states.

Pair HMMs models have been used for estimating string similarity, based on the string edit distance between the two observation sequences. They are also widely used for biological sequence analysis.

Single-State Pair HMM

A simple pair HMM can be defined over a single state which emits edit operations between two observation sequences, , as well as an end symbol.

  • Let be a set of edit operations, including deletions , insertions and substitutions , and an end symbol .
  • Let be the emission probability over the edit operation
  • Let be a sequence of edit operations over and , and is the hidden property of the edit operation , representing the amount of symbols that have been consumed of each of the observation strings so far.

String Edit Distance based on a Pair HMM

Given a pair of input strings and a pair HMM, we can define two types of edit distances:

  • viterbi distance
  • stochastic distance

Multi-State Pair HMM

The single-state definition can be extended to multiple states (for example, distincs states may be defined for Deletions, Insertions and Substitutions). The dynamic programming algorithms for inference and decoding used in the single-state case, are easily extended to the multi-state case by iterating over a third dimension (representing the states).