Ristad and Yianilos 1997 Learning String Edit Distance

From Cohen Courses
Revision as of 13:57, 22 September 2011 by Dmovshov (talk | contribs) (→‎Summary)
Jump to navigationJump to search

Citation

Ristad, E.S. and Yianilos, P.N. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20: 522--532, 1998.

Online

Summary

In this paper the authors describe a stochastic HMM model for learning a String Edit Distance function, as well as an efficient EM variant for learning edit costs. The authors then present a stochastic solution to the problem of String Classification, in which the classification is based on the similarity of an observed string, , to an underlying prototype, , from a class, . The paper describes an efficient algorithm for inducing a joint probability model from a corpus, and this model is used to classify new strings. Finally, the described techniques are applied to the problem of learning Word Pronunciation in conversational speech.

Stochastic Model for String Edit Distance

The distance between the strings and is modeled as a memoryless stochastic transduction of edit operations, including deletions, insertions, and substitutions. Using this model, two distance functions are defined: (1) The Viterbi edit distance is defined by most likely transduction between two strings, and (2) the stochastic edit distance...