Difference between revisions of "Ristad and Yianilos 1997 Learning String Edit Distance"

From Cohen Courses
Jump to navigationJump to search
Line 13: Line 13:
 
== Summary ==
 
== Summary ==
  
In this [[Category::paper]] the authors describe a stochastic [[UsesMethod::HMM | HMM]] model for learning a [[AddressesProblem::String Edit Distance]] function, as well as an efficient [[UsesMethod::Expectation_Maximization | EM]] variant for learning edit costs. The authors then present a stochastic solution to [[AddressesProblem::String Classification]], in which the classification is based on the similarity of an observed string, <math>y^v</math>, to an underlying prototype, <math>x^t</math>, from a class, <math>w</math>. The paper describes an efficient algorithm for inducing a joint probability model <math>p(w, y^v|L, \phi)</math> from a corpus, and this model is used to classify new strings. Finally, the described techniques are applied to the problem of learning [[AddressesProblem::Word Pronunciation]] in conversational speech.
+
In this [[Category::paper]] the authors describe a stochastic [[UsesMethod::HMM | HMM]] model for learning a [[AddressesProblem::String Edit Distance]] function, as well as an efficient [[UsesMethod::Expectation_Maximization | EM]] variant for learning edit costs. The authors then present a stochastic solution to the problem of [[AddressesProblem::String Classification]], in which the classification is based on the similarity of an observed string, <math>y^v</math>, to an underlying prototype, <math>x^t</math>, from a class, <math>w</math>. The paper describes an efficient algorithm for inducing a joint probability model <math>p(w, y^v|L, \phi)</math> from a corpus, and this model is used to classify new strings. Finally, the described techniques are applied to the problem of learning [[AddressesProblem::Word Pronunciation]] in conversational speech.
 +
 
 +
=== Stochastic Model for String Edit Distance ===
 +
The distance between the strings <math>A^*</math> and <math>B^*</math> is modeled as a memoryless stochastic transduction of edit operations, including deletions, insertions, and substitutions. Using this model, two distance functions are defined: (1) The ''Viterbi edit distance'' is defined by most likely transduction between two strings, and (2) the ''stochastic edit distance''...

Revision as of 14:57, 22 September 2011

Citation

Ristad, E.S. and Yianilos, P.N. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20: 522--532, 1998.

Online

Summary

In this paper the authors describe a stochastic HMM model for learning a String Edit Distance function, as well as an efficient EM variant for learning edit costs. The authors then present a stochastic solution to the problem of String Classification, in which the classification is based on the similarity of an observed string, , to an underlying prototype, , from a class, . The paper describes an efficient algorithm for inducing a joint probability model from a corpus, and this model is used to classify new strings. Finally, the described techniques are applied to the problem of learning Word Pronunciation in conversational speech.

Stochastic Model for String Edit Distance

The distance between the strings and is modeled as a memoryless stochastic transduction of edit operations, including deletions, insertions, and substitutions. Using this model, two distance functions are defined: (1) The Viterbi edit distance is defined by most likely transduction between two strings, and (2) the stochastic edit distance...