Ristad and Yianilos 1997 Learning String Edit Distance
Citation
Ristad, E.S. and Yianilos, P.N. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20: 522--532, 1998.
Online
- Paper web-site by Yianilos.
Summary
In this paper the authors describe a stochastic HMM-based model for learning a String Edit Distance function, as well as an efficient EM variant for learning the edit costs.
The String Edit Distance model is then used to provide a stochastic solution to the problem of String Classification, in which classification is based on the similarity of an observed string y to an underlying prototype x from a class w. The paper describes an efficient algorithm for inducing a joint probability model from a corpus, and this model is used to classify new strings.
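To illustrate how a learned string distance supports classification, here is a minimal Python sketch of a simple nearest-prototype decision rule. The `prototypes` mapping and `distance` function are hypothetical stand-ins for the per-class prototype strings and the learned edit distance; the paper's actual rule is derived from the induced joint probability model.

```python
def classify(observed, prototypes, distance):
    """Nearest-prototype rule: assign the observed string to the class
    whose prototype has the smallest learned edit distance to it.

    `prototypes` maps class label -> prototype string, and `distance`
    is the learned string distance; both names are hypothetical."""
    return min(prototypes, key=lambda label: distance(prototypes[label], observed))
```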
Finally, the authors address the problem of learning Word Pronunciation in conversational speech by reducing it to a classification problem and applying the String Edit Distance model. Here the classes are the syntactic words of the language, and similarity is measured between each word's phonological segments (the prototype strings) and the observed phonemes.
A Stochastic Model for String Edit Distance
The distance between two strings x and y is modeled as a memoryless stochastic transduction (an HMM) over edit operations, including deletions, insertions, and substitutions, with letter-specific parameters for each alphabet letter. Two distance functions are defined on this model: (1) the Viterbi edit distance, given by the single most likely transduction between the two strings, and (2) the stochastic edit distance, given by aggregating the probabilities of all transductions between them. The paper describes variants of the Forward and Backward algorithms, as well as an EM variant for learning the HMM parameters, which encode the costs of the edit operations.
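The stochastic edit distance can be computed with a Forward-style dynamic program over edit operations. The sketch below is a minimal Python rendering, assuming the transducer's parameters are given as dictionaries `p_sub`, `p_del`, `p_ins` and a termination probability `p_end` (hypothetical names); in the paper these parameters are learned with EM rather than fixed by hand.

```python
import math

def stochastic_edit_distance(x, y, p_sub, p_del, p_ins, p_end):
    """Forward recurrence: -log of the total probability of all
    transductions of x into y under a memoryless transducer.

    p_sub[(a, b)], p_del[a], p_ins[b], and p_end are assumed to be
    probabilities summing to one over all edit operations."""
    T, V = len(x), len(y)
    # alpha[t][v] = probability of having generated x[:t] and y[:v]
    alpha = [[0.0] * (V + 1) for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t > 0:  # delete x[t-1]
                alpha[t][v] += p_del.get(x[t - 1], 0.0) * alpha[t - 1][v]
            if v > 0:  # insert y[v-1]
                alpha[t][v] += p_ins.get(y[v - 1], 0.0) * alpha[t][v - 1]
            if t > 0 and v > 0:  # substitute x[t-1] -> y[v-1]
                alpha[t][v] += p_sub.get((x[t - 1], y[v - 1]), 0.0) * alpha[t - 1][v - 1]
    return -math.log(alpha[T][V] * p_end)
```

Replacing each sum in the recurrence with a max yields the Viterbi edit distance of the single most likely transduction.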
Three main types of HMM are considered:
- Tied: The model is simplified to contain only four distinct edit costs for insertion, deletion, identity and substitution. Using these parameter equivalence classes is termed "parameter tying".
- Mixed: A k-component mixed transducer is a linear combination of k HMMs, each using a different model (e.g., each model may use a different degree of parameter tying); a sketch of the resulting mixture distance follows this list.
- Memory: This is a model using a higher-order Markov chain; that is, the probability of the next edit operation depends on the previous n operations, for some fixed n.
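As a minimal sketch of the mixture distance, assume each component transducer exposes a function returning its joint probability p_i(x, y) and that the mixing weights are given (all names hypothetical):

```python
import math

def mixture_edit_distance(x, y, components, weights):
    """Distance under a k-component mixed transducer: -log of a convex
    combination of the component transducers' joint probabilities.

    `components` is a list of functions returning p_i(x, y);
    `weights` are the corresponding mixing coefficients."""
    total = sum(w * p(x, y) for w, p in zip(weights, components))
    return -math.log(total)
```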
Baseline
This model is compared with the Levenshtein distance, which is the minimum number of edit operations required to transform one string into the other. Effectively, Levenshtein distance is the special case in which identity substitutions cost zero and all other edit operations cost one.
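For reference, the baseline can be computed with the classic dynamic program (a standard textbook implementation, not code from the paper):

```python
def levenshtein(x, y):
    """Minimum number of insertions, deletions, and (non-identity)
    substitutions needed to turn string x into string y."""
    prev = list(range(len(y) + 1))  # distances from "" to prefixes of y
    for i, a in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to ""
        for j, b in enumerate(y, start=1):
            cost = 0 if a == b else 1  # identity substitutions are free
            curr.append(min(prev[j] + 1,         # delete a
                            curr[j - 1] + 1,     # insert b
                            prev[j - 1] + cost)) # substitute a -> b
        prev = curr
    return prev[-1]
```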
Results
The experimental evaluation tests all of the suggested models (Tied, Untied, and Mixed, with the distance set to either the Viterbi or the stochastic distance in each case). The error rates of the stochastic models range from one half to one sixth of the error obtained with the baseline Levenshtein distance. Over 87 percent of the unseen pronunciations are correctly recognized, which is within 4 percent of the maximum success rate achievable by any classifier.