Bbd writeup of comparison of string distance metrics
From Cohen Courses
This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Bbd.
This paper presents an experimental comparison of various string distance metrics, evaluated on matching and clustering tasks. The following methods were considered:
- Edit-distance-like functions - Levenshtein and Monge-Elkan
- Token-based distance functions - Jaccard similarity, Jensen-Shannon, SFS distance
- Hybrid distance functions - soft TFIDF
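To make the first two families concrete, here is a minimal Python sketch of one metric from each: unit-cost Levenshtein distance (character level) and Jaccard similarity (token level). The function names, the whitespace tokenization, and the lowercasing are assumptions made for illustration; they are not taken from the paper.

```python
def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(s: str, t: str) -> float:
    """Jaccard similarity of the token sets (assumed: lowercase, split on whitespace)."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0  # convention chosen here: two empty strings are identical
    return len(a & b) / len(a | b)


print(levenshtein("kitten", "sitting"))              # 3
print(jaccard("William Cohen", "William W. Cohen"))  # 2 shared tokens out of 3
```

Note that Levenshtein is a distance (lower means closer) while Jaccard is a similarity (higher means closer), so the two are not directly comparable without rescaling.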
In the experimental study, they found that:
- TFIDF performs best among the token-based metrics
- Monge-Elkan performs best among the string edit-distance metrics
- a combination of TFIDF and Jaro-Winkler performs better than either of them alone.
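The hybrid idea in the last finding can be sketched roughly as follows: tokens are weighted by TFIDF, but a token in one string may match a merely similar token in the other, as judged by a character-level similarity. This is a simplified illustration, not the paper's exact formulation. In particular, `difflib.SequenceMatcher.ratio` is used here as a stand-in for Jaro-Winkler, and the names `char_sim`, `tfidf_vector`, `soft_tfidf`, the threshold `theta`, and the toy corpus are all assumptions.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def char_sim(a: str, b: str) -> float:
    # Stand-in for Jaro-Winkler (assumption): difflib's similarity ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def tfidf_vector(tokens, doc_freq, n_docs):
    # Raw weight log(tf + 1) * log(N / df), then scaled to unit length.
    tf = Counter(tokens)
    raw = {w: math.log(c + 1) * math.log(n_docs / max(doc_freq[w], 1))
           for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {w: (v / norm if norm else 0.0) for w, v in raw.items()}


def soft_tfidf(s, t, corpus, theta=0.9, sim=char_sim):
    # Document frequencies come from the corpus, which should contain s and t.
    docs = [d.lower().split() for d in corpus]
    doc_freq = Counter(w for d in docs for w in set(d))
    vs = tfidf_vector(s.lower().split(), doc_freq, len(docs))
    vt = tfidf_vector(t.lower().split(), doc_freq, len(docs))
    if not vs or not vt:
        return 0.0
    score = 0.0
    for w in vs:
        # Closest token in t; it contributes only if it is similar enough.
        best, best_sim = max(((v, sim(w, v)) for v in vt), key=lambda p: p[1])
        if best_sim >= theta:
            score += vs[w] * vt[best] * best_sim
    return score


corpus = ["William Cohen", "Willaim Cohen", "Pradeep Ravikumar", "Stephen Fienberg"]
exact_only = soft_tfidf("William Cohen", "Willaim Cohen", corpus, theta=1.0)
soft = soft_tfidf("William Cohen", "Willaim Cohen", corpus, theta=0.8)
print(exact_only, soft)
```

With `theta=1.0` only identical tokens count, so the function reduces to plain TFIDF cosine similarity; lowering `theta` lets the misspelled "Willaim" contribute, which is the behavior that motivates the hybrid.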