Bbd writeup of comparison of string distance metrics
From Cohen Courses
Latest revision as of 11:42, 3 September 2010
This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Bbd.
This paper presents an experimental comparison of various string distance metrics, evaluated on matching and clustering tasks. The following methods were considered:
- Edit-distance-like functions: Levenshtein and Monge-Elkan
- Token-based distance functions: Jaccard similarity, TFIDF, Jensen-Shannon, SFS distance
- Hybrid distance functions: SoftTFIDF
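Two of the simpler metrics listed above, Levenshtein distance (character level) and Jaccard similarity (token level), can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the paper: the function names are hypothetical, Levenshtein is assumed to use unit costs for insertion, deletion and substitution, and tokens are assumed to be lowercased, whitespace-separated words.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of unit-cost insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


def tokens(s: str) -> set[str]:
    """Hypothetical tokenizer (assumption): lowercase, then split on whitespace."""
    return set(s.lower().split())


def jaccard(s: str, t: str) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two token sets."""
    a, b = tokens(s), tokens(t)
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return len(a & b) / len(a | b)


print(levenshtein("kitten", "sitting"))                       # 3
print(round(jaccard("William W Cohen", "Cohen William"), 3))  # 0.667
```

Levenshtein compares character by character, so reordered names such as "William Cohen" and "Cohen William" come out far apart; Jaccard ignores word order but gives no credit to a misspelled token, a gap that hybrid functions such as SoftTFIDF aim to close.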
In their experiments, the authors found that:
- TFIDF performs best among the token-based metrics considered,
- Monge-Elkan performs best among the edit-distance-like metrics, and
- a hybrid combining TFIDF with Jaro-Winkler (SoftTFIDF) performs better than either TFIDF or Jaro-Winkler alone.
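The TFIDF/Jaro-Winkler hybrid in the last finding can be illustrated with a minimal Python sketch; it is not the paper's code. Assumptions: TFIDF weights are log(tf + 1) · log(N / df), normalized to unit length, with N and df counted over a small hypothetical corpus and unseen tokens given df = 1; each token w of S is paired with its most Jaro-Winkler-similar token v of T and contributes V(w, S) · V(v, T) · sim(w, v) only when that similarity exceeds θ = 0.9 (an assumed threshold). The names `jaro`, `jaro_winkler` and `SoftTfidf`, and the corpus, are hypothetical.

```python
import math
from collections import Counter


def jaro(s: str, t: str) -> float:
    """Jaro similarity: characters matched within a window, penalized by transpositions."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    m = 0  # number of matching characters
    for i, c in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == c:
                s_hit[i] = t_hit[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    s_common = [c for c, hit in zip(s, s_hit) if hit]
    t_common = [c for c, hit in zip(t, t_hit) if hit]
    # transpositions = half the number of matched characters that appear out of order
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a shared prefix of up to 4 characters (scaling factor p)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


class SoftTfidf:
    """Hypothetical SoftTFIDF scorer; IDF statistics come from `corpus` (assumed non-empty)."""

    def __init__(self, corpus, inner=jaro_winkler, theta=0.9):
        self.n = len(corpus)
        # document frequency: number of corpus strings containing each token
        self.df = Counter(w for name in corpus for w in set(name.lower().split()))
        self.inner = inner
        self.theta = theta

    def weights(self, s):
        """Unit-length vector of log(tf + 1) * log(N / df) weights; unseen tokens get df = 1."""
        tf = Counter(s.lower().split())
        raw = {w: math.log(c + 1) * math.log(self.n / self.df.get(w, 1))
               for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        return {w: v / norm for w, v in raw.items()} if norm else raw

    def score(self, s, t):
        vs, vt = self.weights(s), self.weights(t)
        total = 0.0
        for w, ws in vs.items():
            if not vt:
                break
            v = max(vt, key=lambda u: self.inner(w, u))  # closest token of t
            sim = self.inner(w, v)
            if sim > self.theta:  # only sufficiently close token pairs contribute
                total += ws * vt[v] * sim
        return total


corpus = ["william cohen", "willliam cohon", "pradeep ravikumar",
          "stephen fienberg", "john smith"]
soft = SoftTfidf(corpus)
exact = SoftTfidf(corpus, inner=lambda a, b: float(a == b), theta=0.5)  # plain TFIDF cosine
print(round(soft.score("william cohen", "willliam cohon"), 3))  # 0.941
print(exact.score("william cohen", "willliam cohon"))           # 0.0
```

With exact string equality as the inner similarity, the score reduces to plain TFIDF cosine similarity, which is 0 for the misspelled pair because the two names share no token. Using Jaro-Winkler as the inner similarity lets "william"/"willliam" and "cohen"/"cohon" still contribute, which is the intuition behind the hybrid outperforming either component alone.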