Bbd writeup of comparison of string distance metrics

From Cohen Courses

Latest revision as of 11:42, 3 September 2010

This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Bbd.

This paper presents an experimental comparison of various string distance metrics, evaluated on matching and clustering tasks. The following methods were considered:

  • Edit-distance-like functions: Levenshtein, Monge-Elkan, Jaro and Jaro-Winkler
  • Token-based distance functions: Jaccard similarity, TFIDF (cosine), Jensen-Shannon and simplified Fellegi-Sunter (SFS)
  • Hybrid distance functions: SoftTFIDF
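To make the difference between the first two families concrete, here is a minimal Python sketch of one edit-distance function (unit-cost Levenshtein) and one token-based function (Jaccard similarity). Lower-casing, whitespace tokenisation and the convention for two empty strings are assumptions of this sketch, not details from the paper; Monge-Elkan and the other metrics are not shown. Note that Levenshtein returns a distance (lower means closer) while Jaccard returns a similarity (higher means closer).

```python
# Minimal sketches of one edit-distance function and one token-based
# function. Tokenisation by whitespace after lower-casing is an assumption.


def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance: the minimum number of single-character
    insertions, deletions and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


def jaccard(s: str, t: str) -> float:
    """Jaccard similarity |A & B| / |A | B| of the token sets of s and t."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0  # convention for two empty strings (an assumption)
    return len(a & b) / len(a | b)


print(levenshtein("kitten", "sitting"))              # 3
print(jaccard("William W. Cohen", "Cohen William"))  # 0.666...
```

Because Jaccard works on token sets, the reordered name still scores 2/3, whereas a character-level edit distance charges for every character that has to move.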

In their experiments, the authors found that:

  • TFIDF performs best among the token-based metrics;
  • Monge-Elkan performs best among the edit-distance-like metrics;
  • a hybrid combining TFIDF with Jaro-Winkler (SoftTFIDF) performs better than either metric alone.
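The hybrid of TFIDF and Jaro-Winkler is easier to follow with both pieces side by side. Below is a minimal Python sketch of Jaro-Winkler and of a SoftTFIDF-style score that uses it as the secondary, character-level similarity: a token of one name may be matched to a similar (not only identical) token of the other. This is not the authors' implementation. Assumptions made here: whitespace tokenisation, weights log(tf + 1) · log(N / df) normalised to unit length, a threshold `theta = 0.9`, using the weight of the closest token in `t`, and treating corpus-unseen tokens as if df = 1. The helper names (`document_frequencies`, `tfidf_vector`, `soft_tfidf`) are hypothetical.

```python
import math
from collections import Counter


def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_flags, t_flags = [False] * len(s), [False] * len(t)
    m = 0
    for i, c in enumerate(s):
        # characters "match" only if equal and within the search window
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_flags[j] and t[j] == c:
                s_flags[i] = t_flags[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    s_m = [c for c, f in zip(s, s_flags) if f]
    t_m = [c for c, f in zip(t, t_flags) if f]
    transpositions = sum(a != b for a, b in zip(s_m, t_m)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def document_frequencies(corpus):
    """Hypothetical helper: how many names in the corpus contain each token."""
    df = Counter()
    for name in corpus:
        df.update(set(name.lower().split()))
    return df, len(corpus)


def tfidf_vector(tokens, df, n_docs):
    """Unit-length weights log(tf + 1) * log(N / df)."""
    counts = Counter(tokens)
    # Assumed smoothing: tokens unseen in the corpus are treated as df = 1.
    vec = {w: math.log(c + 1) * math.log(n_docs / max(df[w], 1))
           for w, c in counts.items()}
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {w: x / norm for w, x in vec.items()} if norm else vec


def soft_tfidf(s, t, df, n_docs, theta=0.9):
    """TFIDF cosine where each token of s is paired with its closest token
    of t under Jaro-Winkler, counted only if that similarity exceeds theta."""
    vs = tfidf_vector(s.lower().split(), df, n_docs)
    vt = tfidf_vector(t.lower().split(), df, n_docs)
    score = 0.0
    for w, weight_s in vs.items():
        best_v, best_sim = None, 0.0
        for v in vt:
            sim = jaro_winkler(w, v)
            if sim > best_sim:
                best_v, best_sim = v, sim
        if best_v is not None and best_sim > theta:
            score += weight_s * vt[best_v] * best_sim
    return score


df, n = document_frequencies(["William Cohen", "Pradeep Ravikumar", "Stephen Fienberg"])
print(soft_tfidf("William Cohen", "Wiliam Cohen", df, n))  # about 0.983
```

With this toy corpus, plain TFIDF cosine would give the two names only 0.5, since "cohen" is the only exactly shared token; the soft match lets the misspelled "wiliam" contribute as well.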