Philgoo Han writeup of Cohen, Ravikumar and Fienberg

From Cohen Courses

Latest revision as of 11:42, 3 September 2010

This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Ironfoot.

  • Comparison of string distance metrics
    • Open source Java toolkit for name-matching
    • Little prior knowledge, ill-structured data
  • Edit-distance like functions
  • Token based distance functions
  • Hybrid distance functions
  • Blocking methods: comparing all pairs is not practical
  • Results
    • Matching: SoftTFIDF is generally the best
    • Clustering: Token based is good on average but bad when there are many misspellings
    • Combination of distance metrics: better results, at the cost of training overhead
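
To make the families of distance functions above concrete, here is a minimal Python sketch. It is not taken from the paper's Java toolkit: Levenshtein distance stands in for the edit-distance-like functions, Jaccard similarity over whitespace tokens stands in for the token based functions, and `candidate_pairs` is a hypothetical helper showing one simple form of blocking (only names that share a token are compared). The function names and the tokenization are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def jaccard(s: str, t: str) -> float:
    """Token based similarity: shared tokens over all distinct tokens."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def candidate_pairs(names):
    """Blocking (assumed variant): pair up only names sharing a token."""
    index = defaultdict(set)
    for i, name in enumerate(names):
        for tok in set(name.lower().split()):
            index[tok].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


names = ["William Cohen", "Willliam Cohen", "Stephen Fienberg"]
for i, j in sorted(candidate_pairs(names)):
    print(names[i], "|", names[j],
          "| edit distance:", levenshtein(names[i], names[j]),
          "| jaccard:", round(jaccard(names[i], names[j]), 2))
```

In this toy example the misspelled token "Willliam" no longer matches "William" exactly, so the token based score drops even though the edit distance between the two full names is only 1. This is the weakness with misspellings noted in the results, and it is the gap that hybrid functions such as SoftTFIDF aim to close by allowing approximate token matches.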