Philgoo Han writeup of Cohen, Ravikumar and Fienberg
This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Ironfoot.
- Comparison of string distance metrics
- Open source java toolkit for name-matching
- Little prior knowledge, ill-structured data
- Edit-distance-like functions
- Token-based distance functions
- Hybrid distance functions
- Blocking methods: it is not practical to compare all pairs
- Results
- Matching: SoftTFIDF is generally the best
- Clustering: Token-based metrics are good on average but perform poorly when there are many misspellings
- Combination of distance metrics: better results, at the cost of training overhead
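
To make the three families of distance functions and the blocking idea concrete, here is a minimal Python sketch. It is not the paper's Java toolkit and does not implement SoftTFIDF. The function names (`levenshtein`, `jaccard`, `level_two`, `candidate_pairs`) are hypothetical, the hybrid uses normalized Levenshtein similarity as its inner measure purely as a simplifying assumption, and the blocking rule (share at least one token) is only one possible choice.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(s, t):
    """Edit-distance family: unit-cost insert, delete, and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(s, t):
    """Token-based family: |S & T| / |S | T| over lowercase word sets."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def level_two(s, t):
    """Hybrid family (assumed variant): for each token of s, take its best
    edit-based similarity to any token of t, then average over tokens of s."""
    a, b = s.lower().split(), t.lower().split()
    if not a or not b:
        return 0.0

    def sim(x, y):
        # Normalized Levenshtein similarity in [0, 1]; an assumption here.
        return 1.0 - levenshtein(x, y) / max(len(x), len(y))

    return sum(max(sim(x, y) for y in b) for x in a) / len(a)


def candidate_pairs(names):
    """Blocking (assumed rule): only pair names sharing at least one token."""
    index = defaultdict(set)
    for i, name in enumerate(names):
        for token in set(name.lower().split()):
            index[token].add(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

For the pair "William Cohen" and "Willliam Cohen", `jaccard` drops to 1/3 because the misspelled token no longer matches exactly, while `level_two` stays close to 1. This is the kind of behavior behind the observation that token-based metrics suffer when misspellings are frequent.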