Selen writeup of Cohen et al. IJCAI '03

This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Selen.


In this paper, the authors compare the performance of string distance metrics on name-matching tasks. The methods compared are: edit-distance metrics (Levenshtein and Jaro-Winkler); token-based distance functions (TFIDF and Jensen-Shannon); and hybrid functions (Monge-Elkan, a recursive matching scheme). They also consider pruning methods.
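To make the difference between the first two families concrete, here is a minimal sketch of one edit-distance metric (Levenshtein) and one token-based metric (cosine similarity over TFIDF weights). It is an illustration, not the paper's implementation: the helper names tfidf_vectors and cosine, the whitespace tokenizer, the log(tf + 1) * log(N / df) weighting, and the example names are all assumptions made for this sketch.

```python
import math
from collections import Counter


def levenshtein(s: str, t: str) -> int:
    """Character-level edit distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tfidf_vectors(names):
    """Hypothetical helper: unit-length TFIDF vectors over whitespace tokens.

    Assumed weighting: log(tf + 1) * log(N / df), where df is the number of
    names containing the token and N is the number of names.
    """
    docs = [Counter(name.lower().split()) for name in names]
    df = Counter(tok for d in docs for tok in d)  # each doc counts a token once
    n = len(docs)
    vectors = []
    for d in docs:
        w = {tok: math.log(tf + 1) * math.log(n / df[tok]) for tok, tf in d.items()}
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        vectors.append({tok: v / norm for tok, v in w.items()})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors stored as dicts."""
    return sum(w * v[tok] for tok, w in u.items() if tok in v)


# Made-up example names: the first two differ only in token order and an initial.
names = ["William Cohen", "Cohen William W", "Pradeep Ravikumar", "Stephen Fienberg"]
vecs = tfidf_vectors(names)
print(levenshtein(names[0], names[1]))  # large, since the characters are reordered
print(cosine(vecs[0], vecs[1]))         # well above zero, since the tokens overlap
```

The sketch shows why the families behave differently on names: swapping the order of the tokens costs many character edits, while a token-based score ignores order and only looks at which tokens are shared and how rare they are.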

They found that TFIDF works best, and that although Monge-Elkan performs well, Jaro-Winkler works as well and is faster. I wonder whether the comparison is biased towards a particular metric, and whether the results would hold under other conditions: a different dataset, a different classification technique, and so on.