Selen writeup of Cohen et al. IJCAI '03
This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Selen.
In this paper, the authors compare the performance of string distance metrics on name-matching tasks. The methods they compare are:
Edit-distance metrics: Levenshtein and Jaro-Winkler
Token-based distance functions: TFIDF, Jensen-Shannon
Hybrid functions: Monge-Elkan (recursive matching scheme)
and pruning methods
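To make the first and third families above concrete, here is a minimal Python sketch of a unit-cost Levenshtein distance and of a Monge-Elkan style recursive matching score built on top of it. The function names (`levenshtein`, `edit_similarity`, `monge_elkan`), the whitespace tokenization, the lowercasing, and the choice of normalized edit similarity as the inner function are all assumptions made for illustration; they are not taken from the paper, which may use different inner metrics and cost settings.

```python
def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(s: str, t: str) -> float:
    """Map edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def monge_elkan(a: str, b: str, inner=edit_similarity) -> float:
    """Recursive matching: average, over tokens of a, of the best
    inner similarity against any token of b (assumed tokenization:
    lowercase, split on whitespace)."""
    tokens_a, tokens_b = a.lower().split(), b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(inner(x, y) for y in tokens_b) for x in tokens_a]
    return sum(best) / len(tokens_a)
```

Note that this score is asymmetric in general: `monge_elkan(a, b)` averages over the tokens of `a` only, so swapping the arguments can change the result.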
They found that TFIDF works best, and that although Monge-Elkan performs well, Jaro-Winkler works as well and is faster. I wonder whether the comparison is biased towards a particular metric, and whether the results would hold under other conditions (a different dataset, a different classification technique, and so on).
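For readers unfamiliar with the token-based side of the comparison, the following is a rough Python sketch of a TFIDF cosine similarity over name strings. The weighting used here, log(tf + 1) times log(N / df), the whitespace tokenization, and the helper names `tfidf_vectors` and `cosine` are assumptions for illustration and may not match the exact formulation evaluated in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(names):
    """Return one unit-length TFIDF vector (a dict) per name.

    Assumed weighting: log(tf + 1) * log(N / df), where N is the number
    of names and df is the number of names containing the token.
    """
    docs = [Counter(name.lower().split()) for name in names]
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in doc)  # document frequency
    vectors = []
    for doc in docs:
        vec = {tok: math.log(tf + 1) * math.log(n_docs / df[tok])
               for tok, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # A name made only of tokens that occur everywhere gets an empty vector.
        vectors.append({tok: w / norm for tok, w in vec.items()} if norm > 0 else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


names = ["William Cohen", "Cohen William", "William Smith"]
vecs = tfidf_vectors(names)
same_tokens = cosine(vecs[0], vecs[1])   # token order is ignored
different = cosine(vecs[0], vecs[2])     # only the uninformative token is shared
```

In this toy collection the token "william" appears in every name, so its weight is zero and it contributes nothing to the score. This is only meant to show how a token-based score behaves, not to reproduce any result from the paper.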