Selen writeup of Cohen et al. IJCAI '03

From Cohen Courses
Revision as of 11:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Selen.


In this paper, the authors compare the performance of string distance metrics on name-matching tasks. The methods compared are: edit-distance metrics (Levenshtein and Jaro-Winkler); token-based distance functions (TFIDF and Jensen-Shannon); hybrid functions (Monge-Elkan, a recursive matching scheme); and pruning methods.
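To make the three families of metrics above concrete, here is a minimal Python sketch of one representative from each: unit-cost Levenshtein distance, a TFIDF cosine similarity over tokens, and a Monge-Elkan style recursive matching scheme. This is an illustration only, not the code used in the paper. The function names, the smoothed IDF formula, and the normalized-Levenshtein inner similarity (norm_lev) are my own assumptions.

```python
import math
from collections import Counter


def levenshtein(s, t):
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete a
                            curr[j - 1] + 1,           # insert b
                            prev[j - 1] + (a != b)))   # substitute or match
        prev = curr
    return prev[-1]


def norm_lev(a, b):
    """Assumed inner similarity in [0, 1]: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def tfidf_cosine(s, t, corpus):
    """Cosine similarity of TFIDF-weighted token vectors.

    The weight log(1 + tf) * log(1 + n / (1 + df)) is a smoothed
    variant chosen for this sketch, not the paper's exact formula.
    """
    docs = [set(d.lower().split()) for d in corpus]
    n = len(docs)

    def vector(text):
        counts = Counter(text.lower().split())
        return {w: math.log(1 + c)
                * math.log(1 + n / (1 + sum(w in d for d in docs)))
                for w, c in counts.items()}

    u, v = vector(s), vector(t)
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def monge_elkan(s, t, inner=norm_lev):
    """Recursive matching: average, over tokens of s, of the best
    inner similarity against any token of t. Not symmetric in general."""
    a_tokens, b_tokens = s.split(), t.split()
    if not a_tokens or not b_tokens:
        return 0.0
    best = (max(inner(a, b) for b in b_tokens) for a in a_tokens)
    return sum(best) / len(a_tokens)
```

For instance, monge_elkan("William Cohen", "Cohen William") scores the two names as a perfect match because every token finds an identical partner, whereas plain Levenshtein on the full strings penalizes the reordering. This is the kind of difference the comparison in the paper is probing.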

They found that TFIDF works best, and that although Monge-Elkan performs well, Jaro-Winkler works about as well and is faster. I wonder whether the comparison is biased towards a particular metric, and whether the results would hold under other conditions (a different dataset, classification technique, and so on).