Selen writeup of Cohen et al. IJCAI '03

This is a review of Cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks by user:Selen.

In this paper, the authors compare the performance of string distance metrics applied to name matching tasks. The methods they compare are edit distance metrics (Levenshtein and Jaro-Winkler), token-based distance functions (TFIDF and Jensen-Shannon), and hybrid functions (the Monge-Elkan recursive matching scheme), along with pruning methods.
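To make the difference between the first two families concrete, here is a minimal sketch in Python. It is not code from the paper: `levenshtein` is the textbook dynamic-programming edit distance, and `token_overlap` is a hypothetical helper that measures unweighted token-set overlap, a much simpler stand-in for the TFIDF weighting mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_overlap(a: str, b: str) -> float:
    """Hypothetical helper: share of distinct lowercase tokens that
    the two strings have in common (unweighted, unlike TFIDF)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


if __name__ == "__main__":
    # Same tokens in a different order: the character-level distance is
    # large, while the token-level overlap is complete.
    print(levenshtein("William Cohen", "Cohen William"))
    print(token_overlap("William Cohen", "Cohen William"))
```

Reordered name parts are a case where a character-level metric penalizes a pair that a token-based function treats as identical, which is one reason to compare both families on the same data.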

They found that TFIDF works best and that, even though Monge-Elkan performs well, Jaro-Winkler works about as well and is faster. I wonder whether the comparison is biased towards a particular metric, and whether the results would hold under other conditions, such as a different dataset or classification technique.
