Difference between revisions of "Bilenko and Mooney 2003 Adaptive duplicate detection using learnable string similarity measures"

Revision as of 17:36, 29 September 2011

... Under construction by Dana Movshovitz-Attias

Citation

Bilenko, M. and Mooney, R.J., Adaptive duplicate detection using learnable string similarity measures. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 39--48, 2003.

Online

PDF version.

Summary

This paper addresses the problem of Duplicate Document Detection using string similarity measures. In contrast to previous methods, which used generic or manually tuned distance metrics, the authors propose learnable (trainable) text distance functions and train a separate function for each data field. Such a specialized function can capture the notion of similarity that is relevant to the data held in that particular field.

Two similarity metrics are suggested:

The first extends the String Edit Distance as suggested by Ristad and Yianilos to include affine gaps.
The second metric measures similarity based on unordered bags of words, using an SVM for training.
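
To make the idea of affine gaps in the first metric concrete, the following is a minimal sketch of an edit distance in which a run of consecutive insertions or deletions is charged an opening cost once and a smaller extension cost for each further character. It is not the authors' learned model: the function name `affine_gap_distance` and the cost values `sub_cost`, `gap_open` and `gap_extend` are hand-set assumptions chosen for illustration, whereas in the paper the corresponding parameters are learned from training data.

```python
from math import inf


def affine_gap_distance(s, t, sub_cost=1.0, gap_open=1.0, gap_extend=0.25):
    """Edit distance with affine gap costs (fixed, hand-set costs).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All cost values here are illustrative assumptions, not learned.
    """
    n, m = len(s), len(t)
    # M: best cost with s[i-1] aligned to t[j-1] (match or substitution)
    # D: best cost ending in a deletion (s[i-1] aligned to a gap)
    # I: best cost ending in an insertion (t[j-1] aligned to a gap)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    I = [[inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        D[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        I[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if s[i - 1] == t[j - 1] else sub_cost
            M[i][j] = step + min(M[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
            # Extending an existing gap is cheaper than opening a new one.
            D[i][j] = min(M[i - 1][j] + gap_open,
                          D[i - 1][j] + gap_extend,
                          I[i - 1][j] + gap_open)
            I[i][j] = min(M[i][j - 1] + gap_open,
                          I[i][j - 1] + gap_extend,
                          D[i][j - 1] + gap_open)

    return min(M[n][m], D[n][m], I[n][m])


# One contiguous insertion of "A. " (3 characters): 1.0 + 2 * 0.25
print(affine_gap_distance("John Smith", "John A. Smith"))  # 1.5
```

With these assumed costs, the inserted middle initial is penalized as a single gap (1.5) rather than as three independent insertions (3.0), which is the behaviour that makes affine gaps attractive for fields such as names and addresses where whole tokens are often abbreviated or omitted.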
