Nlao writeup of Cohen 2000 KDD

This is a review of Cohen_2000_hardening_soft_information_sources by user:Nlao.

This work is representative of a wide range of information integration problems: a global data model (in this case the "interpretation I") is to be derived from evidence (in this case pairwise distances). The global model is obtained by solving a combinatorial optimization problem, which is most likely NP-hard but for which approximate (greedy) yet effective solutions are available.
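
To make the greedy idea concrete, here is a minimal sketch. It is not the algorithm from the paper: it assumes that the evidence is a dictionary of pairwise distances between reference strings, and that two references should be merged whenever their distance is below a fixed threshold. The names greedy_harden and merge_gain are hypothetical, as are the example distances.

```python
def greedy_harden(references, distance, merge_gain=0.5):
    """Greedily build an interpretation: a map from each reference to a
    canonical representative. `merge_gain` is an assumed threshold, not a
    quantity taken from the paper."""
    parent = {r: r for r in references}

    def find(r):
        # Follow parent links to the representative, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Consider candidate merges from cheapest to most expensive.
    candidates = sorted((d, a, b) for (a, b), d in distance.items())
    for d, a, b in candidates:
        if d >= merge_gain:
            break  # every remaining pair is at least as expensive
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters
    return {r: find(r) for r in references}


# Hypothetical toy evidence: smaller distance means more likely co-referent.
refs = ["W. Cohen", "William Cohen", "H. Kautz"]
dist = {
    ("W. Cohen", "William Cohen"): 0.2,
    ("W. Cohen", "H. Kautz"): 0.9,
    ("William Cohen", "H. Kautz"): 0.95,
}
interp = greedy_harden(refs, dist)
```

In this toy run the two Cohen references end up with the same representative and the Kautz reference stays separate. A greedy pass like this never revisits a merge, which is why it is cheap but only approximate.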

For this task, the ground truth of the global model is unambiguously defined. This might not be true for other tasks such as schema extraction (Cafarella et al. 2007) and taxonomy induction (Yang & Callan 2009). In those cases, extrinsic tasks or human judgment are needed for evaluation.

Minor points

- The paper would be stronger with some form of evaluation.

References

Michael J. Cafarella, Christopher Ré, Dan Suciu, Oren Etzioni: Structured Querying of Web Text Data: A Technical Challenge. CIDR 2007: 225-234

Hui Yang, Jamie Callan: A Metric-based Framework for Automatic Taxonomy Induction. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL 2009), Singapore, August 2-7, 2009.