Nlao writeup of Cohen 2000 KDD
This is a review of Cohen_2000_hardening_soft_information_sources by user:Nlao.
This work exemplifies a theme common to a wide range of information integration problems: a global data model (here, the "interpretation" I) must be derived from local evidence (here, pairwise distances). The global model is obtained by solving a combinatorial optimization problem that is most likely NP-hard, but for which approximate (greedy) yet effective solutions are available.
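The greedy idea can be made concrete with a minimal sketch. It is not the algorithm from the paper, and the cost model is an assumption: every object in the interpretation costs a fixed `object_cost`, and every reference costs its distance to the representative (medoid) of its cluster. `greedy_harden` and `cluster_cost` are hypothetical names. Starting from one object per reference, the sketch repeatedly applies the merge with the largest cost reduction until no merge helps.

```python
from itertools import combinations


def cluster_cost(cluster, dist, object_cost):
    """Hypothetical cost of one object: a fixed per-object cost plus the summed
    distance of every member reference to the best medoid in the cluster."""
    best = min(sum(dist(r, m) for m in cluster) for r in cluster)
    return object_cost + best


def greedy_harden(refs, dist, object_cost):
    """Greedily merge clusters of references while a merge lowers total cost.

    Returns a list of frozensets; each frozenset is one inferred object.
    """
    clusters = [frozenset([r]) for r in refs]
    while True:
        best_gain, best_pair = 0.0, None
        # Scan every pair of current clusters for the largest cost reduction.
        for a, b in combinations(clusters, 2):
            gain = (cluster_cost(a, dist, object_cost)
                    + cluster_cost(b, dist, object_cost)
                    - cluster_cost(a | b, dist, object_cost))
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge reduces the cost: stop
            return clusters
        a, b = best_pair
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]


if __name__ == "__main__":
    # Toy references embedded on a line, with absolute difference as distance.
    print(greedy_harden([0.0, 0.1, 0.2, 5.0, 5.1], lambda x, y: abs(x - y), 1.0))
```

A larger `object_cost` penalizes positing extra objects, so the greedy step merges more aggressively; a smaller one keeps references apart. Each step rescans all cluster pairs, which is acceptable for a toy example but not for a large database.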
For this task, the ground truth for the global model is unambiguously defined. This may not hold for other tasks such as schema extraction (Cafarella et al. 2007) and taxonomy induction (Yang & Callan 2009); in those cases, extrinsic tasks or human judgment are needed for evaluation.
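Because the ground truth here is a definite grouping of references into objects, an inferred interpretation can be scored against it directly. One common choice, assumed here rather than taken from the paper, is pairwise precision and recall over co-referent pairs; `pairwise_scores` is a hypothetical helper.

```python
from itertools import combinations


def coreferent_pairs(clusters):
    """All unordered pairs of references that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def pairwise_scores(predicted, gold):
    """Pairwise precision, recall and F1 of a predicted clustering vs. gold."""
    pred, true = coreferent_pairs(predicted), coreferent_pairs(gold)
    tp = len(pred & true)  # pairs correctly placed together
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{"a", "b", "c"}, {"d"}]
    predicted = [{"a", "b"}, {"c", "d"}]
    print(pairwise_scores(predicted, gold))
```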
- Minor points
- It would be better to include some form of evaluation.
- References
Michael J. Cafarella, Christopher Re, Dan Suciu, and Oren Etzioni. "Structured Querying of Web Text Data: A Technical Challenge". In Proceedings of the Conference on Innovative Data Systems Research (CIDR 2007), pp. 225-234, 2007.
Hui Yang and Jamie Callan. "A Metric-based Framework for Automatic Taxonomy Induction". In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL 2009), Singapore, Aug 2-7, 2009.