Class Meeting for 10-707 10/21/2009
This is one of the class meetings on the schedule for the course Information Extraction 10-707 in Fall 2009.
IE and Reasoning
Required Readings
- WHIRL: a word-based information representation language, by William W Cohen. In Artif. Intell., vol. 118 (1-2), 2000.
- Hardening soft information sources, by William W Cohen, Henry Kautz, David McAllester. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, 2000.
- A comparison of string distance metrics for name-matching tasks, by W. W Cohen, P. Ravikumar, S. E Fienberg. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03), 2003.
This is a lot of material, I know, and it's awkward to critique the instructor's work. You don't need to read the journal paper in all its glorious detail, since I'll cover it in class, but I do recommend looking it over first. If you prefer, you can just write down one or two questions about each paper.
Optional Readings
- The role of named entities in Web People Search, by J. Artiles, E. Amigó, J. Gonzalo. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009.
- A latent Dirichlet model for unsupervised entity resolution, by I. Bhattacharya, L. Getoor. In SIAM International Conference on Data Mining, 2006.
- Text joins in an RDBMS for web data integration, by Luis Gravano, Panagiotis G. Ipeirotis, Nick Koudas, Divesh Srivastava. In Proceedings of the 12th international conference on World Wide Web, 2003.
- Robust reading: Identification and tracing of ambiguous names, by X. Li, P. Morie, D. Roth. In Proc. of NAACL, 2004.
- Robust Similarity Measures for Named Entities Matching, by E. Moreau, F. Yvon, O. Cappe. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), 2008.