Wikipedia Infobox Generator Using Cross Lingual Unstructured Text
Team Members (alphabetically)
- Anirudh Koul
- Daegun Won
- Tony Navas
Project Idea
Corpus
Wikipedia XML Dumps (Current Revision only)
* http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
* English corpus size - 31 GB uncompressed
* With 5 languages, approximately 200 GB total
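Because a dump of this size will not fit in memory, the pages would most likely have to be read as a stream. The following is a minimal sketch of one way to do that with Python's standard library, not the project's actual pipeline. The names `iter_pages`, `has_infobox`, and `SAMPLE` are hypothetical, the tiny inline sample stands in for a real dump file, and the infobox check is a deliberately crude substring test.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a real dump, invented for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Pittsburgh</title><revision><text>{{Infobox settlement|name=Pittsburgh}} A city.</text></revision></page>
  <page><title>Stub</title><revision><text>Just prose.</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that the export format puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs one page at a time.

    `source` is a file path or a binary file-like object. This assumes one
    revision per page, as in a current-revision-only dump.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the finished page's contents to keep memory use low


def has_infobox(wikitext):
    """Crude check (an assumption): the wikitext opens an infobox template."""
    return "{{infobox" in wikitext.lower()


if __name__ == "__main__":
    for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, has_infobox(page_text))
```

For a real dump, the `io.BytesIO(SAMPLE)` argument could be replaced by a file path, or by `bz2.open(path, "rb")` if the dump is still compressed.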
Reference Papers
(2007) Wu, Weld. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)