From Cohen Courses
Revision as of 12:06, 8 September 2011
Wikipedia Infobox Generator Using Cross Lingual Unstructured Text
Team Members
- Anirudh Koul
- Daegun Won
- Tony Navas
Project Idea
Corpus
* Wikipedia XML dumps (current revision only)
* http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
* English corpus size: 31 GB uncompressed
* With 5 languages: approximately 200 GB total
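The English dump alone is 31 GB uncompressed, so it cannot reasonably be loaded into memory in one piece. The sketch below streams a dump one page at a time with Python's standard-library `xml.etree.ElementTree.iterparse` and flags pages whose wikitext invokes an infobox template. It assumes the MediaWiki export layout (namespaced `<page>`, `<title>` and `<text>` elements). The function names (`iter_pages`, `has_infobox`, `count_infobox_pages`) are hypothetical. Matching on a `{{Infobox` prefix is also an assumption based on English Wikipedia template naming; other language editions may name their infobox templates differently, which is why the prefix is a parameter.

```python
import bz2
import io
import re
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def _local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: io.BufferedIOBase) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (title, wikitext) for each <page>, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <mediawiki> root
    title: Optional[str] = None
    text = ""
    for event, elem in context:
        if event != "end":
            continue  # element text is only guaranteed complete at "end"
        name = _local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # drop finished pages so memory stays bounded


def has_infobox(wikitext: str, prefix: str = "infobox") -> bool:
    """True if the wikitext invokes a template whose name starts with `prefix`.

    Assumption: English Wikipedia names infobox templates "Infobox ...";
    other language editions may use other names.
    """
    pattern = r"\{\{\s*" + re.escape(prefix)
    return re.search(pattern, wikitext, re.IGNORECASE) is not None


def count_infobox_pages(path: str, prefix: str = "infobox") -> int:
    """Stream a .xml.bz2 dump from disk and count pages that contain an infobox."""
    with bz2.open(path, "rb") as f:
        return sum(1 for _, text in iter_pages(f) if has_infobox(text, prefix))
```

For example, `count_infobox_pages("enwiki-latest-pages-articles.xml.bz2")` would scan a locally downloaded English dump (the filename is illustrative). Clearing the root element after each page is what keeps memory bounded. Without it, `iterparse` keeps every parsed page attached to the tree.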
Reference Papers
(2007) Wu, Weld. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07).