Wikipedia Infobox Generator Using Cross Lingual Unstructured Text

From Cohen Courses
Revision as of 11:06, 8 September 2011 by Akoul

Team Members

  • Anirudh Koul
  • Daegun Won
  • Tony Navas

Project Idea

Corpus

* Wikipedia XML Dumps (Current Revision only)
  * http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
* English corpus size: 31 GB uncompressed
* With 5 languages: approximately 200 GB total
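
At these sizes a dump cannot be loaded into memory in one piece, so the pages would most likely have to be read as a stream. The following is a minimal sketch of one way to do that, not the project's actual pipeline. It assumes the usual dump layout of `<page>` elements containing a `<title>` and a revision `<text>`. The helper names `iter_pages` and `has_infobox` are hypothetical, and the `{{Infobox` check is an English-centric assumption, since other language editions use different template names.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) per <page>, streaming from a path or file object."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # drop finished pages so memory use stays bounded


def has_infobox(wikitext):
    """Assumption: an infobox appears as a template starting with '{{Infobox'."""
    return "{{infobox" in wikitext.lower()


# Tiny illustrative sample; the namespace URI is only an example and is ignored.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page><title>A</title><revision><text>{{Infobox person|name=A}} Body.</text></revision></page>
  <page><title>B</title><revision><text>No template here.</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(page_title, has_infobox(wikitext))
```

For a real dump, a file path or an open binary file object (for example one returned by `bz2.open` for a compressed dump) can be passed to `iter_pages` in place of the in-memory sample.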

Reference Papers

(2007) Wu, Weld. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)