Wikipedia Infobox Generator Using Cross Lingual Unstructured Text

From Cohen Courses
Revision as of 12:04, 8 September 2011 by Akoul

Team Members (alphabetically)

  • Anirudh Koul
  • Daegun Won
  • Tony Navas

Project Idea

Corpus

Wikipedia XML dumps (current revision only)

  • http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
  • English corpus size: 31 GB uncompressed
  • With 5 languages, approximately 200 GB total
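
At these sizes a dump cannot be loaded into memory in one piece, so the pages would likely need to be read as a stream. The following is a minimal sketch of one way to do that with Python's standard library, not the project's actual pipeline. The helper names (iter_pages, has_infobox) are hypothetical, and the "{{Infobox" check is an assumption that only fits English-style template names.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time.

    `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # drop already-processed pages so memory stays bounded


def has_infobox(wikitext):
    """Crude filter (assumption): the page uses a template named 'Infobox ...'."""
    return "{{infobox" in wikitext.lower()


# Tiny in-memory stand-in for a real dump file.
sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">'
    b"<page><title>A</title><revision>"
    b"<text>{{Infobox person|name=A}}</text></revision></page>"
    b"<page><title>B</title><revision><text>plain</text></revision></page>"
    b"</mediawiki>"
)
pages = list(iter_pages(io.BytesIO(sample)))
with_infobox = [title for title, wikitext in pages if has_infobox(wikitext)]
```

For a compressed dump, a file object from bz2.open(path, "rb") could be passed to iter_pages in place of the in-memory sample. Other language editions name their infobox templates differently, so the filter would need a per-language pattern.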

Reference Papers

(2007) Wu, Weld. Autonomously Semantifying Wikipedia. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)