From Cohen Courses

Revision as of 12:07, 8 September 2011

Wikipedia Infobox Generator Using Cross Lingual Unstructured Text

Team Members

  • Anirudh Koul
  • Daegun Won
  • Tony Navas

Project Idea

Corpus

* Wikipedia XML dumps (current revisions only)
* Dump information for other languages: http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
* English corpus size: 31 GB uncompressed
* With 5 languages, approximately 200 GB in total
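
Because each dump is far too large to load into memory at once, the pages would most likely have to be read as a stream. The following is a minimal Python sketch of one way to do that, assuming the usual MediaWiki export layout of `<page>` elements containing a `<title>` and a revision `<text>`. The function name `iter_pages`, the helper `local_name`, the sample data, and the namespace URL in the sample are illustrative assumptions, not part of the project.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    Pages are processed one at a time, and the tree is cleared after each
    page so memory use stays roughly constant for multi-gigabyte dumps.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        root.clear()  # drop pages that have already been processed


# Tiny in-memory stand-in for a dump file (hypothetical sample data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Pittsburgh</title>
    <revision><text>{{Infobox settlement|name=Pittsburgh}}</text></revision>
  </page>
  <page>
    <title>Seoul</title>
    <revision><text>Seoul is the capital of South Korea.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

For a real dump, the same generator could be given an open file object (for example one returned by `bz2.open` for a compressed dump) instead of the in-memory sample.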

Reference Papers

(2007) Wu, Weld. Autonomously Semantifying Wikipedia. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)