Difference between revisions of "Wikipedia Infobox Generator Using Cross Lingual Unstructured Text"

From Cohen Courses
Line 10:

=== Corpus ===
+ * Wikipedia XML Dumps (Current Revision only)
+ * http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
+ * English corpus size - 31 GB Uncompressed
+ * With 5 languages, approximately 200 GB total

=== Reference Papers ===

(2007) Wu, Weld. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management CIKM 07

Revision as of 12:09, 8 September 2011

Wikipedia Infobox Generator Using Cross Lingual Unstructured Text

Team Members

  • Anirudh Koul
  • Daegun Won
  • Tony Navas

Project Idea

Corpus
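
Because the corpus is a set of Wikipedia XML dumps tens of gigabytes in size, it would most likely have to be read as a stream rather than loaded whole. The following is a minimal sketch of one way to do that, not the team's actual pipeline. It assumes the standard MediaWiki export layout (`page` elements containing `title` and `text`); the helper names `iter_pages` and `has_infobox`, the inline `SAMPLE` data, and the `{{Infobox` substring check are all illustrative assumptions, and the check only makes sense for English-language pages.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    `source` is a binary file-like object or a filename. Pages are yielded
    one at a time, so the full dump is never held in memory at once.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            # Drop the finished page's children so large article text
            # can be garbage-collected.
            elem.clear()


def has_infobox(wikitext):
    """Crude heuristic (an assumption): English infoboxes start with '{{Infobox'."""
    return "{{infobox" in (wikitext or "").lower()


# Tiny made-up sample in the shape of a dump, for demonstration only.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Pittsburgh</title>
    <revision><text>{{Infobox settlement|name=Pittsburgh}} Pittsburgh is a city.</text></revision>
  </page>
  <page>
    <title>Stub</title>
    <revision><text>No box here.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(page_title, has_infobox(page_text))
```

For a real dump, the `io.BytesIO(...)` argument would be replaced by an open file handle (for example one returned by `bz2.open(path, "rb")` for a compressed dump). Pages without an infobox are the natural candidates for generation, while pages that already have one could serve as training data.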

Reference Papers

(2007) Wu, Weld. Proceedings of the sixteenth ACM conference on Conference on information and knowledge management CIKM 07