From Cohen Courses

Revision as of 12:06, 8 September 2011

Wikipedia Infobox Generator Using Cross Lingual Unstructured Text

Team Members

  • Anirudh Koul
  • Daegun Won
  • Tony Navas

Project Idea

Corpus

Wikipedia XML Dumps (Current Revision only)

  • http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
  • English corpus size: 31 GB uncompressed
  • With 5 languages, approximately 200 GB in total
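
At these sizes a dump cannot be loaded into memory in one piece, so it would likely have to be read as a stream. The following is a minimal sketch of that idea, not the team's actual pipeline: it assumes the standard MediaWiki XML export layout (page, title, revision, text elements), and the helper names iter_pages and local_name, the tiny SAMPLE document, and the "{{Infobox" substring check are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for node in elem.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                text = node.text or ""
        yield title, text
        # Drop the finished page's children and text. The root element still
        # keeps the emptied <page> node, so memory grows slowly, not per byte.
        elem.clear()


# Hypothetical two-page export, standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Pittsburgh</title>
    <revision><text>{{Infobox settlement|name=Pittsburgh}} ...</text></revision>
  </page>
  <page>
    <title>Stub</title>
    <revision><text>No infobox here.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
# Crude illustration only: real infobox detection would need template parsing.
with_infobox = [title for title, text in pages if "{{Infobox" in text]
```

For a real dump, iter_pages could be given an open file object (for example one returned by bz2.open on the compressed file) in place of the in-memory sample.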

Reference Papers

Wu, Weld (2007). Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07).