Difference between revisions of "Wikipedia Infobox Generator Using Cross Lingual Unstructured Text"
From Cohen Courses
Line 2:
=== Team Members ===
− * Anirudh Koul
+ * [[Anirudh Koul]]
− * Daegun Won
+ * [[Daegun Won]]
− * Tony Navas
+ * [[Tony Navas]]
=== Project Idea ===
Revision as of 11:15, 8 September 2011
Wikipedia Infobox Generator By Combining Multi Lingual Unstructured Text
Team Members
- Anirudh Koul
- Daegun Won
- Tony Navas
Project Idea
Corpus
- Wikipedia XML Dumps (Current Revision only)
- http://en.wikipedia.org/wiki/Wikipedia_database#Other_languages
- English corpus size: 31 GB uncompressed
- With 5 languages, approximately 200 GB total
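
At these sizes the dumps cannot be loaded into memory at once, so they would likely need to be read as a stream. The following is a minimal sketch, not the team's implementation, of how one might iterate over pages in a MediaWiki XML export and flag articles that lack an infobox. The helper names (`local_name`, `iter_pages`, `has_infobox`), the sample XML, and the export namespace version are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, wikitext) pairs from a MediaWiki XML export, one page at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # drop finished pages so memory use does not grow with the dump


def has_infobox(wikitext):
    """Crude check (an assumption): infobox templates start with '{{Infobox'."""
    return "{{infobox" in wikitext.lower()


# Tiny hand-written sample standing in for a real dump (hypothetical content).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Pittsburgh</title>
    <revision><text>{{Infobox settlement | name = Pittsburgh }} Pittsburgh is a city.</text></revision>
  </page>
  <page>
    <title>Stub article</title>
    <revision><text>No structured data here.</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
missing = [title for title, text in pages if not has_infobox(text)]
print(missing)  # ['Stub article']
```

For a real compressed dump, the same generator could be fed a file object from `bz2.open(path, "rb")`, where `path` is whichever dump file was downloaded.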
Reference Papers
- Fei Wu, Daniel S. Weld, Autonomously semantifying Wikipedia, Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), 2007
- Eytan Adar, Michael Skinner, Daniel S. Weld, Information arbitrage across multi-lingual Wikipedia, Proceedings of the Second ACM International Conference on Web Search and Data Mining, February 09-12, 2009, Barcelona, Spain