Difference between revisions of "Wikipedia Infobox Generator Using Cross Lingual Unstructured Text"

Revision as of 12:38, 8 September 2011

Wikipedia Infobox Generator By Combining Multi Lingual Unstructured Text

Team Members

Project Idea

Key facts about Wikipedia articles are often summarized in short block sections called infoboxes. For example, an article on Brazil would list "Capital: Brasília", "Largest city: São Paulo", "Official language: Portuguese", etc. Because these infoboxes are created and maintained manually, many articles have missing or outdated information (for instance, a fact that was revised in the plain text but not in the infobox). We also note that articles written in the language closest to the subject's native speakers tend to have better information. For example, articles on Latin soccer players list facts such as debut match, number of goals, and record wins in the Spanish edition of Wikipedia, whereas the English edition lacks these statistics.

In our SPLODD project, we propose to achieve two objectives:

  • Extract facts from unstructured Wikipedia text to generate an infobox.
  • Combine the facts found in multiple language editions of an article to generate infoboxes with more extensive information.
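
To make the second objective concrete, here is a minimal sketch of combining per-language facts into one infobox. It is an illustration only, not the project's method: the function name `merge_infoboxes`, the dictionary representation, and the priority-order rule are all assumptions, and it presumes that attribute names have already been aligned across languages (which is itself a hard part of the problem).

```python
from typing import Dict, List

# Hypothetical representation: one infobox is a mapping attribute -> value.
Infobox = Dict[str, str]


def merge_infoboxes(by_language: Dict[str, Infobox], priority: List[str]) -> Infobox:
    """Take each attribute from the first language in `priority` that has a
    non-empty value for it. Assumes attribute names are already aligned."""
    merged: Infobox = {}
    for lang in priority:
        for attribute, value in by_language.get(lang, {}).items():
            cleaned = value.strip()
            # Keep the value from the highest-priority language; skip blanks.
            if cleaned and attribute not in merged:
                merged[attribute] = cleaned
    return merged


# Illustrative data only, based on the Brazil example in the text.
facts = {
    "en": {"capital": "Brasília", "largest_city": ""},
    "pt": {
        "capital": "Brasília",
        "largest_city": "São Paulo",
        "official_language": "Portuguese",
    },
}
merged = merge_infoboxes(facts, priority=["en", "pt"])
```

Here the English edition supplies the capital, while the missing largest city and official language are filled in from the Portuguese edition. A real system would also need to resolve conflicting values, which this sketch ignores by simply trusting the priority order.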

Corpus

Reference Papers

  • Wu, Weld. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), 2007
  • Eytan Adar, Michael Skinner, Daniel S. Weld. Information arbitrage across multi-lingual Wikipedia. Proceedings of the Second ACM International Conference on Web Search and Data Mining, February 9-12, 2009, Barcelona, Spain