D. Lange et al., CIKM 2010

From Cohen Courses
Revision as of 21:23, 29 September 2011 by Wpang

Citation

Dustin Lange, Christoph Böhm, Felix Naumann. 2010. Extracting structured information from Wikipedia articles to populate infoboxes. In CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management.

Online version

ACM Digital Library

Summary

This paper introduces iPopulator, a system that automatically populates the infoboxes of Wikipedia articles by extracting attribute values from the article's text (a task also known as the infobox completion problem).

iPopulator's extraction workflow contains four steps:

  • Structure Analysis: For each attribute of the infobox template, we analyze its values given in the training articles' infoboxes to determine a structure that represents the attribute's syntactical characteristics.
  • Training Data Creation: For each attribute, we use the articles whose infoboxes specify a value for that attribute as training data. Occurrences of these attribute values within the training article texts are labeled.
  • Value Extractor Creation: The labeled training data are used to generate extractors for as many attributes as possible. We employ Conditional Random Fields (CRFs) to generate attribute value extractors. These extractors are automatically evaluated, so that ineffective extractors can be discarded.
  • Attribute Value Extraction: The extractors can then be applied to all articles to find missing attribute values for existing infoboxes.
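The first two steps can be sketched in Python. This is a simplified illustration under stated assumptions, not iPopulator's implementation: the token classes (NUM, WORD, literal punctuation), the rule of taking the most frequent pattern as an attribute's structure, and the exact-match B/I/O labeling are all assumptions made for the example, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed token classes for a simplified structure analysis (not the paper's
# exact grammar): runs of digits, runs of letters, single punctuation marks.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]")


def value_pattern(value: str) -> tuple[str, ...]:
    """Map a value such as '1,234 km' to a pattern like ('NUM', ',', 'NUM', 'WORD')."""
    pattern = []
    for tok in TOKEN_RE.findall(value):
        if tok.isdigit():
            pattern.append("NUM")
        elif tok.isalpha():
            pattern.append("WORD")
        else:
            pattern.append(tok)  # punctuation is kept literally
    return tuple(pattern)


def dominant_structure(values: list[str]) -> tuple[str, ...]:
    """Structure analysis (simplified): the most frequent pattern among training values."""
    return Counter(value_pattern(v) for v in values).most_common(1)[0][0]


def label_tokens(text: str, value: str) -> list[tuple[str, str]]:
    """Training data creation (simplified): tag the value's tokens in the text with B/I, all others O."""
    tokens = TOKEN_RE.findall(text)
    target = TOKEN_RE.findall(value)
    labels = ["O"] * len(tokens)
    n = len(target)
    # Assumption: exact token match, first occurrence only.
    for i in range(len(tokens) - n + 1):
        if n and tokens[i:i + n] == target:
            labels[i] = "B"
            for j in range(i + 1, i + n):
                labels[j] = "I"
            break
    return list(zip(tokens, labels))
```

For example, dominant_structure(["1,234 km", "56 km", "7,890 km"]) returns ('NUM', ',', 'NUM', 'WORD'), and labeling "Berlin has 3,645,000 inhabitants." with the value "3,645,000" tags its five tokens B, I, I, I, I. Exact matching misses values written differently in the text (e.g. "3.6 million"), so a practical system would likely need more tolerant matching.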

The following figure illustrates these steps:

Figure: iPopulator extraction process (IPopulatorExtractionProcess.png)
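The last two steps (evaluating extractors, then applying them) can be sketched similarly. Training a CRF is out of scope here, so a hypothetical regex-based extractor stands in for a trained model. The precision/recall scoring on held-out articles, the 0.8 precision threshold, and all function names are assumptions for illustration, not values or APIs from the paper. The sketch shows two ideas from the workflow: discard ineffective extractors, and fill only attributes that are missing from an existing infobox.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# An extractor maps article text to an attribute value, or None if it finds nothing.
Extractor = Callable[[str], Optional[str]]


def make_regex_extractor(pattern: str) -> Extractor:
    """Hypothetical stand-in for a trained CRF: return the first capture group of `pattern`."""
    compiled = re.compile(pattern)

    def extract(text: str) -> Optional[str]:
        match = compiled.search(text)
        return match.group(1) if match else None

    return extract


@dataclass
class Score:
    precision: float
    recall: float


def evaluate(extractor: Extractor, held_out: list[tuple[str, str]]) -> Score:
    """Score an extractor on (article_text, gold_value) pairs not used for training."""
    extracted = correct = 0
    for text, gold in held_out:
        predicted = extractor(text)
        if predicted is not None:
            extracted += 1
            if predicted == gold:
                correct += 1
    precision = correct / extracted if extracted else 0.0
    recall = correct / len(held_out) if held_out else 0.0
    return Score(precision, recall)


def keep_effective(
    extractors: dict[str, Extractor],
    held_out: dict[str, list[tuple[str, str]]],
    min_precision: float = 0.8,  # assumed threshold, not taken from the paper
) -> dict[str, Extractor]:
    """Discard extractors whose held-out precision falls below `min_precision`."""
    return {
        attr: ex
        for attr, ex in extractors.items()
        if evaluate(ex, held_out.get(attr, [])).precision >= min_precision
    }


def populate(infobox: dict[str, str], text: str, extractors: dict[str, Extractor]) -> dict[str, str]:
    """Fill only missing or empty attributes; existing values are never overwritten."""
    filled = dict(infobox)
    for attr, ex in extractors.items():
        if not filled.get(attr):
            value = ex(text)
            if value is not None:
                filled[attr] = value
    return filled
```

Filtering on precision favors leaving an attribute empty over filling it with a wrong value; weighting recall more heavily would fill more infoboxes at the cost of more errors.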

Brief description of the method

Experimental Result

Related papers