Difference between revisions of "Weld et al SIGMOD 2009"

Revision as of 16:19, 5 October 2010

Citation

Weld, D. S., Hoffmann, R., and Wu, F. 2009. Using Wikipedia to bootstrap open information extraction. SIGMOD Rec. 37, 4 (Mar. 2009), 62-68.

Online version

ACM Digital Library

Summary

This is a recent paper that addressed the Open Information Extraction problem. The authors used a self-supervised learning prototype, KYLIN (Wu_and_Weld_CIKM_2007), trained on Wikipedia. The proposed solution has three components:

Self Learning
- The infoboxes of Wikipedia pages were used to determine the class of each page and the attributes of that class.
- Training data for the extractors was constructed from these Wikipedia pages using heuristics. First, a heuristic document classifier assigned documents to classes; then a sentence classifier (MaxEnt with bagging) determined whether a sentence contained a relation. Finally, a CRF model extracted the values (second entities) of the relations.
- Shrinkage was used to improve recall, supported by an automatic ontology generator that combined the infobox classes with WordNet. The resulting ontology gave a hierarchy of classes and allowed a subclass to be trained with the data of its superclass.
Bootstrapping
- More training data was harvested from the Web using TEXTRUNNER (Banko_et_al_IJCAI_2007).
- Web pages were weighted using the estimate of their relevance to the relation.
Correction
- An interface encouraged the community to make corrections, so that more training data could be collected.
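
The shrinkage step in the Self Learning component can be illustrated with a minimal sketch. It is not KYLIN's actual implementation: the toy ontology, the example sentences, the function name `shrinkage_training_set`, and the geometric `decay` weighting are all assumptions made for illustration. The only idea taken from the summary is that a subclass borrows training data from its superclasses in the class hierarchy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: child class -> parent class (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "performer": "person",
    "person": None,
}

# Hypothetical labeled sentences for one attribute, grouped by infobox class.
EXAMPLES: Dict[str, List[str]] = {
    "performer": ["She was born in Austin."],
    "person": ["He was born in Ohio.", "Born in Lyon, he moved to Paris."],
}


def shrinkage_training_set(
    cls: str,
    parent: Dict[str, Optional[str]],
    examples: Dict[str, List[str]],
    decay: float = 0.5,
) -> List[Tuple[str, float]]:
    """Collect (sentence, weight) pairs for `cls` and all of its ancestors.

    Examples from the class itself get weight 1.0; each step up the
    hierarchy multiplies the weight by `decay` (an assumed scheme).
    """
    weighted: List[Tuple[str, float]] = []
    weight = 1.0
    current: Optional[str] = cls
    while current is not None:
        for sentence in examples.get(current, []):
            weighted.append((sentence, weight))
        current = parent.get(current)  # move to the superclass, if any
        weight *= decay
    return weighted


training = shrinkage_training_set("performer", PARENT, EXAMPLES)
for sentence, weight in training:
    print(f"{weight:.2f}  {sentence}")
```

A sparse class such as "performer" ends up with three weighted examples instead of one, which is how borrowing from a superclass can raise recall; the weights could then be passed to any learner that accepts per-example weights.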

Related papers

More details of KYLIN can be found in Wu_and_Weld_CIKM_2007, which applies it to the task of completing infoboxes in Wikipedia pages. A follow-up paper (Wu_and_Weld_ACL_2010) refines the solution by adding dependency parsing features to train the model.

