Difference between revisions of "Wu et al KDD 2008"

Revision as of 00:52, 28 September 2011

Citation

Wu, F., Hoffmann, R. and Weld, D. 2008. Information Extraction from Wikipedia: Moving Down the Long Tail. In Proceedings of the 14th International Conference on Knowledge Discovery and Data Mining, pp. 731–739, ACM, New York.

Online version

University of Washington

Summary

This paper introduces three techniques for increasing the recall of information extraction on Wikipedia classes that have only a small number of articles (the long tail of sparse classes): shrinkage over a refined ontology, retraining using open information extractors, and supplementing results by extracting from the general Web. These techniques are used to improve the performance of a previously developed information extractor called Kylin.

When training the extractor for a sparse class, the first technique (shrinkage) aggregates training data from the class's parent and child classes. The subsumption hierarchy needed for this task comes from a previously developed system called KOG (Kylin Ontology Generator).
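
The aggregation step in shrinkage can be illustrated with a minimal sketch. It assumes a toy child-to-parent map standing in for the KOG hierarchy and a fixed down-weighting of examples borrowed from neighboring classes. The class names, the `neighbor_weight` parameter, and the function names are hypothetical and not taken from the paper, which may weight related classes differently.

```python
# Hypothetical toy ontology: child class -> parent class (illustrative names only).
PARENT = {
    "performer": "person",
    "actor": "performer",
    "comedian": "performer",
}


def children_of(cls, parent_map):
    """Return the direct children of `cls`, sorted for deterministic output."""
    return sorted(child for child, parent in parent_map.items() if parent == cls)


def shrinkage_training_set(target, examples, parent_map, neighbor_weight=0.5):
    """Build a weighted training set for the extractor of `target`.

    `examples` maps a class name to a list of (sentence, attribute) pairs.
    The target's own examples get weight 1.0; examples borrowed from its
    parent and direct children get `neighbor_weight` (an assumed constant).
    """
    weighted = [(sent, attr, 1.0) for sent, attr in examples.get(target, [])]

    neighbors = []
    if target in parent_map:
        neighbors.append(parent_map[target])
    neighbors.extend(children_of(target, parent_map))

    for cls in neighbors:
        for sent, attr in examples.get(cls, []):
            weighted.append((sent, attr, neighbor_weight))
    return weighted


if __name__ == "__main__":
    toy_examples = {
        "performer": [("She was born in 1970.", "birth_date")],
        "person": [("He was born on 3 May 1950.", "birth_date")],
        "actor": [("The actor was born in Leeds.", "birth_place")],
    }
    for row in shrinkage_training_set("performer", toy_examples, PARENT):
        print(row)
```

The sparse class keeps its own examples at full weight, so borrowed data supplements the class-specific evidence without overriding it.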

Experimental results

...

Related papers

This paper improves Kylin, a self-supervised information extractor first described in Wu and Weld CIKM 2007. The shrinkage technique uses a cleanly structured ontology, the output of KOG, an autonomous system for ontology refinement presented in Wu and Weld WWW 2008. The retraining technique uses TextRunner, an open information extractor described in Banko et al IJCAI 2007.

