Difference between revisions of "Wu and Weld WWW 2008"

Revision as of 00:22, 26 September 2011

Citation

Wu, F. and Weld, D. 2008. Automatically Refining the Wikipedia Infobox Ontology. In Proceedings of the 17th International Conference on World Wide Web (WWW 2008), pp. 635-644, ACM, New York.

Online version

University of Washington

Summary

This paper introduces an autonomous system that refines Wikipedia's infobox schema into a cleanly structured ontology. The authors present advanced query capability, improved information extractors, and semiautomatic generation of new infobox templates as advantages of a refined ontology. The ontology refinement problem is solved using both Support Vector Machines and a more powerful joint-inference approach expressed in Markov Logic Networks.

The autonomous system, called the Kylin Ontology Generator (KOG), comprises three modules:

  • a schema cleaner, which merges duplicate classes and attributes and prunes rarely-used ones;
  • a subsumption detector, which identifies is-a relations between infobox classes (e.g. "volleyball player" is-a "athlete");
  • and a schema mapper, which builds attribute mappings between related infobox classes.

The subsumption detection task is modeled as a binary classification problem, and several intuitive indicators serve as features for training the classifiers:

  • similarity measure: the similarity between two infobox classes, computed from TF/IDF scores over bags of words drawn from their attribute sets, the first sentences of their instances, and their category tags.
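
The similarity feature above can be illustrated with a minimal sketch. It assumes cosine similarity over TF/IDF-weighted bags of words, which is one common choice; the summary does not say which similarity function the paper uses. The class names, token lists, and the helper names `tfidf_vectors` and `cosine` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(bags):
    """Map each class name to a {term: TF/IDF weight} vector.

    `bags` maps a class name to its bag of words (a list of tokens), e.g.
    tokens from attribute names, first sentences, and category tags.
    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, as an assumption.
    """
    n = len(bags)
    df = Counter()
    for tokens in bags.values():
        df.update(set(tokens))  # document frequency: count each term once per class
    vectors = {}
    for name, tokens in bags.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical bags of words for three infobox classes (not data from the paper).
bags = {
    "volleyball player": ["name", "birth_date", "height", "position", "team", "player", "volleyball"],
    "athlete": ["name", "birth_date", "height", "sport", "team", "athlete"],
    "city": ["name", "population", "area", "mayor", "country", "city"],
}

vectors = tfidf_vectors(bags)
sim_player_athlete = cosine(vectors["volleyball player"], vectors["athlete"])
sim_player_city = cosine(vectors["volleyball player"], vectors["city"])
```

In this toy data, "volleyball player" shares several attribute tokens with "athlete" but only `name` with "city", so the first score comes out higher. In a classifier, such a score would be one feature among several for each candidate pair of classes.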

Experimental results

...

Related papers

This paper is based on Wu and Weld CIKM 2007.