Yandongl writeup of Wang 2009

From Cohen Courses
This is a review of Wang_2009_automatic_set_instance_extraction_using_the_web by user:Yandongl.

This paper again extends two previous papers (Wang 2007, Wang 2008) by adding a new capability: extracting a list of instances given only a semantic class name. This is a really enjoyable feature, since I believe it is typical for many users to supply only a class name rather than any instances. To implement this, several new components were introduced, such as the Noise Instance Generator (NIG), Set Expander, Reranker, and Bootstrapper. Among these, the most important pieces in my opinion are: the NIG, which uses hyponym patterns to obtain initial seed instances; the Fetcher, which sends a two-seed query for each pair of seeds; the Extractor, which extracts wrappers that must cover a minimum of two seeds; the use of hint words when querying the search engine; and the Reranker, which ranks the list twice for maximum accuracy.
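To make the first two steps concrete, here is a minimal sketch of how seed candidates might be pulled from text with a hyponym pattern and then turned into pairwise two-seed queries. This is only an illustration under my own assumptions, not the paper's implementation: the single "such as" pattern, the function names `extract_candidates` and `two_seed_queries`, and the way the hint word is appended to the query are all hypothetical.

```python
import re
from itertools import combinations


def extract_candidates(class_name, text):
    """Collect noisy instance candidates using one hyponym pattern.

    Assumed pattern (hypothetical): "<class name> such as A, B and C".
    """
    pattern = re.compile(
        re.escape(class_name) + r"\s+such as\s+([^.]+)", re.IGNORECASE
    )
    # Split the enumeration on commas (optionally followed by and/or)
    # or on a bare "and"/"or" between items.
    splitter = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")
    candidates = []
    for match in pattern.finditer(text):
        for item in splitter.split(match.group(1)):
            item = item.strip().lower()
            if item and item not in candidates:
                candidates.append(item)
    return candidates


def two_seed_queries(seeds, hint=""):
    """Build one quoted query per unordered pair of seeds.

    The optional hint word (assumed here to be the class name) is
    appended to each query string.
    """
    queries = []
    for a, b in combinations(seeds, 2):
        query = f'"{a}" "{b}"'
        if hint:
            query += f" {hint}"
        queries.append(query)
    return queries


if __name__ == "__main__":
    text = "We compared car makers such as Ford, Nissan and Toyota."
    seeds = extract_candidates("car makers", text)
    for q in two_seed_queries(seeds, hint="car makers"):
        print(q)
```

With three seeds this produces three queries, one per pair. Real text would of course yield much noisier candidates than this toy sentence, which is why the later expansion and reranking steps matter.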

Experiments show how each component (NIG, RR, BS) boosts the accuracy of the whole system, and that the new system outperforms comparison systems such as those of Kozareva and Pasca. In addition, ASIE remains language-independent: it works for three languages (English, Chinese, and Japanese).

ASIE uses Yahoo as its search engine rather than Google, which was used in the previous systems. This looks quite strange to me.