Selen writeup of Wang Cohen 2009

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Wang_2009_automatic_set_instance_extraction_using_the_web by user:Selen.


This paper introduces ASIE, the automatic set instance extractor, a follow-up to the authors' previous system, SEAL, that differs from it in its use of language-dependent phrases. To be more precise, SEAL takes a set of seeds and extracts a set of instances by bootstrapping from two initial seeds. ASIE, given only a class name, first extracts a noisy set of seeds using a set of hyponym patterns and then uses SEAL to expand and bootstrap that noisy set. The approach remains language independent, as SEAL is, except for the language-dependent phrases used in the patterns.

ASIE has three components: the Noisy Instance Generator, the Reranker, and the Bootstrapper.

  • Noisy Instance Generator: Uses patterns that originated in an earlier study and does not use any NLP tools to extract a noisy pool of candidate instances. It first retrieves 100 results for a query, then applies the patterns and ranks the candidate instances.
  • Reranker: By using a noise-resistant version of SEAL that issues a query for every possible pair of seeds, rather than concatenating all the seeds into one query, the reranker pushes irrelevant candidates lower in the ranked list.
  • Bootstrapper: Using the list obtained from the reranker, it calls iSEAL, from the authors' earlier work, to bootstrap.
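
To make the first two components more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the two English patterns, the function names `noisy_instances` and `pair_queries`, and the frequency-based ranking are all assumptions made for illustration, and the search-engine call is replaced by a list of snippet strings passed in by the caller.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical English hyponym patterns (assumed for illustration;
# the paper's actual pattern list is not reproduced here).
PATTERNS = [
    "{cls} such as (.+?)[.;]",
    "{cls} including (.+?)[.;]",
]

# Splits an enumeration like "A, B, and C" or "A and B" into items.
ITEM_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def noisy_instances(class_name, snippets):
    """Count candidate instances that the patterns match in the snippets.

    Ranking by raw frequency is an assumption of this sketch, not
    necessarily the ranking used in the paper.
    """
    counts = Counter()
    for template in PATTERNS:
        regex = re.compile(
            template.format(cls=re.escape(class_name)), re.IGNORECASE
        )
        for snippet in snippets:
            for match in regex.finditer(snippet):
                for item in ITEM_SEPARATOR.split(match.group(1)):
                    item = item.strip().lower()
                    if item:
                        counts[item] += 1
    return counts


def pair_queries(candidates):
    """Return every unordered pair of candidates, one query per pair,
    instead of one query that concatenates all candidates."""
    return list(combinations(candidates, 2))
```

With the snippets `"Car makers such as Toyota, Honda, and Ford."` and `"car makers including Toyota and BMW; others"`, `noisy_instances("car makers", ...)` counts "toyota" twice and the other names once, so a noisy candidate that appears only once ranks below a frequent one. Querying pairs means a single wrong candidate spoils only the queries it takes part in, which is presumably why the pairwise scheme is more noise resistant than a single concatenated query.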

They try their system on three languages, using Yahoo as the search engine. They compare their work to that of Kozareva and state that they do comparably well while using only a class name, without any seed, and that they do it more efficiently. Also, in this work they can apply their technique to more general classes, such as scientists, presidents, etc.; the inability to handle such classes was a major disadvantage of the authors' previous system, SEAL.


My criticisms are: it is unclear how easily the approach can be generalized to all languages, and retrieving no results for some class names seems to be a problem.