Siddharth writeup of Wang & Cohen

From Cohen Courses

This is a review of Wang_2009_automatic_set_instance_extraction_using_the_web by user:sgopal1.

  • This paper proposes a way to extract the entities that belong to a given semantic class. The method is named ASIA (Automatic Set Instance Acquirer). It consists of three components: a noisy instance generator, a reranker, and a bootstrapper.
  • The noisy instance generator applies a set of hyponym patterns to search engine results to extract candidate instances, with heuristics fixing details such as the boundary length. A set expander then expands the entities extracted in the previous step by generating entity wrappers.
  • The paper introduces a few changes to set expansion: a) retrieving web pages that are relevant to every pair of seed instances, b) requiring wrapper contexts to bracket instances of at least two seeds, and c) using hint words to improve extraction. The bootstrapping phase uses iterative SEAL to expand the lists over multiple iterations. The results show some improvement over the other methods.
  • Question: Could the improvement in performance be due to a single page retrieved from the search engine that happened to contain many more entities than the rest? It would be nice to see a curve showing how many entities each page contains (e.g. 100 pages contain 2 entities each, 4 pages contain 30 entities each, etc.).
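
The diagnostic asked for in the question could be sketched roughly as follows. This is only an illustration, not the paper's implementation: the single "such as" pattern stands in for the paper's hyponym patterns, and the names `extract_candidates` and `instances_per_page_histogram` are hypothetical. It extracts candidates from each page and then counts how many pages yield 0, 1, 2, ... distinct candidates, which is the data the requested curve would plot.

```python
import re
from collections import Counter


def extract_candidates(class_name, text):
    """Extract candidates with one assumed hyponym pattern:
    "<class name> such as A, B and C". The paper's real pattern
    set and boundary heuristics are not reproduced here."""
    pattern = re.compile(
        re.escape(class_name) + r"\s+such as\s+([^.]+)", re.IGNORECASE
    )
    candidates = []
    for match in pattern.finditer(text):
        # Split the captured list on commas and the words "and" / "or".
        for item in re.split(r",|\band\b|\bor\b", match.group(1)):
            item = item.strip().lower()
            if item:
                candidates.append(item)
    return candidates


def instances_per_page_histogram(pages, class_name):
    """Map 'number of distinct candidates on a page' to 'number of pages'."""
    counts = [len(set(extract_candidates(class_name, page))) for page in pages]
    return Counter(counts)


if __name__ == "__main__":
    # Toy pages, invented for illustration only.
    pages = [
        "Car makers such as Ford, Nissan and Toyota.",
        "No list here.",
        "Popular car makers such as Honda or Ford.",
    ]
    histogram = instances_per_page_histogram(pages, "car makers")
    for n_entities in sorted(histogram):
        print(n_entities, "entities:", histogram[n_entities], "page(s)")
```

A histogram dominated by a handful of pages with very high counts would support the concern that a few list-heavy pages drive the result.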