Rbalasub writeup of Wang and Cohen - 2007

From Cohen Courses

A review of Wang_2007_language_independent_set_expansion_of_named_entities_using_the_web by user:rbalasub

SEAL is an algorithm that builds a set of entities from a small number of seed examples. It consists of three steps:

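  1. Fetcher - pages likely to contain entities are retrieved by querying Google with the seeds concatenated into a single query
  2. Wrapper induction - for each page, wrappers are built from the longest common prefix and suffix (left and right context) surrounding the seeds, and strings bracketed by the same context are taken as candidate entities
  3. Ranking - candidates are scored with a random walk on a graph, so that likely entities rank above noise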
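
Steps 2 and 3 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`induce_wrappers`, `extract`, `rank`), the restart probability of 0.15, and the 50 iterations are assumptions made for illustration, and the graph is reduced to wrapper and mention nodes as a simplification. The fetcher is omitted, so the page is given as a literal string.

```python
import re
from collections import defaultdict
from itertools import product
from os.path import commonprefix  # character-wise common prefix of strings


def induce_wrappers(text, seeds):
    """Return (left, right) contexts shared by one occurrence of every seed."""
    spans = [[(m.start(), m.end()) for m in re.finditer(re.escape(s), text)]
             for s in seeds]
    wrappers = set()
    for combo in product(*spans):  # pick one occurrence per seed
        # Longest common suffix of the left contexts, via reversed strings.
        left = commonprefix([text[:a][::-1] for a, _ in combo])[::-1]
        # Longest common prefix of the right contexts.
        right = commonprefix([text[b:] for _, b in combo])
        if left and right:
            wrappers.add((left, right))
    return wrappers


def extract(text, wrapper):
    """Return every string in `text` bracketed by the wrapper's contexts."""
    left, right = wrapper
    # Lookarounds keep the contexts unconsumed, so adjacent items still match.
    pattern = "(?<=" + re.escape(left) + ")(.+?)(?=" + re.escape(right) + ")"
    return {m.group(1) for m in re.finditer(pattern, text)}


def rank(extractions, seeds, restart=0.15, steps=50):
    """Score mentions by a random walk with restart that starts at the seeds.

    `extractions` maps a wrapper id to the set of mentions it extracted.
    """
    graph = defaultdict(set)
    for wrapper, mentions in extractions.items():
        for mention in mentions:
            graph[("wrapper", wrapper)].add(("mention", mention))
            graph[("mention", mention)].add(("wrapper", wrapper))
    start = {("mention", s): 1.0 / len(seeds) for s in seeds}
    prob = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in prob.items():
            for neighbour in graph[node]:
                nxt[neighbour] += (1 - restart) * p / len(graph[node])
        for node, p in start.items():  # jump back to the seeds
            nxt[node] += restart * p
        prob = nxt
    return {node[1]: p for node, p in prob.items() if node[0] == "mention"}


# Hypothetical page and seeds, chosen only to exercise the functions.
page = "<li>Ford</li><li>Toyota</li><li>Honda</li><li>Nissan</li>"
seeds = ["Ford", "Nissan"]
wrappers = induce_wrappers(page, seeds)
extractions = {w: extract(page, w) for w in wrappers}
scores = rank(extractions, seeds)
print(wrappers)                                   # {('<li>', '</li>')}
print(sorted(extractions[("<li>", "</li>")]))     # ['Ford', 'Honda', 'Nissan', 'Toyota']
```

In this toy page the two seeds yield a single wrapper, which also picks up the unseeded items Toyota and Honda. With several pages and wrappers, a mention extracted by many wrappers collects more probability mass in `rank` than one extracted by a single wrapper, which is how noise is pushed down the list in this sketch.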

The algorithm is evaluated against Google Sets, Bayesian Sets, and KnowItAll, and significantly outperforms all three.