Rbalasub writeup of Wang and Cohen - 2007
From Cohen Courses
A review of Wang_2007_language_independent_set_expansion_of_named_entities_using_the_web by user:rbalasub
SEAL (Set Expander for Any Language) is an algorithm that builds sets of entities from a small number of seed examples. It consists of three steps:
- Fetcher - pages that might contain entities are retrieved by sending the concatenated seeds to Google as a single query
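- Wrapper induction - for each page, wrappers are learned from the longest left context (prefix) and right context (suffix) surrounding occurrences of the seeds; other strings found between the same contexts are extracted as candidate entities. Because the contexts are plain character strings, no language-specific processing is needed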
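- Ranking - noisy extractions are filtered out by building a graph that links seeds, web pages, wrappers and candidate entities, and ranking the candidates with a random walk over that graph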
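The wrapper-induction step can be illustrated with a simplified sketch. It assumes the page is available as plain text, treats a wrapper as a (left, right) pair of character strings, and finds wrappers by brute force over one occurrence per seed. This is not the paper's exact procedure, and `induce_wrappers`, `extract` and the `max_context` window are hypothetical names and parameters chosen for illustration.

```python
import itertools
import os
import re


def find_occurrences(text, seed):
    """Start offsets of every non-overlapping occurrence of `seed` in `text`."""
    return [m.start() for m in re.finditer(re.escape(seed), text)]


def common_suffix(strings):
    """Longest string that ends every string in `strings`."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_wrappers(text, seeds, max_context=50):
    """Learn (left, right) wrappers that bracket one occurrence of every seed.

    Simplified brute force (not SEAL's exact algorithm): for every combination
    of one occurrence per seed, keep the longest left context shared by all
    chosen occurrences and the longest shared right context. `max_context` is a
    hypothetical cap on how many characters of context are examined.
    """
    occurrences = [find_occurrences(text, seed) for seed in seeds]
    if any(not occ for occ in occurrences):
        return set()  # some seed never appears, so no wrapper can bracket all seeds
    wrappers = set()
    for starts in itertools.product(*occurrences):
        lefts = [text[max(0, s - max_context):s] for s in starts]
        rights = [text[s + len(seed):s + len(seed) + max_context]
                  for s, seed in zip(starts, seeds)]
        left = common_suffix(lefts)
        right = os.path.commonprefix(rights)
        if left and right:  # both contexts must be non-empty
            wrappers.add((left, right))
    return wrappers


def extract(text, wrappers):
    """Return every single-line string found between a wrapper's two contexts."""
    candidates = set()
    for left, right in wrappers:
        # The right context is a lookahead so it is not consumed: on list-like
        # pages it usually overlaps the left context of the next item.
        pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + r")"
        candidates.update(m.group(1) for m in re.finditer(pattern, text))
    return candidates


if __name__ == "__main__":
    page = "<ul>\n<li>Ford</li>\n<li>Toyota</li>\n<li>Honda</li>\n<li>Nissan</li>\n</ul>"
    wrappers = induce_wrappers(page, ["Ford", "Nissan"])
    print(sorted(extract(page, wrappers)))  # ['Ford', 'Honda', 'Nissan', 'Toyota']
```

Here the two seeds share the left context `">\n<li>"` and the right context `"</li>\n<"`, so the single learned wrapper also picks up "Toyota" and "Honda". Matching the right context as a lookahead matters: consuming it would swallow the next item's left context and skip every other list entry.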
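The ranking step can be sketched as a random walk with restart over an unweighted, undirected graph that links seeds to pages, pages to the wrappers induced on them, and wrappers to the strings they extract. The edge layout, the restart probability of 0.15, the fixed iteration count and the names `build_graph`, `random_walk_with_restart` and `rank_candidates` are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict


def build_graph(seeds, pages):
    """Undirected graph from `pages`, a dict {page: {wrapper: set_of_strings}}.

    Assumed layout: every seed links to every fetched page, each page links to
    the wrappers induced on it, and each wrapper links to the strings it extracted.
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for page, wrappers in pages.items():
        for seed in seeds:
            link(("seed", seed), ("page", page))
        for wrapper, extracted in wrappers.items():
            wrapper_node = ("wrapper", page, wrapper)  # wrappers are page-specific
            link(("page", page), wrapper_node)
            for entity in extracted:
                link(wrapper_node, ("entity", entity))
    return graph


def random_walk_with_restart(graph, start_nodes, restart=0.15, iterations=50):
    """Approximate stationary scores of a walk that jumps back to `start_nodes`."""
    nodes = list(graph)
    restart_mass = {n: 0.0 for n in nodes}
    for n in start_nodes:
        restart_mass[n] = 1.0 / len(start_nodes)
    score = dict(restart_mass)
    for _ in range(iterations):  # power iteration
        nxt = {n: restart * restart_mass[n] for n in nodes}
        for n in nodes:
            # Nodes are only created by link(), so every node has a neighbour.
            share = (1.0 - restart) * score[n] / len(graph[n])
            for m in graph[n]:
                nxt[m] += share
        score = nxt
    return score


def rank_candidates(seeds, pages, **walk_options):
    """Non-seed extracted strings, highest random-walk score first."""
    graph = build_graph(seeds, pages)
    scores = random_walk_with_restart(graph, [("seed", s) for s in seeds], **walk_options)
    candidates = [n for n in graph if n[0] == "entity" and n[1] not in seeds]
    candidates.sort(key=lambda n: scores[n], reverse=True)
    return [n[1] for n in candidates]


if __name__ == "__main__":
    seeds = ["Ford", "Toyota"]
    pages = {
        "page1": {"w1": {"Ford", "Toyota", "Honda"}},
        "page2": {"w2": {"Ford", "Toyota", "Honda"}},
        "page3": {"w3": {"Ford", "Toyota", "Contact Us"}},
    }
    print(rank_candidates(seeds, pages))  # ['Honda', 'Contact Us']
```

In this toy input "Honda" is extracted by two wrappers and "Contact Us" by only one, so more of the walk's probability mass reaches "Honda" and the page-navigation noise ends up lower in the ranking. Seeds that are themselves extracted are left out of the output, since the goal is to expand the seed set.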
The algorithm is evaluated against Google Sets, Bayesian Sets and KnowItAll, and it significantly outperforms them.