Bbd writeup of Iterative Set Expansion using Web

From Cohen Courses
Revision as of 14:29, 16 November 2009 by Bbd

This is a review of Wang_2008_iterative_set_expansion_of_named_entities_using_the_web by user:Bbd

The original SEAL approach extracts entities only from webpages that contain all of the seeds. This does not work well when many supervised seeds are specified, since few pages are likely to contain every one of them.

iSEAL deals with this problem by iterating over subsets of the supervised seed set and running SEAL on each subset. Each iteration thus picks a subset of seeds and then extracts entities using SEAL. The number of iterations is chosen heuristically.
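The subset iteration described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: `expand` is a hypothetical stand-in for one SEAL run, `toy_expand` and its three-page corpus are made up, the subset is drawn at random, and counting how often each candidate appears is just a placeholder for the ranking methods.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Optional

# Hypothetical interface: one "SEAL run" maps a few seeds to candidate entities.
Expander = Callable[[List[str]], Iterable[str]]

# Made-up corpus: each inner list is the set of entities found on one page.
TOY_PAGES = [
    ["ford", "toyota", "nissan", "honda"],
    ["ford", "bmw", "audi"],
    ["toyota", "nissan", "mazda"],
]


def toy_expand(subset: List[str]) -> List[str]:
    """Return entities from pages that contain every seed in the subset."""
    found = set()
    for page in TOY_PAGES:
        if all(seed in page for seed in subset):
            found.update(page)
    return sorted(found)


def iterative_expand(
    seeds: List[str],
    expand: Expander,
    subset_size: int = 2,
    iterations: int = 5,
    rng: Optional[random.Random] = None,
) -> Counter:
    """Run the expander on a small seed subset per iteration and tally candidates."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    counts: Counter = Counter()
    for _ in range(iterations):
        # Assumption: subsets are sampled uniformly at random.
        subset = rng.sample(seeds, min(subset_size, len(seeds)))
        for entity in expand(subset):
            if entity not in seeds:  # only count newly found entities
                counts[entity] += 1
    return counts


if __name__ == "__main__":
    print(iterative_expand(["ford", "toyota", "nissan"], toy_expand).most_common())
```

No single toy page needs to contain all three seeds here, because each run only requires a page to contain the two seeds in the current subset.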

As more and more entities are extracted, they can act as seeds in later iterations. This can be done in two ways: supervised or unsupervised.

In the iterative supervised expansion approach, a person selects which of the newly extracted entities to use as a seed. Each subsequent iteration then picks, say, m seeds from the original seed set plus one newly extracted entity chosen by that person.

In the bootstrapping approach, the system itself picks highly ranked newly extracted entities as seeds for further iterations. This is very sensitive to noise, since errors can easily propagate from one iteration to the next.
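The two ways of reusing extracted entities differ only in who chooses the next seed, which the rough sketch below tries to show. Everything in it is an assumption made for illustration: `expand` is a hypothetical stand-in for a SEAL run, `choose_seed` is a hypothetical callback (a person's choice in the supervised case, `bootstrap_choice` in the unsupervised case), and the occurrence count is a placeholder for the paper's ranking methods.

```python
import random
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

Expander = Callable[[List[str]], Iterable[str]]          # hypothetical SEAL stand-in
Chooser = Callable[[List[str]], Optional[str]]           # picks one candidate or None


def bootstrap_choice(ranked: List[str]) -> Optional[str]:
    """Unsupervised choice: take the top-ranked candidate, if there is one."""
    return ranked[0] if ranked else None


def expand_with_feedback(
    original_seeds: List[str],
    expand: Expander,
    choose_seed: Chooser,
    m: int = 2,
    iterations: int = 3,
    rng: Optional[random.Random] = None,
) -> Tuple[Counter, List[str]]:
    """Each iteration queries with m original seeds plus one promoted entity."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    scores: Counter = Counter()
    promoted: List[str] = []        # extracted entities that were reused as seeds
    extra: Optional[str] = None     # the promoted entity for the next query
    for _ in range(iterations):
        query = rng.sample(original_seeds, min(m, len(original_seeds)))
        if extra is not None:
            query.append(extra)
        for entity in expand(query):
            if entity not in original_seeds:
                scores[entity] += 1
        # Rank by count (placeholder ranking) and skip already promoted entities.
        ranked = [e for e, _ in scores.most_common() if e not in promoted]
        extra = choose_seed(ranked)
        if extra is not None:
            promoted.append(extra)
    return scores, promoted
```

Passing a function that asks a person to pick from `ranked` gives the supervised variant, while passing `bootstrap_choice` gives the unsupervised one. In the latter, a wrong entity that happens to rank first is promoted without any check, which is how noise can carry over into later iterations.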

The paper also discusses various ranking methods that can be used to find the most reliable set of entities.

I liked the bootstrapping approach to set expansion, since it can be used in any domain without supervision, though we need to be careful about noise in the seeds and data.