Selen writeup of Wang Cohen 2008
This is a review of Wang_2008_iterative_set_expansion_of_named_entities_using_the_web by user:Selen.
This paper introduces iSEAL, an improved version of SEAL, the authors' set expansion system. Although SEAL works fine when given three seeds and is independent of both markup language and human language, it performs poorly when given more seeds. The problem is that SEAL's fetcher module retrieves only web pages that contain all of the seeds, so every additional seed shrinks the set of pages it can use. iSEAL was developed to overcome this difficulty. The basic idea is simple: divide the seeds into small subsets and make multiple calls to SEAL. The process has two different approaches, supervised extraction and bootstrapping. Supervised extraction calls SEAL on a few seeds at a time and combines the statistics. It has two seed-feeding strategies: fixed seed size (seed size = 2) and ISS (increasing the number of seeds one by one). After each iteration the results are expanded, and the ranker runs on the accumulated statistics.
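The supervised loop described above can be sketched roughly as follows. This is only an illustration, not the authors' code: `toy_expand` is a hypothetical stand-in for a SEAL call that works on an in-memory list of "pages", the ranking is a plain frequency count rather than any of the paper's rankers, and the starting size of 2 for the increasing strategy is an assumption of the sketch.

```python
import random
from collections import Counter


def toy_expand(seeds, corpus):
    """Hypothetical stand-in for one SEAL call (not the real system).

    Like SEAL's fetcher, it only uses "pages" that contain every seed,
    and counts the other items found on those pages.
    """
    found = Counter()
    for page in corpus:
        if all(seed in page for seed in seeds):
            found.update(item for item in page if item not in seeds)
    return found


def seed_sizes(strategy, iterations, fixed=2, start=2):
    """Yield how many seeds to use at each iteration.

    "fss": always `fixed` seeds. "iss": `start` seeds, then one more per
    iteration (the starting value is an assumption of this sketch).
    """
    for i in range(iterations):
        yield fixed if strategy == "fss" else start + i


def supervised_expand(seeds, corpus, strategy="fss", iterations=3, rng=None):
    """Call toy_expand on small seed subsets and rank the accumulated counts."""
    rng = rng or random.Random(0)
    stats = Counter()
    for size in seed_sizes(strategy, iterations):
        # Never ask for more seeds than the trusted list contains.
        chosen = rng.sample(seeds, min(size, len(seeds)))
        stats.update(toy_expand(chosen, corpus))
    # Frequency ranking is a simplification; known seeds are left out.
    return [item for item, _ in stats.most_common() if item not in seeds]


# Toy data: each inner list plays the role of one web page.
corpus = [
    ["ford", "toyota", "honda", "nissan", "bmw"],
    ["ford", "toyota", "honda", "audi", "bmw"],
    ["ford", "apple", "banana"],
]
seeds = ["ford", "toyota", "honda"]

print(supervised_expand(seeds, corpus, strategy="fss"))
print(supervised_expand(seeds, corpus, strategy="iss"))
```

Because every call uses only a small subset of the seeds, pages that mention just some of the seeds can still contribute, which is the point of splitting the seed list.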
Bootstrapping selects two seeds from the seed list, adds the top-ranked results back to the seed list, and then expands the newly generated list. It faces the possible problem of accumulating a noisy set of seeds, which the authors also report: they say that bootstrapping is sensitive to the ranking method and the number of seeds.
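A minimal sketch of the bootstrapping loop, under the same caveats: `toy_expand` is a hypothetical stand-in for SEAL, the ranker is a simple frequency count, and the `top_n` parameter (how many top items are promoted per iteration) is an assumption made for illustration.

```python
import random
from collections import Counter


def toy_expand(seeds, corpus):
    """Hypothetical stand-in for one SEAL call (not the real system)."""
    found = Counter()
    for page in corpus:
        if all(seed in page for seed in seeds):
            found.update(item for item in page if item not in seeds)
    return found


def bootstrap(initial_seeds, corpus, iterations=3, top_n=1, rng=None):
    """Grow the seed list by promoting top-ranked items after each expansion."""
    rng = rng or random.Random(0)
    seed_list = list(initial_seeds)
    stats = Counter()
    for _ in range(iterations):
        # Pick two seeds from the current (possibly already grown) list.
        chosen = rng.sample(seed_list, min(2, len(seed_list)))
        stats.update(toy_expand(chosen, corpus))
        # Rank by accumulated count and promote the best unseen items.
        ranked = [item for item, _ in stats.most_common() if item not in seed_list]
        seed_list.extend(ranked[:top_n])
    return seed_list


# Toy data: each inner list plays the role of one web page.
pages = [
    ["ford", "toyota", "honda", "bmw"],
    ["ford", "toyota", "honda", "bmw", "audi"],
]
print(bootstrap(["ford", "toyota"], pages))
```

Nothing in this loop checks that a promoted item is correct, so a wrong item that ranks highly once becomes a seed for later iterations. That is the noise problem mentioned above, and it is why the choice of ranker matters so much here.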
In this paper they just take the longest wrapper. They show that by bootstrapping the results with ISS and using random walk as the ranker, they improve on SEAL by one percent. My criticism of this paper: they still do not say how the system would do if the seeds were not a small set, and instead of splitting the seeds, calling SEAL, and aggregating the results, they could have come up with another approach. A 1% improvement does not sound very impressive.