Yandongl writeup of Wang 2008

From Cohen Courses

This is a review of Wang_2008_iterative_set_expansion_of_named_entities_using_the_web by user:Yandongl.

This paper extends Wang 2007, which introduced the SEAL system. Its motivation is that set-expansion accuracy drops when the user provides too many (>5) seed instances. To overcome this limitation, the paper proposes iSEAL, which is able to leverage many seeds. Although it remains unclear to me why users would provide so many seed instances, the feature has a nice use in bootstrapping: the system consumes its own outputs as new seeds to improve accuracy.

The iterative expansion process works as follows: Fixed Seed Size (FSS) chooses the top two seeds in each iteration, while Increasing Seed Size (ISS) expands the seed set by adding one more seed each time. These two processes make bootstrapping possible in the system. Ranking is done as in Wang 2007, with a random walk with restart on a graph. Other ranking methods are compared as well: PageRank, Bayesian Sets, and Wrapper Length. Experiments indicate that FSS converges faster and can reach a higher accuracy. Random Walk is quite robust to noisy data and beats all the other ranking methods.
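To make the ranking step more concrete, here is a minimal sketch of a random walk with restart on a small graph. It is only an illustration and not the paper's implementation: the toy graph, the node names, the restart probability of 0.15, and the function name `random_walk_with_restart` are all my own assumptions, and the graph used by SEAL has more node types than the two shown here.

```python
def random_walk_with_restart(graph, seeds, restart=0.15, iterations=100):
    """Score every node by a random walk that keeps jumping back to the seeds.

    graph: dict mapping each node to a list of its neighbours
           (undirected edges are listed in both directions).
    seeds: nodes the walker restarts from; each must be a key of graph.
    restart: probability of jumping back to a seed at each step (assumed value).
    """
    nodes = list(graph)
    # The restart distribution puts equal mass on every seed.
    restart_dist = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_dist)

    for _ in range(iterations):
        # Every node first receives its share of the restart mass.
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # A node with no edges sends its mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * scores[n] / len(seeds)
                continue
            # Otherwise the remaining mass is split evenly among neighbours.
            share = (1 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        scores = nxt
    return scores


# Hypothetical toy graph: two "wrapper" nodes linked to the entities they extract.
graph = {
    "wrapper1": ["ford", "toyota", "honda"],
    "wrapper2": ["ford", "banana"],
    "ford": ["wrapper1", "wrapper2"],
    "toyota": ["wrapper1"],
    "honda": ["wrapper1"],
    "banana": ["wrapper2"],
}
seeds = ["ford", "toyota"]
scores = random_walk_with_restart(graph, seeds)

# Rank the non-seed entities by their score, best first.
candidates = [n for n in graph if n not in seeds and not n.startswith("wrapper")]
ranking = sorted(candidates, key=lambda n: scores[n], reverse=True)
print(ranking)
```

In this toy setup, an entity reached through a wrapper that covers both seeds ends up ranked above one reached through a wrapper that covers only one seed, which is the intuition behind using the walk to rank candidates.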

It is still not clear to me why users would provide more than 5 seeds. An example would be very helpful.