KeisukeKamataki writeup of Wang 2008

From Cohen Courses

This is a review of Wang_2008_iterative_set_expansion_of_named_entities_using_the_web by user:KeisukeKamataki.

Summary: The main difference between this paper and the previous one is that this paper extends SEAL so that it still achieves good MAP performance when many seeds are given. The authors tried iterative versions of SEAL, combined them with different ranking methods, and measured the effectiveness of each combination. They also compared how the methods performed in a supervised setting versus a bootstrapping setting.
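Since the comparison is reported in MAP, here is a minimal sketch of the textbook definition: average precision (AP) of one ranked list is the mean of precision@k over the ranks k that hold a correct entity, and MAP averages AP over several runs. The function names are hypothetical, and the paper's exact AP variant (for example, how it treats synonymous entity names) may differ.

```python
def average_precision(ranked, relevant):
    """AP of one ranked list: mean of precision@k at each rank k holding a relevant item."""
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    # Normalize by the number of relevant items, so missed items lower the score.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, e.g. one pair per entity class."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Correct entities at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6.
print(average_precision(["Nikon", "Tokyo", "Sony"], {"Nikon", "Sony"}))
```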

For the iterative process, they tried "Fixed Seed Size (FSS)" and "Increasing Seed Size (ISS)". FSS uses only two randomly selected seeds in each iteration to expand the set. ISS uses a fixed number of seeds (they chose 4), combining the supervised seeds with some randomly chosen seeds, and it also reuses seeds from the previous iteration.
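The two seed-selection policies can be sketched as follows. This is only one reading of the description above, not the paper's exact procedure: FSS draws two fresh random seeds from the pool every iteration, while this hypothetical ISS starts with two seeds, carries over the previous iteration's seeds and adds one new random seed per iteration, and never exceeds four seeds. The starting size, the one-seed-per-iteration growth, and dropping the oldest seed once the cap is reached are all assumptions, as are the function names.

```python
import random


def fss_seeds(pool, rng, size=2):
    """Fixed Seed Size: draw `size` seeds uniformly at random from the pool each iteration."""
    return rng.sample(pool, size)


def iss_seeds(pool, previous, rng, start=2, cap=4):
    """Hypothetical Increasing Seed Size: reuse the previous iteration's seeds and add
    one new random seed, starting at `start` seeds and never exceeding `cap`."""
    if not previous:
        return rng.sample(pool, start)
    # Once the cap is reached, keep only the most recent cap - 1 seeds (an assumption).
    kept = list(previous)[-(cap - 1):] if len(previous) >= cap else list(previous)
    unused = [s for s in pool if s not in kept]
    if not unused:
        return kept
    return kept + [rng.choice(unused)]


rng = random.Random(0)
pool = ["Canon", "Nikon", "Sony", "Olympus", "Pentax", "Leica"]
seeds = []
for i in range(4):
    seeds = iss_seeds(pool, seeds, rng)
    print(i, len(seeds))  # seed-set sizes 2, 3, 4, 4
```

In a supervised run the pool would hold only the given seeds; in a bootstrapping run it would also include entities extracted in earlier iterations.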

As for ranking methods, they tried Random Walk with Restart, PageRank, Bayesian Sets, Wrapper Length, and Wrapper Frequency. Random Walk with Restart usually worked best, and Bayesian Sets also worked well.
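To illustrate the best-performing ranker, here is a minimal power-iteration sketch of Random Walk with Restart on a toy graph loosely modeled on SEAL's seed, document, wrapper, and mention nodes. The graph, the restart probability of 0.15, and the iteration count are assumptions for illustration only. At each step the walker follows a uniformly random edge, but with probability `restart` it jumps back to one of the seeds, so nodes that are well connected to the seeds score higher.

```python
def random_walk_with_restart(graph, seeds, restart=0.15, iters=50):
    """Score nodes by RWR: follow a uniformly random out-edge, or with
    probability `restart` jump back to a uniformly chosen seed."""
    nodes = list(graph)
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)
    for _ in range(iters):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                # Dangling node: return its walking mass to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * scores[n] / len(seeds)
                continue
            share = (1 - restart) * scores[n] / len(out)
            for m in out:
                nxt[m] += share
        scores = nxt
    return scores


# Hypothetical undirected graph: seeds -> documents -> wrappers -> extracted mentions.
graph = {
    "seed:Canon": ["doc1"],
    "seed:Nikon": ["doc1", "doc2"],
    "doc1": ["seed:Canon", "seed:Nikon", "wrapper1"],
    "doc2": ["seed:Nikon", "wrapper2"],
    "wrapper1": ["doc1", "Sony", "Olympus"],
    "wrapper2": ["doc2", "Olympus"],
    "Sony": ["wrapper1"],
    "Olympus": ["wrapper1", "wrapper2"],
}
scores = random_walk_with_restart(graph, {"seed:Canon", "seed:Nikon"})
print(sorted(["Sony", "Olympus"], key=scores.get, reverse=True))  # ['Olympus', 'Sony']
```

Olympus outranks Sony because two wrappers extract it. Standard PageRank differs mainly in teleporting to all nodes uniformly instead of only to the seeds.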

In the supervised setting, performance increased almost monotonically with the number of given seeds, regardless of the iterative process and ranking method. In the bootstrapping setting, ISS improved performance more steadily than FSS as the number of seeds grew.

I like: This paper is compact but very informative. It would have been even better with some discussion of computational time.