KeisukeKamataki writeup of Wang 2007
This is a review of Wang_2007_language_independent_set_expansion_of_named_entities_using_the_web by user:KeisukeKamataki.
Summary: This paper introduces SEAL, a set expansion system. Its two key components are the Extractor, which automatically constructs wrappers for each individual web document, and the Ranker, which ranks the extracted instances of a set according to their similarity to the seed entities. A wrapper consists of a left context and a right context, found by taking the longest character strings that match around the occurrences of the seeds in the document. The underlying idea is that a web document tends to be consistently structured within itself, so members of the same set are surrounded by the same markup. Since the method relies only on character-level features, it works independently of language. The Ranker then ranks the extracted items with a random walk with random jumps over a cyclic directed graph, where the similarity to the seeds is measured through the binary relations that connect pairs of nodes. The method outperformed Google Sets by a large margin in terms of mean average precision (MAP).
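To make the wrapper idea concrete, here is a minimal sketch of character-level wrapper construction. It is a simplified illustration, not SEAL's actual Extractor: it assumes only the first occurrence of each seed is used and a single wrapper is built per document. The function names `build_wrapper` and `apply_wrapper` and the sample HTML are hypothetical. The left context is the longest common suffix of the text preceding each seed, and the right context is the longest common prefix of the text following each seed.

```python
import re
from os.path import commonprefix  # character-wise longest common prefix of strings


def longest_common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def build_wrapper(document, seeds):
    """Return (left, right) contexts shared by the seeds, or None.

    Simplifying assumption: only the first occurrence of each seed is used.
    """
    lefts, rights = [], []
    for seed in seeds:
        pos = document.find(seed)
        if pos < 0:
            return None  # every seed must occur in the document
        lefts.append(document[:pos])
        rights.append(document[pos + len(seed):])
    left = longest_common_suffix(lefts)
    right = commonprefix(rights)
    if not left or not right:
        return None  # no shared context, so no wrapper for this page
    return left, right


def apply_wrapper(document, wrapper):
    """Extract every string bracketed by the wrapper's left and right contexts."""
    left, right = wrapper
    # Lookahead so a right context is not consumed and can overlap the next left context.
    pattern = re.escape(left) + "(.+?)(?=" + re.escape(right) + ")"
    return re.findall(pattern, document)


doc = ("<tr><td>Toyota</td><td>Japan</td></tr>"
       "<tr><td>Ford</td><td>USA</td></tr>"
       "<tr><td>Honda</td><td>Japan</td></tr>")
wrapper = build_wrapper(doc, ["Toyota", "Ford"])
print(wrapper)                      # ('<tr><td>', '</td><td>')
print(apply_wrapper(doc, wrapper))  # ['Toyota', 'Ford', 'Honda']
```

With the seeds "Toyota" and "Ford", the new member "Honda" is extracted while the country names are not, because their left context is `</td><td>` rather than `<tr><td>`. Nothing in the sketch depends on the language of the text, which mirrors the language-independence argument of the paper.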
I like: The approach itself looks simple, but it could be very powerful because the paper defines the problem well and takes a well-suited approach. Another strength is that relying on character-based features lets the method be applied in a language-independent way.