Suranah writeup for Wang and Cohen 2007
This is a review of Wang_2007_language_independent_set_expansion_of_named_entities_using_the_web by user:Suranah.
The paper presents SEAL, a general, language-independent set expansion system. Its pipeline has three modules: the fetcher, the extractor, and the ranker. Web pages containing the seeds are fetched from Google, patterns are extracted from those pages and used to find candidate entities, and the candidates are then ranked using a method similar to PageRank with decay.
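To make the ranking idea concrete, here is a minimal sketch of a damped random walk that restarts at the seeds, in the spirit of personalized PageRank. It is an illustration only, not the paper's exact ranker: the toy graph, the node names, the damping value of 0.85, and the helper functions `build_neighbors` and `damped_walk` are all assumptions made for this example.

```python
# Toy graph linking seeds, fetched pages, and extracted candidates.
# All names and edges here are invented for illustration.
EDGES = [
    ("seed:ford", "page:1"),
    ("seed:toyota", "page:1"),
    ("seed:ford", "page:2"),
    ("page:1", "cand:honda"),
    ("page:1", "cand:nissan"),
    ("page:2", "cand:honda"),
]


def build_neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def damped_walk(neighbors, start_nodes, damping=0.85, iterations=50):
    """Power iteration for a random walk that restarts at start_nodes.

    damping and iterations are assumed values, not taken from the paper.
    """
    restart = {
        n: (1.0 / len(start_nodes) if n in start_nodes else 0.0)
        for n in neighbors
    }
    score = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in neighbors}
        # Otherwise it moves to a uniformly chosen neighbor.
        for node, mass in score.items():
            share = damping * mass / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        score = nxt
    return score


neighbors = build_neighbors(EDGES)
seeds = {"seed:ford", "seed:toyota"}
scores = damped_walk(neighbors, seeds)

# Keep only candidate nodes and sort them by walk score, highest first.
ranked = sorted(
    (n for n in scores if n.startswith("cand:")),
    key=scores.get,
    reverse=True,
)
print(ranked)
```

In this toy graph, the candidate reachable from two pages ends up ranked above the one reachable from a single page, which is the intuition behind ranking by graph proximity to the seeds.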
Among the results, I found it most interesting that the approach worked well both for a category as subjective as classic Disney movies (I presume that there is no official list) and for very objective categories like NFL teams.
It may be interesting to explore how similar techniques could be used for transliterating names (an important problem in cross-script translation), and even to estimate a better phonetic model for languages like Arabic and Persian.