Liuy writeup of Wang Cohen 2009

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Wang_2009_automatic_set_instance_extraction_using_the_web by user:Liuy.


Summary

This paper introduces the Automatic Set Instance Extractor, a system based on set expansion from the web. Taking a semantic class name as input, the system generates semantic lexicons from a large corpus. Set expansion means finding all instances of a set given a few seeds. Initial seeds are found in a language-dependent fashion and are then expanded and ranked by language-independent set expansion. The paper shows results comparable to the method of Kozareva et al. on English-language benchmarks, and reports results on Chinese and Japanese as well. The results are comparable even though the Kozareva et al. system has additional information, since it assumes the user provides both a class name and a single initial seed.

Specifically, after the language-independent Noisy Instance Generation step (which avoids any part-of-speech tagger, parser, or capitalization cues), the set expander SEAL takes the seed elements as input and searches the web for probable instances. The ranker then pushes irrelevant candidates down by assigning relevant candidates higher ranks, and adds more relevant candidates to the list. To further improve the ranked list, the Bootstrapper iteratively expands the top-ranked candidates, combines statistics from different iterations, and grows the graph.
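The bootstrapping loop described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: `bootstrap`, `toy_expand`, `TOY_PAGES`, the `top_k` cutoff, and the overlap-count scoring are all hypothetical stand-ins, and the toy "pages" are made-up data that replace real web queries. The only point is to show the shape of the loop: expand the current seeds, pool scores across iterations, and re-seed from the top of the ranked list.

```python
from collections import defaultdict

# Made-up "web pages", each reduced to the set of terms that co-occur on it.
TOY_PAGES = [
    {"ford", "nissan", "toyota"},
    {"toyota", "honda", "nissan"},
    {"honda", "bmw", "ford"},
    {"apple", "pear"},
]


def toy_expand(seeds):
    """Hypothetical set expander: score each term by how many seeds share its pages."""
    seed_set = set(seeds)
    scores = defaultdict(float)
    for page in TOY_PAGES:
        overlap = len(page & seed_set)
        if overlap:
            for term in page:
                scores[term] += overlap
    return dict(scores)


def bootstrap(initial_seeds, expand, iterations=3, top_k=2):
    """Iteratively re-expand the top-ranked candidates and pool their scores."""
    totals = defaultdict(float)
    seeds = list(initial_seeds)
    for _ in range(iterations):
        # Combine statistics from this iteration with all earlier ones.
        for candidate, score in expand(seeds).items():
            totals[candidate] += score
        # Re-seed from the current top of the ranking (ties broken by name).
        ranked = sorted(totals, key=lambda c: (-totals[c], c))
        seeds = ranked[:top_k]
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = bootstrap(["ford"], toy_expand)
```

In this toy setting, terms that never share a page with a seed are never scored at all, which is a much cruder filter than a real ranker would apply.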

Commentary

1. I like ASIE mainly because it is fast: one problem takes only a few minutes and a few queries to a search engine.

2. It is great to see that it works across diverse languages, which suggests it could be useful in machine translation. It would be interesting to see whether instances in multiple languages can be added and extracted at the same time.

3. On the graph constructed by the Reranker, the system runs Random Walk with Restart until it converges. I am not sure how convergence of the node weights is defined (possibly it means the weights no longer change between iterations).
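On the convergence question in point 3, one common convention (not necessarily the one used in the paper) is to stop when the total change in node weights between two successive iterations falls below a small tolerance. The Python sketch below illustrates that convention on a toy graph. The function name, the restart probability of 0.15, the tolerance, and the four-node path graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for Random Walk with Restart on a small graph.

    adj maps each node to a list of its neighbours; seeds is the set of
    restart nodes. Returns (node weights, number of iterations used).
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(r)
    for iteration in range(1, max_iter + 1):
        # With probability `restart`, jump back to a seed node.
        nxt = {n: restart * r[n] for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if not neighbours:
                # Dangling node: return its mass to the restart distribution.
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * r[m]
                continue
            # Otherwise, move to a neighbour chosen uniformly at random.
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        # Convergence test: L1 change in the weights between iterations.
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            return p, iteration
    return p, max_iter


# Toy path graph a - b - c - d, restarting at node "a".
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights, iterations = random_walk_with_restart(toy_graph, {"a"})
```

Because each step shrinks the L1 difference between successive weight vectors by at least a factor of (1 - restart), the weights settle geometrically, so "they do not change any more" is in practice read as "they change by less than the tolerance".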