Apappu writeup on Wang ICDM '07

From Cohen Courses

This is a review of Wang_2007_language_independent_set_expansion_of_named_entities_using_the_web by user:Apappu.

  • This paper proposes SEAL, a set expansion system that is language independent (with the exception of language-dependent homonym seed information). It contains the following components:
  • The fetcher takes a list of seed entities and retrieves the top k pages from Google that contain the seed entities.
  • The extractor learns wrappers from a few training examples, using the left and right context surrounding the entities. The elements extracted by the wrappers are then passed on to the ranker.
  • The ranker builds a graph that relates pages, seeds, and patterns, and computes the relevance of each node in the graph with a PageRank-like technique.
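
The extractor and ranker steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the "e:"/"p:"/"w:" node prefixes, the restart probability, and the iteration count are all assumptions made for illustration. The wrapper learner is simplified to look only at the first occurrence of each seed on a page, and the ranker is a plain random walk with restart standing in for the PageRank-like technique.

```python
import re
from collections import defaultdict

def common_suffix(strings):
    """Longest string that every input ends with."""
    shortest = min(len(s) for s in strings)
    i = 0
    while i < shortest and len({s[-1 - i] for s in strings}) == 1:
        i += 1
    return strings[0][len(strings[0]) - i:]

def common_prefix(strings):
    """Longest string that every input starts with."""
    shortest = min(len(s) for s in strings)
    i = 0
    while i < shortest and len({s[i] for s in strings}) == 1:
        i += 1
    return strings[0][:i]

def learn_wrapper(page, seeds):
    """Return (left, right) context shared by the first occurrence of every seed.

    Simplifying assumption: only the first occurrence of each seed is used.
    """
    lefts, rights = [], []
    for seed in seeds:
        pos = page.find(seed)
        if pos < 0:
            return None  # the page does not contain every seed
        lefts.append(page[:pos])
        rights.append(page[pos + len(seed):])
    left, right = common_suffix(lefts), common_prefix(rights)
    return (left, right) if left and right else None

def apply_wrapper(page, wrapper):
    """Extract every string bracketed by the wrapper's left and right context."""
    left, right = wrapper
    # The right context is a lookahead so that adjacent items may share markup.
    pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + r")"
    return re.findall(pattern, page)

def random_walk_scores(edges, seed_nodes, restart=0.15, iterations=50):
    """Random walk with restart on an undirected graph given as (a, b) pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    neighbours = dict(neighbours)
    start = {n: (1.0 / len(seed_nodes) if n in seed_nodes else 0.0)
             for n in neighbours}
    score = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in neighbours}
        for n, s in score.items():
            share = (1.0 - restart) * s / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        score = nxt
    return score

# Toy page standing in for one fetched search result (hypothetical data).
page = "<ul><li>Ford</li><li>Toyota</li><li>Honda</li><li>Nissan</li></ul>"
seeds = ["Ford", "Nissan"]

wrapper = learn_wrapper(page, seeds)
candidates = apply_wrapper(page, wrapper)

# Graph: seed entity -- page -- wrapper -- extracted entity.
edges = [("e:" + s, "p:0") for s in seeds]
edges.append(("p:0", "w:0"))
edges += [("w:0", "e:" + c) for c in candidates]

scores = random_walk_scores(edges, {"e:" + s for s in seeds})
ranked = sorted((n for n in scores if n.startswith("e:")),
                key=scores.get, reverse=True)
print(wrapper, candidates, ranked)
```

With several pages and wrappers in the graph, an entity extracted by more wrappers receives probability mass along more paths, which is the intuition behind ranking candidates by a graph walk.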

Comments: constructing the seed data could be non-trivial, especially when the number of representative concept classes is large. Also, choosing a representative seed is itself a judgment call, which could bias the view of the world.