Bbd writeup of Etzioni 2004

From Cohen Courses
Revision as of 13:29, 2 November 2009 by Bbd

This is a review of Etzioni_2004_methods_for_domain_independent_information_extraction_from_the_web_an_experimental_comparison by user:bbd.


This paper presents three extensions to the KnowItAll system that improve recall without sacrificing precision. The new techniques are:

  • Rule learning: learns domain-specific extraction rules so that more entities can be extracted from existing domains.
  • Subclass extraction (SE): based on the idea that extracting instances of specific subclasses may be simpler than extracting instances of a general class. SE detects subclasses within existing domains.
  • List extraction: learns wrappers, which act like regular expressions around the items of a list of instances, and uses them to pull out more entities of the same type.
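To make the list extraction idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes a wrapper is just the text that consistently appears immediately before and after known seed instances on a page. The names `learn_wrapper` and `apply_wrapper`, the `max_context` cap, and the toy HTML page are all hypothetical and chosen only for illustration.

```python
import os
import re


def learn_wrapper(page: str, seeds: list[str], max_context: int = 5) -> tuple[str, str]:
    """Return the (prefix, suffix) text shared by all seed occurrences on the page."""
    lefts, rights = [], []
    for seed in seeds:
        start = page.find(seed)
        if start == -1:
            continue  # this seed does not occur on the page
        end = start + len(seed)
        lefts.append(page[max(0, start - max_context):start])
        rights.append(page[end:end + max_context])
    if len(lefts) < 2:
        raise ValueError("need at least two seeds on the page to generalize")
    # Longest common suffix of the left contexts (reverse, take prefix, reverse back).
    prefix = os.path.commonprefix([left[::-1] for left in lefts])[::-1]
    # Longest common prefix of the right contexts.
    suffix = os.path.commonprefix(rights)
    if not prefix or not suffix:
        raise ValueError("seeds share no surrounding context")
    return prefix, suffix


def apply_wrapper(page: str, prefix: str, suffix: str) -> list[str]:
    """Extract every string that sits between the learned prefix and suffix."""
    # The suffix is a lookahead so that adjacent list items can share delimiters.
    pattern = re.escape(prefix) + r"(.+?)(?=" + re.escape(suffix) + r")"
    return re.findall(pattern, page)


# Toy page and seeds, made up for this example.
page = "<ul><li>Paris</li><li>London</li><li>Berlin</li><li>Madrid</li></ul>"
prefix, suffix = learn_wrapper(page, ["Paris", "London"])
cities = apply_wrapper(page, prefix, suffix)
print(cities)  # ['Paris', 'London', 'Berlin', 'Madrid']
```

Two seeds are enough here to recover the other two list items. A real system would also need to handle several lists per page and noisy markup, which this sketch ignores.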

They use an interesting measure, pointwise mutual information (PMI) between an instance and a discriminator phrase, to decide how likely the instance is to belong to the class.
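A minimal sketch of how such a score could be computed, assuming PMI is estimated from search-engine hit counts as hits(discriminator phrase with instance) divided by hits(instance alone). The function name `pmi_score` and the counts below are invented for illustration and are not results from the paper.

```python
def pmi_score(hits_with_discriminator: int, hits_instance: int) -> float:
    """Fraction of an instance's hits that also match the discriminator phrase."""
    if hits_instance == 0:
        return 0.0  # instance never seen, so there is no evidence either way
    return hits_with_discriminator / hits_instance


# Made-up hit counts for the discriminator phrase "city of <instance>".
toy_counts = {
    "Paris": (50, 1000),   # (hits for "city of Paris", hits for "Paris")
    "Banana": (1, 1000),   # (hits for "city of Banana", hits for "Banana")
}

scores = {name: pmi_score(both, alone) for name, (both, alone) in toy_counts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['Paris', 'Banana']
```

A higher score means the instance co-occurs with the discriminator more often relative to how often it appears at all, which is taken as evidence of class membership.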

I liked this paper because it establishes a good technique for entity set expansion that can be applied in various domains and that leverages the availability of huge amounts of data on the web.