Apappu writeup on Brin '98

From Cohen Courses
Revision as of 13:40, 28 October 2009 by Apappu

This is a review of Brin_1999_extracting_patterns_and_relations_from_the_world_wide_web by user:Apappu.

  • Task: pattern-based extraction of (book title, author) pairs. Start from a small seed set, then find patterns that surface new instances.
  • Bootstrapping method: a simple yet powerful idea that tries to leverage "redundancy".
  • The patterns mentioned in the paper are similar to Hearst patterns.
  • High precision is very helpful in this kind of task, even at the cost of low recall, especially when the data is really huge (24 million webpages).

  • Overall, this is a seminal paper in this line of work.
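
The bootstrapping loop summarized in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the paper's actual implementation: it leaves out parts of the method such as URL-based constraints and pattern specificity checks. The function names, the 10-character context window, the 20-character limit on the text between title and author, the min_support threshold, and the toy documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def common_prefix(strings):
    """Longest string that every item in `strings` starts with."""
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def find_occurrences(pairs, docs, ctx=10):
    """Collect (prefix, middle, suffix) contexts around known (title, author) pairs."""
    occurrences = []
    for title, author in pairs:
        # Assumption: title comes first, with at most 20 characters in between.
        pat = re.compile(re.escape(title) + r"(.{1,20}?)" + re.escape(author))
        for doc in docs:
            for m in pat.finditer(doc):
                prefix = doc[max(0, m.start() - ctx):m.start()]
                suffix = doc[m.end():m.end() + ctx]
                occurrences.append((prefix, m.group(1), suffix))
    return occurrences


def induce_patterns(occurrences, min_support=2):
    """Group contexts by their middle text and generalize prefix and suffix."""
    by_middle = defaultdict(list)
    for prefix, middle, suffix in occurrences:
        by_middle[middle].append((prefix, suffix))
    patterns = []
    for middle, contexts in by_middle.items():
        if len(contexts) < min_support:
            continue  # crude precision safeguard: skip weakly supported patterns
        # Longest common suffix of the prefixes, computed on reversed strings.
        prefix = common_prefix([p[::-1] for p, _ in contexts])[::-1]
        suffix = common_prefix([s for _, s in contexts])
        if prefix and suffix:  # patterns without both anchors are too general
            patterns.append(re.compile(
                re.escape(prefix) + r"(.+?)" + re.escape(middle)
                + r"(.+?)" + re.escape(suffix)))
    return patterns


def bootstrap(seeds, docs, rounds=2, min_support=2):
    """Alternate between inducing patterns from pairs and extracting new pairs."""
    pairs = set(seeds)
    for _ in range(rounds):
        patterns = induce_patterns(find_occurrences(pairs, docs), min_support)
        for pat in patterns:
            for doc in docs:
                pairs.update(pat.findall(doc))
    return pairs


# Toy corpus (invented for illustration only).
docs = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<li><b>Neuromancer</b> by William Gibson</li>",
]
seeds = {("Foundation", "Isaac Asimov"), ("Dune", "Frank Herbert")}
found = bootstrap(seeds, docs)
```

On this toy corpus the two seed pairs share the same surrounding markup, so one pattern is induced and it also picks up the third, unseen pair. Requiring several supporting occurrences and non-empty anchors on both sides is one simple way to favor precision over recall, in the spirit of the trade-off noted in the bullets.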