Apappu writeup on Brin '98

From Cohen Courses

This is a review of Brin_1999_extracting_patterns_and_relations_from_the_world_wide_web by user:Apappu.

  • Task: pattern-based extraction of (book title, author) pairs. Start from a small seed set of pairs, find the patterns in which they occur, then use those patterns to find new instances.
  • Bootstrapping method: a simple yet powerful idea that leverages the redundancy of the web.
  • The patterns mentioned in the paper are similar to Hearst patterns.
  • High precision is very helpful in this kind of task, even at the cost of low recall, especially when the data is huge (24 million web pages).

  • Overall, this is a seminal paper in this line of work.
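
The seed-to-pattern-to-instance loop described above can be illustrated with a small sketch. This is not the paper's implementation: the function names (find_patterns, apply_patterns, bootstrap), the two-character context window, and the toy corpus are all assumptions made for illustration. It also leaves out parts of the original method such as URL prefixes and the check on pattern specificity, and it only handles the case where the title appears before the author.

```python
import re

def find_patterns(corpus, pairs, ctx=2):
    """Collect (prefix, middle, suffix) contexts around known (author, title) pairs."""
    patterns = set()
    for author, title in pairs:
        for text in corpus:
            t = text.find(title)
            a = text.find(author)
            # Simplification: only handle "title ... author" order.
            if t == -1 or a == -1 or t + len(title) > a:
                continue
            prefix = text[max(0, t - ctx):t]
            middle = text[t + len(title):a]
            suffix = text[a + len(author):a + len(author) + ctx]
            if prefix and middle and suffix:
                patterns.add((prefix, middle, suffix))
    return patterns

def apply_patterns(corpus, patterns):
    """Match every pattern against every document and return the pairs found."""
    found = set()
    for prefix, middle, suffix in patterns:
        rx = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle) + r"(.+?)" + re.escape(suffix)
        )
        for text in corpus:
            for m in rx.finditer(text):
                found.add((m.group(2), m.group(1)))  # (author, title)
    return found

def bootstrap(corpus, seeds, rounds=3):
    """Alternate between inducing patterns and extracting pairs until nothing new appears."""
    pairs = set(seeds)
    for _ in range(rounds):
        new = apply_patterns(corpus, find_patterns(corpus, pairs))
        if new <= pairs:
            break
        pairs |= new
    return pairs

# Toy corpus (hypothetical): two page layouts, with "Dune" appearing in both.
corpus = [
    "<li><b>Foundation</b> by Isaac Asimov (1951)",
    "<li><b>Dune</b> by Frank Herbert (1965)",
    "Read Dune, written by Frank Herbert, today.",
    "Read Neuromancer, written by William Gibson, today.",
]
seeds = {("Isaac Asimov", "Foundation")}
result = bootstrap(corpus, seeds)
```

In this toy run the single seed yields a pattern for the first layout, which finds "Dune"; because "Dune" also occurs in the second layout, a second pattern is induced, which in turn finds "Neuromancer". That chain is the redundancy the review refers to. The sketch has no precision filter, so on real web data overly generic patterns would need to be discarded.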