Apappu writeup on Brin '98
From Cohen Courses
This is a review of Brin_1999_extracting_patterns_and_relations_from_the_world_wide_web by user:Apappu.
- Task: pattern-based extraction of (book title, author name) pairs. Start from a small seed set, find the patterns in which the seeds occur, and then use those patterns to find new instances.
- A bootstrapping method: a simple yet powerful idea that leverages the "redundancy" of the web.
- The patterns mentioned in the paper are similar to Hearst patterns.
- Favoring high precision at the cost of low recall works well for this kind of task, especially when the data is very large (24 million web pages).
- Overall, this is a seminal paper in this line of work.
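
The seed-and-pattern loop from the task bullet above can be sketched roughly as follows. This is a toy illustration and not the paper's actual algorithm: it assumes the corpus is a list of single-line strings, that a pattern is just the literal (prefix, middle, suffix) text around a known pair, and that a pattern is kept only if at least `min_support` known pairs occur in it (a crude way of favoring precision over recall). The function names, the `min_support` parameter, and the sample lines are all made up for illustration.

```python
import re
from collections import Counter


def find_patterns(lines, pairs, min_support=2):
    """Collect (prefix, middle, suffix) contexts in which known pairs occur."""
    counts = Counter()
    for line in lines:
        for title, author in pairs:
            t = line.find(title)
            if t == -1:
                continue
            # Only consider the author when it appears after the title.
            a = line.find(author, t + len(title))
            if a == -1:
                continue
            middle = line[t + len(title):a]
            if not middle:
                continue  # an empty middle would match almost anything
            prefix = line[:t]
            suffix = line[a + len(author):]
            counts[(prefix, middle, suffix)] += 1
    # Keep only contexts supported by enough known pairs.
    return {p for p, n in counts.items() if n >= min_support}


def apply_patterns(lines, patterns):
    """Extract (title, author) candidates from lines that fit a pattern."""
    found = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            "^" + re.escape(prefix) + "(.+?)"
            + re.escape(middle) + "(.+?)"
            + re.escape(suffix) + "$"
        )
        for line in lines:
            m = regex.match(line)
            if m:
                found.add((m.group(1), m.group(2)))
    return found


def bootstrap(lines, seeds, iterations=3, min_support=2):
    """Alternate between finding patterns and finding new pairs."""
    pairs = set(seeds)
    for _ in range(iterations):
        patterns = find_patterns(lines, pairs, min_support)
        new_pairs = apply_patterns(lines, patterns) - pairs
        if not new_pairs:
            break  # nothing new was found, so stop early
        pairs |= new_pairs
    return pairs


# Hypothetical sample data, not taken from the paper.
lines = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<li><b>Neuromancer</b> by William Gibson</li>",
    "Isaac Asimov wrote many books.",
    "Ubik, written by Philip K. Dick",
]
seeds = {("Foundation", "Isaac Asimov"), ("Dune", "Frank Herbert")}
result = bootstrap(lines, seeds)
print(sorted(result))
```

In this sample, the two seeds share one list-item context, which then picks up the third book. The "Ubik" line is missed because no seed occurs in that phrasing, which is the low-recall side of the trade-off noted above.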