Apappu writeup on Brin '98
From Cohen Courses
Latest revision as of 10:42, 3 September 2010
This is a review of Brin_1999_extracting_patterns_and_relations_from_the_world_wide_web by user:Apappu.
- Task: pattern-based extraction of (book title, author name) pairs. Start from a seed set of known pairs, then find patterns that lead to new instances.
- It is a bootstrapping method, a simple yet powerful idea that tries to leverage "redundancy".
- The patterns mentioned in the paper are similar to Hearst patterns.
- High precision is very helpful in this kind of task, even at the cost of low recall, especially
when the data is really huge (24 million web pages).
- Overall, this is a seminal paper in this line of work.
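
The seed-and-pattern loop described in the notes above can be sketched roughly as follows. This is only an illustration of the bootstrapping idea, not the paper's implementation: the function names (`find_patterns`, `apply_patterns`, `bootstrap`), the toy corpus, and the choice to use only the text between a title and an author as the "pattern" are all assumptions made for this example. The paper's actual patterns and filtering are richer.

```python
import re

# Hypothetical toy corpus; the real setting is millions of web pages.
CORPUS = [
    "Foundation, by Isaac Asimov",
    "Dune, by Frank Herbert",
    "Dune was written by Frank Herbert",
    "Emma was written by Jane Austen",
]
SEEDS = {("Foundation", "Isaac Asimov")}

# Assumption: titles and author names are runs of capitalized words.
NAME = r"([A-Z][\w.]*(?: [A-Z][\w.]*)*)"


def find_patterns(corpus, pairs):
    """Collect the short text found between a known title and its author."""
    middles = set()
    for title, author in pairs:
        regex = re.escape(title) + r"(.{1,20}?)" + re.escape(author)
        for text in corpus:
            match = re.search(regex, text)
            if match:
                middles.add(match.group(1))
    return middles


def apply_patterns(corpus, middles):
    """Use each collected middle string as a template to extract new pairs."""
    found = set()
    for middle in middles:
        regex = re.compile(NAME + re.escape(middle) + NAME)
        for text in corpus:
            for title, author in regex.findall(text):
                found.add((title, author))
    return found


def bootstrap(corpus, seeds, rounds=2):
    """Alternate between learning patterns from pairs and pairs from patterns."""
    pairs = set(seeds)
    for _ in range(rounds):
        pairs |= apply_patterns(corpus, find_patterns(corpus, pairs))
    return pairs


if __name__ == "__main__":
    for pair in sorted(bootstrap(CORPUS, SEEDS)):
        print(pair)
```

In this toy run, the first round learns one pattern from the seed and uses it to pick up a second pair. The second round learns a new pattern from that pair and reaches a book the seed pattern could not. That is the redundancy at work: the same pair written in more than one way links one pattern to the next. The sketch has no pattern filtering, so on real data it would quickly admit noisy pairs, which is where the precision concern in the notes comes in.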