Philgoo Han writeup of Brin

From Cohen Courses

This is a review of Brin_1999_extracting_patterns_and_relations_from_the_world_wide_web by user:Ironfoot.

  • Extracting information from WWW
    • Low recall but high precision: reminiscent of the rule of thumb to optimize for the 80% and make the remaining 20% tolerable
  • Generating patterns from a small seed relation: exponential growth
  • Duality between tuples and patterns
  • Patterns
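    • An occurrence is a 7-tuple (author, title, order, url, prefix, middle, suffix); a pattern is a 5-tuple (order, urlprefix, prefix, middle, suffix)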
    • Heuristic for minimizing false positives
      • Segmentation and entity recognition methods, as covered in recent classes, may help here.
  • Experiment
    • Lower expansion than expected
    • Is there a catalog of all registered books that could serve as a quality measure? The result analysis seems limited to an overly intuitive level.
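
To make the duality between tuples and patterns concrete, here is a minimal sketch of the bootstrapping loop, under several simplifying assumptions that are not from the paper: documents are plain strings matched line by line, the URL part of a pattern is ignored, context is cut at a fixed `CONTEXT` width, and a pattern is kept only if at least `min_support` occurrences back it. All function names and the example data are hypothetical.

```python
import os
import re
from collections import defaultdict

CONTEXT = 10  # assumption: characters of context kept on each side


def find_occurrences(pairs, docs):
    """Collect (author_first, prefix, middle, suffix) for each pair found on a line."""
    found = []
    for doc in docs:
        for line in doc.splitlines():
            for author, title in pairs:
                a, t = line.find(author), line.find(title)
                if a < 0 or t < 0:
                    continue
                author_first = a < t
                (s1, first), (s2, second) = sorted([(a, author), (t, title)])
                e1, e2 = s1 + len(first), s2 + len(second)
                if e1 > s2:  # the two strings overlap, skip
                    continue
                prefix = line[max(0, s1 - CONTEXT):s1]
                found.append((author_first, prefix, line[e1:s2], line[e2:e2 + CONTEXT]))
    return found


def generate_patterns(occurrences, min_support=2):
    """Group occurrences by (order, middle) and keep the shared prefix/suffix context."""
    groups = defaultdict(list)
    for order, prefix, middle, suffix in occurrences:
        groups[(order, middle)].append((prefix, suffix))
    patterns = []
    for (order, middle), ctx in groups.items():
        if len(ctx) < min_support:
            continue
        # Longest common suffix of the prefixes, longest common prefix of the suffixes.
        prefix = os.path.commonprefix([p[::-1] for p, _ in ctx])[::-1]
        suffix = os.path.commonprefix([s for _, s in ctx])
        # Crude stand-in for a specificity check: reject patterns with empty parts.
        if prefix and middle and suffix:
            patterns.append((order, prefix, middle, suffix))
    return patterns


def apply_patterns(patterns, docs):
    """Scan the documents with each pattern and return the (author, title) pairs matched."""
    pairs = set()
    for author_first, prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + "(.+?)" + re.escape(middle) + "(.+?)" + re.escape(suffix)
        )
        for doc in docs:
            for line in doc.splitlines():
                for m in regex.finditer(line):
                    x, y = m.group(1), m.group(2)
                    pairs.add((x, y) if author_first else (y, x))
    return pairs


def dipre(seed, docs, rounds=3):
    """Alternate pairs -> patterns -> pairs until nothing new is found."""
    pairs = set(seed)
    for _ in range(rounds):
        patterns = generate_patterns(find_occurrences(pairs, docs))
        new_pairs = pairs | apply_patterns(patterns, docs)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs


# Made-up example data, not taken from the paper's experiments.
EXAMPLE_DOCS = [
    "<li><b>Foundation</b> by Isaac Asimov (1951)\n"
    "<li><b>Dune</b> by Frank Herbert (1965)\n"
    "<li><b>Neuromancer</b> by William Gibson (1984)"
]
SEED = {("Isaac Asimov", "Foundation"), ("Frank Herbert", "Dune")}
```

Calling `dipre(SEED, EXAMPLE_DOCS)` on this toy input learns one pattern from the two seed lines and uses it to pick up the third book. The empty-part check is only a placeholder for the false-positive heuristic noted above; a closer reading of the paper would be needed to reproduce its actual specificity test.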