Philgoo Han writeup of Banko, Cafarella, Soderland, Broadhead and Etzioni

From Cohen Courses

This is a review of Banko_2007_open_information_extraction_from_the_web by user:Ironfoot.

  • TextRunner: bootstrapping, domain-independent, scalable
  • Self-supervised learner
    • Parse seed data to find tuple features for a Naive Bayes classifier
      • Does it suffer from feature dependency?
      • Can sufficient features be found?
      • Small data -> high bias?
  • Single-pass extractor
    • Most probable POS of each word -> noun phrase chunker (entities found here) -> non-essential phrase elimination (relations found here)
    • Classify with the classifier above to find trustworthy entity-relation tuples
  • Redundancy-based assessor
    • Simple redundancy count assessor
  • Query Processing
    • Distributed inverted indexing
  • Low model complexity
  • Compared results with KnowItAll
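
The last two stages in the outline above (the redundancy count assessor and the inverted index used for query processing) can be sketched roughly as follows. This is a minimal single-machine illustration, not TextRunner's implementation: the function names `normalize`, `assess`, `build_index`, and `query`, the normalization rule, and the sample extractions are all assumptions made for this example, and a raw count of distinct supporting sentences stands in for whatever scoring the real assessor applies.

```python
from collections import Counter, defaultdict


def normalize(tup):
    """Lowercase each field and collapse whitespace (illustrative rule only)."""
    return tuple(" ".join(field.lower().split()) for field in tup)


def assess(extractions):
    """Count the distinct sentences that support each normalized tuple.

    `extractions` is an iterable of (sentence_id, (arg1, relation, arg2)).
    """
    support = defaultdict(set)
    for sentence_id, tup in extractions:
        support[normalize(tup)].add(sentence_id)
    return Counter({tup: len(ids) for tup, ids in support.items()})


def build_index(tuples):
    """Map each token to the tuples containing it.

    A single in-memory dict stands in for a distributed inverted index.
    """
    index = defaultdict(set)
    for tup in tuples:
        for field in tup:
            for token in field.split():
                index[token].add(tup)
    return index


def query(index, counts, term):
    """Return tuples mentioning `term`, most frequently supported first."""
    hits = index.get(term.lower(), set())
    return sorted(hits, key=lambda t: (-counts[t], t))


# Hypothetical extractor output: the same fact seen in two different sentences.
extractions = [
    (1, ("Edison", "invented", "the light bulb")),
    (2, ("edison", "invented", "the  light bulb")),
    (2, ("Edison", "invented", "the light bulb")),
    (3, ("Edison", "founded", "General Electric")),
]
counts = assess(extractions)
index = build_index(counts)
results = query(index, counts, "Edison")
```

Counting distinct sentence IDs rather than raw occurrences keeps a tuple extracted twice from the same sentence from being credited twice, which is one simple reading of "redundancy count".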