Philgoo Han writeup of Banko, Cafarella, Soderland, Broadhead and Etzioni
From Cohen Courses
This is a review of Banko_2007_open_information_extraction_from_the_web by user:Ironfoot.
- TextRunner: bootstrapping, domain-independent, scalable
- Self-supervised learner
- Parse seed data to find tuple features for Naive Bayes Classifier
- Does it suffer from feature dependency, given that Naive Bayes assumes independent features?
- Can sufficient features be found?
- Small data -> high bias?
- Single Pass Extractor
- Most probable POS of each word -> noun phrase chunker (entities found here) -> non-essential phrase elimination (relations found here)
- Classify candidates with the classifier above to keep trustworthy entity-relation tuples
- Redundancy-based assessor
- Simple redundancy count assessor
- Query Processing
- Distributed inverted indexing
- Low model complexity
- Compared results with KnowItAll
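
The redundancy count and inverted index steps noted above can be illustrated with a minimal Python sketch. This is only a toy under stated assumptions, not TextRunner's implementation: the function names (`normalize`, `count_redundancy`, `build_inverted_index`, `query`), the normalization rule, and the sample tuples are all hypothetical, the assessor is reduced to a raw count, and the index lives on a single machine rather than being distributed.

```python
from collections import Counter, defaultdict


def normalize(tup):
    """Lowercase and collapse whitespace in each field of an (arg1, relation, arg2) tuple."""
    return tuple(" ".join(field.lower().split()) for field in tup)


def count_redundancy(extractions):
    """Count how often each normalized tuple was extracted (raw count only)."""
    return Counter(normalize(t) for t in extractions)


def build_inverted_index(tuples):
    """Map each token to the set of tuples in which it occurs."""
    index = defaultdict(set)
    for tup in tuples:
        for field in tup:
            for token in field.split():
                index[token].add(tup)
    return index


def query(index, counts, *terms):
    """Return tuples containing every query term, most frequently extracted first."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    hits = set.intersection(*postings)
    return sorted(hits, key=lambda t: (-counts[t], t))


# Made-up extractions, for illustration only.
extractions = [
    ("Paris", "is the capital of", "France"),
    ("paris", "is the  capital of", "france"),
    ("Berlin", "is the capital of", "Germany"),
    ("Paris", "is located in", "France"),
]

counts = count_redundancy(extractions)
index = build_inverted_index(counts)
results = query(index, counts, "Paris", "capital")
```

Here `results` holds the single tuple that matches both terms, and `counts` records that it was extracted twice. Ranking by raw count is the simplest possible reading of "redundancy"; a probabilistic score could replace the sort key without changing the rest of the sketch.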