Wka writeup of Banko 2007
This is a review of banko_2007_open_information_extraction_from_the_web by user:wka.
The authors present the paradigm of open information extraction (OIE), in which large sets of relation tuples are extracted without requiring any human input. They also present TextRunner, a complete OIE system that can handle relational user queries. TextRunner is more efficient than their previous KnowItAll system.
TextRunner consists of 3 main modules:
- The self-supervised learner: trains a Naive Bayes (NB) classifier on training data that it labels itself as positive or negative
- The single-pass extractor
- The redundancy-based assessor: uses the number of distinct sentences from which a tuple was extracted to estimate its probability of correctness, using their earlier Urns model
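The counting step behind the redundancy-based assessor can be sketched roughly as follows. This is a minimal illustration, not TextRunner's code: the function names, the example sentences, and the per-extraction probability `p` are all assumptions, and a simple noisy-or formula stands in for the Urns model, which is more involved.

```python
from collections import defaultdict


def count_support(extractions):
    """Map each tuple to the number of DISTINCT sentences it was extracted from."""
    sentences_by_tuple = defaultdict(set)
    for tup, sentence in extractions:
        sentences_by_tuple[tup].add(sentence)
    return {tup: len(sents) for tup, sents in sentences_by_tuple.items()}


def noisy_or(k, p=0.5):
    """Stand-in for the Urns model (assumption): probability that at least one
    of k independent extractions is correct, each correct with probability p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical (tuple, source sentence) pairs produced by an extractor.
extractions = [
    (("Edison", "invented", "the light bulb"), "Edison invented the light bulb."),
    (("Edison", "invented", "the light bulb"), "In 1879 Edison invented the light bulb."),
    # Same sentence seen again: it must not add to the support count.
    (("Edison", "invented", "the light bulb"), "Edison invented the light bulb."),
    (("Edison", "was born in", "Ohio"), "Edison was born in Ohio."),
]

support = count_support(extractions)
scores = {tup: noisy_or(k) for tup, k in support.items()}
```

Storing sentences in a set means a sentence repeated across pages counts once, which matches the "distinct sentences" wording above.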
Evaluating their results:
- Correctness: check that the tuple's relation is well-formed, then that its entities are well-formed; classify the resulting tuples as concrete or abstract
- Number of distinct facts: detect when two relations are synonymous; merge relations that differ only slightly
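One way to picture merging relations that differ only slightly is the sketch below. It is an illustration under stated assumptions: the `MODIFIERS` word list, the function names, and the example tuples are hypothetical, and it covers only the merging step, not synonym detection.

```python
# Hypothetical list of modifier words to ignore when comparing relations.
MODIFIERS = {"originally", "also", "later", "first"}


def normalize_relation(relation):
    """Lowercase the relation string and drop modifier words (assumed heuristic)."""
    tokens = [t for t in relation.lower().split() if t not in MODIFIERS]
    return " ".join(tokens)


def count_distinct_facts(tuples):
    """Count tuples that remain distinct after relation normalization."""
    merged = {(e1, normalize_relation(rel), e2) for e1, rel, e2 in tuples}
    return len(merged)


# Hypothetical extracted tuples: the first two differ only by a modifier.
tuples = [
    ("Edison", "invented", "the light bulb"),
    ("Edison", "originally invented", "the light bulb"),
    ("Edison", "was born in", "Ohio"),
]

distinct = count_distinct_facts(tuples)
```

Here the first two tuples collapse into one fact, so three extracted tuples count as two distinct facts.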