Bbd writeup of TextRunner

From Cohen Courses

This is a review of Banko_2007_open_information_extraction_from_the_web by user:Bbd.


This paper describes the TextRunner system, an Open IE system, i.e. a system that makes just one pass over the corpus and extracts relation tuples without any human input. The main modules in this system are:

 - self-supervised learner : This module first labels its own unlabeled training set as positive or negative (+/-)
   and uses it to train a Naive Bayes classifier, which the extractor module uses later.
 - single-pass extractor : It makes a single pass over the data and tags every word with its most
   likely POS tag. It then extracts candidate relation tuples based on the phrases occurring
   between noun phrases. The Naive Bayes classifier classifies these candidates, and the high-confidence tuples are kept.
 - redundancy-based assessor : This module merges tuples that share the same entities and relation and
   counts their occurrences to estimate how likely the extracted tuple is to be correct.
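
The extractor and assessor steps above can be illustrated with a small Python sketch. This is only my own toy illustration under several assumptions, not the authors' implementation: the input is already POS-tagged, noun phrases are approximated as runs of determiner, adjective and noun tags, a length threshold (the hypothetical `keep_candidate`) stands in for the trained Naive Bayes classifier, and raw counts stand in for the assessor's confidence estimate. All function names here are made up for the example.

```python
from collections import Counter

# Assumption: a noun phrase is any maximal run of these Penn Treebank tags.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}
MAX_REL_TOKENS = 4  # hypothetical threshold, stands in for the classifier


def chunk(tagged):
    """Group a tagged sentence into alternating ('NP', words) / ('O', words) chunks."""
    chunks = []
    for word, tag in tagged:
        kind = "NP" if tag in NP_TAGS else "O"
        if chunks and chunks[-1][0] == kind:
            chunks[-1][1].append(word)
        else:
            chunks.append((kind, [word]))
    return chunks


def candidate_tuples(tagged):
    """Emit (entity1, relation, entity2) for each NP, in-between phrase, NP triple."""
    chunks = chunk(tagged)
    out = []
    for i in range(len(chunks) - 2):
        left, rel, right = chunks[i], chunks[i + 1], chunks[i + 2]
        if left[0] == "NP" and rel[0] == "O" and right[0] == "NP":
            out.append((" ".join(left[1]), " ".join(rel[1]), " ".join(right[1])))
    return out


def keep_candidate(candidate):
    """Placeholder for the Naive Bayes classifier: keep short relation phrases."""
    return len(candidate[1].split()) <= MAX_REL_TOKENS


def run_pipeline(tagged_sentences):
    """Single pass over the sentences, then merge identical tuples and count them."""
    counts = Counter()
    for tagged in tagged_sentences:
        for cand in candidate_tuples(tagged):
            if keep_candidate(cand):
                # Merge tuples that differ only in letter case.
                counts[tuple(part.lower() for part in cand)] += 1
    return counts


sentences = [
    [("Edison", "NNP"), ("invented", "VBD"), ("the", "DT"), ("phonograph", "NN")],
    [("Edison", "NNP"), ("invented", "VBD"), ("the", "DT"), ("phonograph", "NN"),
     ("in", "IN"), ("1877", "CD")],
    [("Paris", "NNP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN"),
     ("of", "IN"), ("France", "NNP")],
]
counts = run_pipeline(sentences)
```

In this toy run the tuple seen in two sentences ends up with a higher count than the ones seen once, which is the intuition behind using redundancy as a signal of correctness.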

The authors describe the system in detail and compare it with KnowItAll, reporting that TextRunner performs better.

I liked their approach because the learning is entirely self-supervised, which makes it very useful for extracting a large number of relations from the web without human supervision.