Difference between revisions of "Bbd writeup of TextRunner"
From Cohen Courses
Latest revision as of 10:42, 3 September 2010
This is a review of Banko_2007_open_information_extraction_from_the_web by user:Bbd.
This paper describes TextRunner, an Open IE system: it makes a single pass over the corpus and extracts relation tuples without any human input. The main modules in this system are:
- Self-supervised learner: This module first labels its own unlabelled training set as positive or negative and feeds it to a Naive Bayes classifier, which is later used by the extractor module.
- Single-pass extractor: It makes a single pass over the data and tags every word with its most likely POS tag. It then extracts candidate relation tuples based on the phrases occurring between noun phrases. These are classified by the Naive Bayes classifier to obtain high-confidence tuples.
- Redundancy-based assessor: This module merges tuples that have the same entities and relation, and counts their occurrences to estimate how likely each extracted tuple is to be correct.
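The extraction and assessment steps above can be roughly sketched as follows. This is only a toy illustration, not the paper's implementation: the input is assumed to be already POS-tagged, the tag set `NP_TAGS` and the helper names (`noun_phrase_chunks`, `candidate_tuples`, `assess`) are made up for this example, the Naive Bayes filtering step is skipped, and a raw occurrence count stands in for whatever confidence estimate the real assessor computes.

```python
from collections import Counter

# Assumption: these Penn Treebank tags are treated as "part of a noun phrase".
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def noun_phrase_chunks(tagged):
    """Group consecutive noun-phrase tokens; return (start, end, text) spans."""
    chunks = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] in NP_TAGS:
            j = i
            while j < len(tagged) and tagged[j][1] in NP_TAGS:
                j += 1
            chunks.append((i, j, " ".join(word for word, _ in tagged[i:j])))
            i = j
        else:
            i += 1
    return chunks


def candidate_tuples(tagged):
    """Propose (entity, relation, entity) from the words between adjacent noun phrases."""
    chunks = noun_phrase_chunks(tagged)
    tuples = []
    for (_, end1, np1), (start2, _, np2) in zip(chunks, chunks[1:]):
        relation = " ".join(word for word, _ in tagged[end1:start2])
        if relation:  # skip noun phrases with nothing between them
            tuples.append((np1, relation, np2))
    return tuples


def assess(tuples):
    """Merge tuples that match after lowercasing and count their occurrences."""
    return Counter(tuple(part.lower() for part in t) for t in tuples)


# Hypothetical hand-tagged sentences standing in for a web corpus.
corpus = [
    [("Edison", "NNP"), ("invented", "VBD"), ("the", "DT"), ("phonograph", "NN")],
    [("edison", "NNP"), ("invented", "VBD"), ("the", "DT"), ("phonograph", "NN"),
     ("in", "IN"), ("1877", "CD")],
]

extracted = [t for sentence in corpus for t in candidate_tuples(sentence)]
counts = assess(extracted)
```

Here both sentences yield the same tuple after normalization, so its count is 2; a tuple seen many times across the corpus would be treated as more trustworthy than one seen once.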
They describe the details of this system, compare it to KnowItAll, and report that TextRunner outperforms it.
I liked their approach because the learning is entirely self-supervised, which makes it very useful for extracting a large number of relations from the web without human supervision.