Difference between revisions of "Bbd writeup of TextRunner"

Latest revision as of 10:42, 3 September 2010

This is a review of Banko_2007_open_information_extraction_from_the_web by user:Bbd.

This paper describes TextRunner, an Open IE system, i.e., a system that makes a single pass over its corpus and extracts relation tuples without any human input. The main modules of the system are:

 - self-supervised learner: this module first labels its own unlabelled training examples as positive
   or negative (+/-) and uses them to train a Naive Bayes classifier, which the extractor module later uses.
 - single-pass extractor: it makes a single pass over the data and tags every word with its most
   likely POS tag. It then extracts candidate relation tuples from the phrases occurring between
   noun phrases. These candidates are classified by the Naive Bayes classifier to keep only high-confidence tuples.
 - redundancy-based assessor: this module merges tuples that share the same entities and relation and
   counts how often each one occurs, in order to estimate how likely each extracted tuple is to be correct.
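The three modules above can be illustrated with a small, self-contained Python sketch. This is not TextRunner's implementation; it is a toy under stated assumptions: sentences arrive already POS-tagged (Penn Treebank tags) instead of going through a real tagger, candidates are taken only from adjacent noun-phrase pairs, the hypothetical `self_label` heuristic stands in for the paper's own labeler, the three boolean features are illustrative choices, the classifier is trained on the same corpus it later filters, and the assessor returns raw sentence counts rather than a probability. All names (`noun_phrases`, `candidates`, `features`, `self_label`, `NaiveBayes`, `run`, `CORPUS`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumption: input is pre-tagged with Penn Treebank tags (no real tagger here).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def noun_phrases(tagged):
    """Return (start, end) spans of maximal NP-tag runs that contain a noun."""
    spans, start = [], None
    for i, (_, tag) in enumerate(tagged + [("", "END")]):  # sentinel flushes the last run
        if tag in NP_TAGS:
            if start is None:
                start = i
        elif start is not None:
            if any(t.startswith("NN") for _, t in tagged[start:i]):
                spans.append((start, i))
            start = None
    return spans


def phrase(tokens):
    """Normalize a token span: drop determiners and lowercase the words."""
    return " ".join(w.lower() for w, t in tokens if t != "DT")


def candidates(tagged):
    """Yield (arg1, relation_tokens, arg2) for each pair of adjacent noun phrases."""
    spans = noun_phrases(tagged)
    for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
        rel = tagged[e1:s2]
        if rel:
            yield phrase(tagged[s1:e1]), rel, phrase(tagged[s2:e2])


def features(rel):
    """Cheap, parser-free boolean features of a relation phrase (illustrative)."""
    return (
        any(t.startswith("VB") for _, t in rel),  # contains a verb
        len(rel) <= 3,                            # short phrase
        any(w == "," for w, _ in rel),            # crosses a comma
    )


def self_label(rel):
    """Hypothetical heuristic labeler: short, verb-bearing, comma-free phrases are positive."""
    words = [w for w, _ in rel]
    has_verb = any(t.startswith("VB") for _, t in rel)
    return has_verb and len(rel) <= 4 and "," not in words


class NaiveBayes:
    """Naive Bayes over boolean features, with Laplace (+1) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)  # label -> Counter[(index, value)]
        for x, label in zip(X, y):
            for i, v in enumerate(x):
                self.feat_counts[label][(i, v)] += 1
        return self

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(x):
                # Denominator n + 2: each boolean feature has two possible values.
                score += math.log((self.feat_counts[label][(i, v)] + 1) / (n + 2))
            if score > best_score:
                best, best_score = label, score
        return best


def run(corpus):
    """Self-label, train the classifier, extract in one pass, then count support."""
    cands = [(sid, c) for sid, sent in enumerate(corpus) for c in candidates(sent)]
    clf = NaiveBayes().fit([features(rel) for _, (_, rel, _) in cands],
                           [self_label(rel) for _, (_, rel, _) in cands])
    support = defaultdict(set)  # normalized tuple -> ids of sentences that yield it
    for sid, (a1, rel, a2) in cands:
        if clf.predict(features(rel)):
            support[(a1, phrase(rel), a2)].add(sid)
    return {t: len(ids) for t, ids in support.items()}


CORPUS = [
    [("Paris", "NNP"), ("is", "VBZ"), ("in", "IN"), ("France", "NNP")],
    [("Paris", "NNP"), ("is", "VBZ"), ("in", "IN"), ("France", "NNP"), (".", ".")],
    [("Einstein", "NNP"), ("was", "VBD"), ("born", "VBN"), ("in", "IN"), ("Ulm", "NNP")],
    [("Berlin", "NNP"), (",", ","), ("the", "DT"), ("capital", "NN"),
     ("of", "IN"), ("Germany", "NNP")],
]

if __name__ == "__main__":
    print(run(CORPUS))
```

On this toy corpus, `run(CORPUS)` keeps `("paris", "is in", "france")` with a count of 2 (two sentences support it) and `("einstein", "was born in", "ulm")` with a count of 1, while the verb-less candidates from the Berlin sentence are rejected by the classifier. The design mirrors the division of labour described above: the labeler is only used to produce training data, and the cheap classifier is what runs during the single extraction pass.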

The authors describe the system in detail and compare it with KnowItAll, showing that TextRunner achieves a lower error rate on a comparable set of extractions.

I liked their approach because it is fully self-supervised, which makes it very useful for extracting a large number of relations from the Web without human supervision.

Revision history: revision as of 13:23, 4 November 2009 by Bbd; latest revision as of 10:42, 3 September 2010 by WikiAdmin (minor edit, 1 revision, no difference in content).