Selen writeup of TextRunner

This is a review of Banko_2007_open_information_extraction_from_the_web by user:Selen.

In this paper the authors present an open-domain information extraction system, TextRunner. Their goal is to extract relations given only the corpus. They do it in a single pass over the corpus; in other words, they do not perform bootstrapping. They compare their approach to KnowItAll and claim that their method is an improvement since it doesn't rely on a search engine (recall that this was the biggest issue with KnowItAll) and doesn't take any relation-specific input.

TextRunner has three modules:

  • Self-Supervised Learner
  • Single-Pass Extractor
  • Redundancy-Based Assessor
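
To make the division of labor concrete, here is a toy sketch of how the three modules might fit together. It is only an illustration under loud assumptions, not the authors' implementation: `toy_extract` is a hypothetical stand-in that treats the first and last tokens of a sentence as the arguments, `is_trustworthy` is a hand-written placeholder for the classifier that the learner would produce, and the assessor is reduced to counting the distinct sentences that support each normalized tuple.

```python
from collections import defaultdict


def toy_extract(sentence):
    """Hypothetical extractor: first token = arg1, last token = arg2,
    everything in between = relation. A real system would use a parser
    or chunker; this is only a stand-in."""
    tokens = sentence.lower().rstrip(".").split()
    if len(tokens) < 3:
        return None
    return (tokens[0], " ".join(tokens[1:-1]), tokens[-1])


def is_trustworthy(candidate):
    """Placeholder for the learned classifier (assumption: here we
    simply accept relations of at most three words)."""
    return len(candidate[1].split()) <= 3


def run_pipeline(corpus):
    """Single pass over the corpus, then redundancy counting."""
    support = defaultdict(set)
    for sentence in corpus:  # each sentence is visited exactly once
        candidate = toy_extract(sentence)
        if candidate is not None and is_trustworthy(candidate):
            support[candidate].add(sentence)
    # Assessor stand-in: number of distinct supporting sentences per tuple.
    return {tup: len(sents) for tup, sents in support.items()}


corpus = [
    "Edison invented phonographs.",
    "Edison invented phonographs",
    "Paris is located in France.",
    "Hello.",
]
counts = run_pipeline(corpus)
```

In this sketch a tuple backed by more distinct sentences gets a higher count, which is the intuition behind a redundancy-based assessor; how counts are turned into probabilities is left out here.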

To assess whether an extracted relation is trustworthy, they embed a Naive Bayes classifier. As a result, they report a 6 percent improvement over the KnowItAll system.
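
For readers who have not seen Naive Bayes used this way, the following is a minimal sketch of a Bernoulli Naive Bayes classifier with add-one smoothing that labels candidate tuples as trustworthy or not. The feature names (`short_relation`, `args_are_nouns`, `crosses_clause`, `arg_is_pronoun`) and the tiny training set are invented for illustration; they are assumptions, not the features or data used in the paper.

```python
import math


class TinyBernoulliNB:
    """Bernoulli Naive Bayes over sets of binary features (illustrative only)."""

    def fit(self, examples, labels):
        self.labels = sorted(set(labels))
        self.vocab = sorted({f for ex in examples for f in ex})
        self.total = len(labels)
        self.class_count = {y: labels.count(y) for y in self.labels}
        # feature_count[y][f] = number of class-y examples containing f
        self.feature_count = {y: {f: 0 for f in self.vocab} for y in self.labels}
        for ex, y in zip(examples, labels):
            for f in set(ex):
                self.feature_count[y][f] += 1
        return self

    def log_score(self, example, y):
        """log P(y) + sum over features of log P(feature present/absent | y)."""
        score = math.log(self.class_count[y] / self.total)
        present = set(example)
        for f in self.vocab:
            # Add-one (Laplace) smoothing for a binary feature.
            p = (self.feature_count[y][f] + 1) / (self.class_count[y] + 2)
            score += math.log(p if f in present else 1 - p)
        return score

    def predict(self, example):
        return max(self.labels, key=lambda y: self.log_score(example, y))


# Hypothetical training data: feature sets with a trustworthy/untrustworthy label.
train_examples = [
    {"short_relation", "args_are_nouns"},
    {"short_relation", "args_are_nouns"},
    {"short_relation"},
    {"crosses_clause"},
    {"crosses_clause", "arg_is_pronoun"},
    {"arg_is_pronoun"},
]
train_labels = [True, True, True, False, False, False]

model = TinyBernoulliNB().fit(train_examples, train_labels)
good = model.predict({"short_relation", "args_are_nouns"})
bad = model.predict({"crosses_clause", "arg_is_pronoun"})
```

With this toy data, `good` comes out as trustworthy and `bad` as untrustworthy. The appeal of Naive Bayes in this setting is that scoring a candidate is just a sum of per-feature log probabilities, which keeps it cheap enough to apply to every candidate in a single pass.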

I like the idea of O-CRF.