Read the Web Data

From Cohen Courses

Introduction

Read the Web is an open IE project led by Tom Mitchell at Carnegie Mellon University. The project recently released some data that could be used in my project.

Data

There are two types of data publicly available:

  1. Knowledge base extracted by NELL (Read the Web system)
    • List of ~440k beliefs in the KB (download).
      • Beliefs can be categories or relations. (category example: "mountain"; relation example: "mountaininstate")
      • For each belief, relevant contexts are also listed. (e.g., a context for the relation "mountaininstate" is "arg1 mountain range in arg2")
      • An estimated confidence is given for each belief as well.
    • All contexts learned for each predicate (download).
  2. Raw data, which contains the contexts for all pairs of noun phrases (NPs).
    • For example, for the NP pair "people" and "hall", a context could be "arg2 accommodates arg1".
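
To make the arg1/arg2 notation in the examples above concrete, here is a minimal Python sketch. It assumes a context is a whitespace-separated pattern in which the placeholders arg1 and arg2 each appear exactly once, as in the examples quoted above. The function names fill_context and match_context are hypothetical, and nothing here reflects the actual file format of the released data.

```python
import re

PLACEHOLDERS = ("arg1", "arg2")


def fill_context(context: str, arg1: str, arg2: str) -> str:
    """Substitute an NP pair into a context pattern.

    Only whole tokens are replaced, so a word that merely contains
    "arg1" or "arg2" is left untouched.
    """
    values = {"arg1": arg1, "arg2": arg2}
    return " ".join(values.get(tok, tok) for tok in context.split())


def match_context(context: str, text: str):
    """Return (arg1, arg2) if `text` fits the context pattern, else None.

    Assumes each placeholder occurs exactly once in `context`.
    """
    parts = [
        f"(?P<{tok}>.+?)" if tok in PLACEHOLDERS else re.escape(tok)
        for tok in context.split()
    ]
    m = re.fullmatch(r"\s+".join(parts), text)
    return (m.group("arg1"), m.group("arg2")) if m else None


# The NP pair and context from the example above.
print(fill_context("arg2 accommodates arg1", "people", "hall"))
print(match_context("arg2 accommodates arg1", "the hall accommodates many people"))
```

The first call rebuilds the surface string for the pair ("people", "hall"); the second goes the other way and recovers an NP pair from a piece of text that fits the pattern.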

Possible Application

Given the contexts of all NP pairs (data 2), we can try to build a binary classifier that judges whether a context expresses a potentially interesting relation. The contexts of the confident relations (not categories) that appear in data 1 can serve as positive training samples; combined with some negative samples, they give us the data needed to train the classifier.
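
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a settled design: it uses a bag-of-words naive Bayes model with add-one smoothing as a stand-in for whatever classifier is eventually chosen. Apart from "arg1 mountain range in arg2", which is quoted above, every training context below is made up for illustration, and how negative samples should really be collected is left open.

```python
import math
from collections import Counter


def tokens(context):
    # Ignore the placeholders; only the words around them carry signal.
    return [t for t in context.lower().split() if t not in ("arg1", "arg2")]


def train(samples):
    """samples: list of (context, label), label 1 = relation, 0 = not.

    Assumes both labels occur at least once.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for context, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens(context))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def score(model, context):
    """Log-odds that `context` expresses a relation (> 0 means positive)."""
    word_counts, class_counts, vocab = model
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    log_odds = math.log(class_counts[1] / class_counts[0])
    for tok in tokens(context):
        if tok not in vocab:
            continue  # unseen words give no evidence either way
        # Add-one (Laplace) smoothing for each class.
        p1 = (word_counts[1][tok] + 1) / (totals[1] + len(vocab))
        p0 = (word_counts[0][tok] + 1) / (totals[0] + len(vocab))
        log_odds += math.log(p1 / p0)
    return log_odds


# Toy training set; all but the first context are hypothetical.
samples = [
    ("arg1 mountain range in arg2", 1),   # from the KB example above
    ("arg1 is the capital of arg2", 1),   # hypothetical positive
    ("arg1 is located in arg2", 1),       # hypothetical positive
    ("arg1 and arg2", 0),                 # hypothetical negative
    ("arg1 or arg2", 0),                  # hypothetical negative
    ("arg1 of the arg2", 0),              # hypothetical negative
]
model = train(samples)

for ctx in ("arg1 is located near arg2", "arg1 and arg2"):
    label = "relation" if score(model, ctx) > 0 else "not a relation"
    print(ctx, "->", label)
```

In a real experiment the positive samples would come from the contexts of high-confidence relations in data 1, and the classifier would then be applied to the contexts in data 2.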