Read the Web Data
From Cohen Courses
Revision as of 15:56, 30 October 2010 by PastStudents
Introduction
Read the Web is an open information extraction (IE) project led by Tom Mitchell at Carnegie Mellon University. The group recently released some data that could be used in my project.
Data
There are two types of data publicly available:
- Knowledge base (KB) extracted by NELL (the Read the Web system)
  - A list of about 440k beliefs in the KB (available for download).
    - Beliefs can be categories or relations (category example: "mountain"; relation example: "mountaininstate").
    - For each belief, the relevant contexts are also listed (e.g., a context for the relation "mountaininstate" is "arg1 mountain range in arg2").
    - A confidence estimate is given for each belief.
  - All contexts learned for each predicate (available for download).
- Raw data containing the contexts for all pairs of noun phrases (NPs).
  - For example, for the NP pair "people" and "hall", a context could be "arg2 accommodates arg1".
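To make the "arg1"/"arg2" notation concrete, here is a minimal Python sketch of how a context pattern relates to an NP pair. The function name `fill_context` is hypothetical, and nothing here reflects the actual file format of the released data; it only illustrates the placeholder convention described above.

```python
import re


def fill_context(context: str, arg1: str, arg2: str) -> str:
    """Substitute a noun-phrase pair into a context pattern.

    Hypothetical helper for illustration; the real data files may
    store contexts and NP pairs in a different form.
    """
    slots = {"arg1": arg1, "arg2": arg2}
    # Substitute in a single pass so that text inside a noun phrase
    # is never mistaken for a placeholder and substituted again.
    return re.sub(r"\barg[12]\b", lambda m: slots[m.group(0)], context)


# The NP pair and context from the raw-data example above.
print(fill_context("arg2 accommodates arg1", "people", "hall"))
```

Read this way, the context "arg2 accommodates arg1" for the pair ("people", "hall") corresponds to the phrase "hall accommodates people".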
Possible Application
Given the contexts of all NP pairs (the second data set above), we can try to build a binary classifier that judges whether a context represents a potentially interesting relation. The contexts of the confident relations (not categories) that appear in the first data set can serve as positive training samples; combined with some negative samples, they give us the data needed to train the classifier.
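As a rough sketch of this idea, the Python code below trains a small bag-of-words naive Bayes classifier on context strings. Several things here are assumptions rather than part of the plan above: the choice of naive Bayes, the whitespace tokenization, and the names `train`, `score` and `is_relation`. Apart from "arg1 mountain range in arg2", the training contexts are made up for illustration and are not taken from the NELL data.

```python
import math
from collections import Counter


def tokens(context):
    """Split a context pattern into lowercase word tokens."""
    return context.lower().split()


def train(samples):
    """Collect word and class counts from (context, label) pairs.

    label 1 = context of a confident relation, label 0 = negative sample.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for context, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens(context))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def score(model, context, label):
    """Log-probability of a class given the context, up to a shared constant."""
    word_counts, class_counts, vocab = model
    total = sum(word_counts[label].values())
    logp = math.log(class_counts[label] / sum(class_counts.values()))
    for word in tokens(context):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing so a word unseen in one class does not zero it out.
        logp += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
    return logp


def is_relation(model, context):
    """True if the context looks more like a relation than a negative sample."""
    return score(model, context, 1) > score(model, context, 0)


# Toy training set. Only the first context comes from the text above;
# the others are invented placeholders, not NELL data.
samples = [
    ("arg1 mountain range in arg2", 1),
    ("arg1 is located in arg2", 1),
    ("arg1 is the capital of arg2", 1),
    ("arg1 and arg2", 0),
    ("arg1 or arg2", 0),
    ("arg1 such as arg2", 0),
]
model = train(samples)

print(is_relation(model, "arg1 river in arg2"))
print(is_relation(model, "arg1 and arg2"))
```

In practice the positive samples would be the contexts of high-confidence relations from the KB, and how to choose good negative samples remains an open question in this plan.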