Shuguang's project abstract

Information Extraction (10-707) Project Proposal

Team Member

Shuguang Wang [swang@cs.pitt.edu]

Problem

In this project, we will work on an open information extraction (IE) problem. In an open domain, we cannot assume that the relations are determined before query time. Several projects and systems have been proposed for the open information extraction task. They use various ways to construct training data and train a model for each type of relation. In this project, we look at the problem from a different perspective.

Noun phrases are usually seen as potential entities mentioned in the text, and the relations (if any) between them are expressed in their surrounding context. Given a list of noun phrase pairs and their contexts as input, can we identify interesting relations by building a single classifier?

Plan

Current open IE approaches try to determine whether a string of text expresses a certain type of relation between two entities. In this project, we instead determine whether a string of text from the context represents an interesting relation between two entities. A natural way to approach this is to treat it as a classification problem.
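To make the classification framing concrete, here is a minimal sketch and not the planned implementation. It assumes each input is a (first noun phrase, second noun phrase, sentence) triple, uses only the words between the two noun phrases as features, and trains a plain perceptron with labels +1 (interesting) and -1 (not interesting). The function names, the feature scheme, and the toy sentences are all hypothetical and serve illustration only.

```python
from collections import defaultdict


def context_features(np1, np2, sentence):
    """Bag-of-words features from the text between two noun phrases.

    Assumption: np1 occurs before np2 in the sentence. If either phrase
    is missing, only the bias feature is returned.
    """
    lower = sentence.lower()
    feats = {"bias": 1.0}
    start = lower.find(np1.lower())
    if start == -1:
        return feats
    start += len(np1)
    end = lower.find(np2.lower(), start)
    if end == -1:
        return feats
    for token in lower[start:end].split():
        feats["between=" + token] = 1.0
    return feats


def score(weights, feats):
    """Dot product of the weight vector and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_perceptron(examples, epochs=10):
    """Train a perceptron on (feature dict, label) pairs, label in {+1, -1}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            # Update only on mistakes (or zero margin).
            if label * score(weights, feats) <= 0:
                for name, value in feats.items():
                    weights[name] += label * value
    return weights


def predict(weights, feats):
    """Return +1 for an interesting relation, -1 otherwise."""
    return 1 if score(weights, feats) > 0 else -1


# Hypothetical toy data, hand-labeled for illustration only.
toy_data = [
    (("Pittsburgh", "Pennsylvania", "Pittsburgh is located in Pennsylvania."), 1),
    (("Google", "YouTube", "Google acquired YouTube in 2006."), 1),
    (("Alice", "Bob", "Alice and Bob"), -1),
    (("apples", "oranges", "We bought apples , oranges and pears."), -1),
]
examples = [(context_features(*triple), label) for triple, label in toy_data]
weights = train_perceptron(examples)

new_pair = context_features("Microsoft", "Nokia", "Microsoft acquired Nokia's phone unit.")
print(predict(weights, new_pair))
```

A single weight vector is shared across all relation types, which is the point of building one classifier instead of one model per relation. Any standard classifier could take the place of the perceptron here.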

There are at least a couple of issues about this task that are not yet clear to me. I will need to look at the data to see exactly what we can do. First, we need to explore different possible features. Second, we may need to generate training data for the classifier, as we may not have labels.

Motivation

This task can be seen as a complement to other open Information Extraction projects such as KnowItAll and ReadTheWeb.

Dataset

We should have access to the ReadTheWeb data for this task.

Techniques

Standard classification methods will be used.

Evaluation

The results will be evaluated by a human (myself) on a set of randomly selected text. If time permits, I will also try to use the extracted relations in some IR tasks to see whether they are useful in practice.
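The manual evaluation could be organized roughly as follows. This is only a sketch under assumptions: extractions are plain strings, the review sample is drawn uniformly at random with a fixed seed so it can be reproduced, and the human judgment for each sampled item is recorded as True or False. The helper names and the example data are hypothetical.

```python
import random


def sample_for_review(extractions, k, seed=0):
    """Draw up to k extractions uniformly at random for manual review."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(extractions, min(k, len(extractions)))


def precision(judgments):
    """Fraction of reviewed extractions that the reviewer marked correct."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# Hypothetical extractions and judgments, for illustration only.
extractions = ["relation %d" % i for i in range(100)]
to_review = sample_for_review(extractions, k=4)
judgments = [True, True, False, True]  # entered by the human reviewer
print(precision(judgments))  # 0.75
```

Precision on a random sample estimates how often an extracted relation is actually interesting. It says nothing about relations the classifier missed, which is one reason the IR experiment would be a useful complement.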

My Expertise

I do not really have any superpower for this task, but I am familiar with many machine learning frameworks and have been working with text data in NLP and IR tasks for some time.
