Suranah's project abstract

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

Human Computation for ReadTheWeb

ReadTheWeb has been attempting to develop a knowledge base that mirrors the content of the web. The current procedure uses bootstrapping to extract information. In other words, a set of verified seed facts is provided and used to learn patterns. These patterns acquire more facts; most of them are discarded, and the remaining high-probability facts are used to learn more patterns. The problem right now is twofold: we miss a lot of facts and patterns in this process, and some wrong facts and patterns are also promoted. In this project, I will implement and experiment with some mechanisms to improve ReadTheWeb.
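The bootstrapping loop described above can be sketched in a few lines of Python. This is only a toy illustration: the function name `bootstrap`, the scoring rules, the `threshold` value and the example corpus are all assumptions made for this sketch, not ReadTheWeb's actual promotion criteria.

```python
from collections import defaultdict


def bootstrap(seed_facts, occurrences, rounds=3, threshold=0.6):
    """Toy bootstrapping loop over (pattern, fact) co-occurrences.

    The scoring rules and the threshold are illustrative assumptions,
    not ReadTheWeb's actual promotion criteria.
    """
    facts_by_pattern = defaultdict(set)
    patterns_by_fact = defaultdict(set)
    for pattern, fact in occurrences:
        facts_by_pattern[pattern].add(fact)
        patterns_by_fact[fact].add(pattern)

    facts = set(seed_facts)
    patterns = set()
    for _ in range(rounds):
        # Learn patterns: promote a pattern when enough of what it
        # extracts is already a trusted fact.
        patterns = {
            p for p, extracted in facts_by_pattern.items()
            if len(extracted & facts) / len(extracted) >= threshold
        }
        # Acquire facts: promote a candidate when enough of the patterns
        # that extract it are trusted; the rest are discarded.
        promoted = {
            f for f, sources in patterns_by_fact.items()
            if len(sources & patterns) / len(sources) >= threshold
        }
        if promoted <= facts:
            break  # nothing new was learned in this round
        facts |= promoted
    return facts, patterns


# Hypothetical mini-corpus: each entry is (pattern, extracted fact).
occurrences = [
    ("X is the capital of Y", ("Paris", "France")),
    ("X is the capital of Y", ("Rome", "Italy")),
    ("X is the capital of Y", ("Berlin", "Germany")),
    ("X is in Y", ("Paris", "France")),
    ("X is in Y", ("Lyon", "France")),
    ("X is in Y", ("Milan", "Italy")),
]
seeds = {("Paris", "France"), ("Rome", "Italy")}
facts, patterns = bootstrap(seeds, occurrences)
```

In this toy run the specific pattern is promoted and contributes one new fact, while the looser pattern "X is in Y" never clears the threshold. With a lower threshold it would be promoted and pull in wrong capital facts, which is the kind of error human verification is meant to catch.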

Specific Goals

I have come up with a mechanism design that can help us verify facts from ReadTheWeb in a fun, challenging and user-friendly manner. I will implement both the required UI and the necessary backend to test it, initially through Mechanical Turk and later on gwap.com.

The backend is especially complicated here. We have to figure out the best set of facts to show to the user in order to optimize learning and the gleaning of patterns. We also have to maximize the chance that the user knows (or has some idea) about the entities in a specific round of the game. While the former problem could be solved with some sort of active learning, for the latter we will have to experiment with completely new techniques.
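One simple form of active learning that could serve as a starting point for the first problem is uncertainty sampling: show the user the candidate facts whose truth the system is least sure about. The sketch below assumes each candidate fact already carries an estimated probability of being true; the function name `select_facts_for_round` and the example numbers are hypothetical. It does not address the second problem of entity familiarity.

```python
def select_facts_for_round(candidates, k=3):
    """Pick the k candidate facts the system is least certain about.

    candidates maps each fact to an estimated probability that it is
    true (an assumed input for this sketch). A probability near 0.5
    means a human judgment is most informative.
    """
    ranked = sorted(candidates, key=lambda fact: abs(candidates[fact] - 0.5))
    return ranked[:k]


# Hypothetical candidate facts with made-up probability estimates.
candidates = {
    ("Paris", "capitalOf", "France"): 0.95,
    ("Lyon", "capitalOf", "France"): 0.52,
    ("Milan", "capitalOf", "Spain"): 0.10,
    ("Sydney", "capitalOf", "Australia"): 0.40,
    ("Geneva", "capitalOf", "Switzerland"): 0.70,
}
round_facts = select_facts_for_round(candidates, k=3)
```

Facts the system is already confident about, in either direction, are skipped, so each round of the game spends the user's effort where a label changes the most.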

Interesting Side Effects

If these mechanisms work well, they can very well augment the notion of never-ending learning. Besides, I also suspect that this data can be used for more than just validating patterns and facts. Right now, every single class of relation in ReadTheWeb has to be manually thought out, and seeds for it then have to be added to the system. I believe that some of the data we generate can be used to statistically estimate the next set of relations and classes.

Super Powers

  • Will be working closely with people involved with the ReadTheWeb project
  • Previous experience with IE, designing mechanisms and implementing social games

Evaluation

I plan to evaluate my system using Mechanical Turk and structured information repositories like Freebase. The newer patterns and classes that are produced as a side effect can be validated by an increase in ReadTheWeb's performance.