Yandongl project abstract

From Cohen Courses
  • What you plan to do with what data

I plan to work on data I collected from Yahoo! Answers, which consists of about 200,000 questions and 2,000,000 answers. There are many possibilities for what information to extract from this dataset; I am currently interested in the task of better matching askers and answerers in the question-answering community.

  • Why you think it’s interesting

Finding the best information provider is not trivial. We first need to extract the expertise of each participant in the community, and then take into account other factors such as availability and the maximum number of questions a participant is willing to help with per day.

  • Any relevant superpowers you might have

Extensive experience with IR techniques, as well as some machine learning knowledge (I am taking ML at the same time).

  • How you plan to evaluate your work

First, evaluate on test data: apply my approach to closed questions and check whether it can predict some of their real answerers. Also, conduct a live test if possible: post some questions and invite the top-ranked people to help.
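The offline evaluation on closed questions could be scored roughly as follows. This is only a sketch: the abstract does not name a metric, so recall at k is an assumption here, and the function names `recall_at_k` and `mean_recall_at_k` are hypothetical.

```python
def recall_at_k(ranked_users, actual_answerers, k):
    """Fraction of a question's real answerers found in the top-k predictions.

    Assumption: recall@k is used as the metric; the abstract does not specify one.
    """
    actual = set(actual_answerers)
    if not actual:
        return 0.0
    top_k = set(ranked_users[:k])
    return len(top_k & actual) / len(actual)


def mean_recall_at_k(predictions, k):
    """Average recall@k over closed questions.

    predictions: list of (ranked_users, actual_answerers) pairs, one per question.
    """
    scores = [recall_at_k(ranked, actual, k) for ranked, actual in predictions]
    return sum(scores) / len(scores) if scores else 0.0
```

For each closed question, `ranked_users` would be the ranking produced by the matching approach and `actual_answerers` the users who really answered it.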

  • What techniques you plan to use

Language modeling and probabilistic models.
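One way the language modeling idea might be applied is to build a unigram model from each user's past answers and rank users by the likelihood of a new question under that model. The sketch below assumes Jelinek-Mercer smoothing, whitespace tokenization, and a smoothing weight `lam` of 0.5; none of these choices come from the abstract, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def build_profiles(answers):
    """Build per-user and collection-wide term counts.

    answers: iterable of (user_id, answer_text) pairs.
    """
    profiles = {}
    collection = Counter()
    for user, text in answers:
        tokens = tokenize(text)
        profiles.setdefault(user, Counter()).update(tokens)
        collection.update(tokens)
    return profiles, collection


def score(question, profile, collection, lam=0.5):
    """Log-likelihood of the question under a smoothed user language model."""
    user_total = sum(profile.values())
    coll_total = sum(collection.values())
    log_p = 0.0
    for term in tokenize(question):
        p_coll = collection[term] / coll_total if coll_total else 0.0
        if p_coll == 0.0:
            continue  # skip terms never seen in any answer
        p_user = profile[term] / user_total if user_total else 0.0
        # Jelinek-Mercer: interpolate user model with collection model.
        log_p += math.log((1 - lam) * p_user + lam * p_coll)
    return log_p


def rank_answerers(question, profiles, collection, lam=0.5):
    """Return user ids sorted from most to least likely answerer."""
    scored = [(score(question, p, collection, lam), u) for u, p in profiles.items()]
    return [u for _, u in sorted(scored, key=lambda x: (-x[0], x[1]))]
```

A full system would also need to account for the other factors mentioned earlier, such as availability; this sketch covers only the expertise-matching step.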

  • Who you might work with