Difference between revisions of "Analyzing Community driven Question Answering Sites"
Revision as of 08:53, 9 October 2012
Team Members
Abstract
Question answering communities such as Yahoo! Answers and StackOverflow have emerged as popular and effective information resources on the web. The questions, together with the entire set of corresponding answers, are a rich resource for exploring many problems related to question answering. One interesting analysis is to track the lifetime of questions in such environments. A question's lifetime can vary: it may be closed by the community, it may be short-lived because an expert answers it sufficiently, or it may generate a lot of interaction among users over a relatively long period. Analyzing such questions and trying to predict their longevity is one of the goals of our project. Other interesting aspects to explore are identifying questions that have not been sufficiently answered, identifying user expertise for improved recommendations, and automatic tag prediction.
Datasets
The Stack Overflow data that we plan to use is publicly available from StackOverflow under a Creative Commons license. The latest version can be downloaded from here.
Here are some of the statistics about the data:
- Users 440K (198K questioners, 71K answerers)
- Questions 1M (69% with accepted answer)
- Answers 2.8M (26% marked as accepted)
- Votes 7.6M (93% positive)
- Favorites 775K actions on 318K questions
Techniques Used
- We plan to use a wide set of features, incorporating textual as well as network attributes.
- To gain initial insights into the data, we'll use standard topic models such as LDA, and SVMs for classification.
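As a rough illustration of what "textual as well as network attributes" could mean in practice, here is a minimal sketch in plain Python. The record fields (`asker`, `answerer`, `question_id`, and so on), the toy data, and the specific features are all assumptions made for this example; they are not the actual schema of the Stack Overflow dump, nor the feature set the project settled on.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records; field names are illustrative, not the real dump schema.
QUESTIONS = [
    {"id": 1, "asker": "alice", "title": "How to reverse a list in Python?",
     "body": "I have a list and want it reversed.", "tags": ["python", "list"]},
    {"id": 2, "asker": "bob", "title": "Segfault in C pointer code",
     "body": "My program crashes when I dereference a pointer.", "tags": ["c"]},
]
ANSWERS = [
    {"question_id": 1, "answerer": "bob"},
    {"question_id": 1, "answerer": "carol"},
    {"question_id": 2, "answerer": "carol"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def textual_features(question):
    """Simple text-based features for one question (assumed feature set)."""
    tokens = tokenize(question["title"] + " " + question["body"])
    return {
        "n_tokens": len(tokens),
        "n_tags": len(question["tags"]),
        "title_is_question": int(question["title"].strip().endswith("?")),
        "bag_of_words": Counter(tokens),
    }


def network_features(questions, answers):
    """Features from the asker/answerer graph (answerer -> asker edges)."""
    asker_of = {q["id"]: q["asker"] for q in questions}
    helped = defaultdict(set)   # answerer -> set of askers they answered
    answer_count = Counter()    # question id -> number of answers
    for a in answers:
        helped[a["answerer"]].add(asker_of[a["question_id"]])
        answer_count[a["question_id"]] += 1
    return {
        q["id"]: {
            "n_answers": answer_count[q["id"]],
            # How many distinct users the asker has answered elsewhere.
            "asker_out_degree": len(helped.get(q["asker"], set())),
        }
        for q in questions
    }


if __name__ == "__main__":
    net = network_features(QUESTIONS, ANSWERS)
    for q in QUESTIONS:
        print(q["id"], textual_features(q)["n_tokens"], net[q["id"]])
```

The bag-of-words counts are the kind of input a topic model such as LDA would consume, and the scalar features could be assembled into vectors for an SVM classifier; both steps are left out here because they depend on library choices the proposal does not specify.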
Challenges
- Relatively unexplored dataset. Most prior work has used the Yahoo! Answers data set.
- Complex network dynamics such as the reputation system and bounties. Understanding them is key to getting good results.
Relevant Literature
- Anderson et al
- Adamic et al. study Yahoo! Answers to explore the interactions of users, with preliminary work on predicting the best answer.
- Jeon et al. predict the quality of answers using non-textual features.