Anderson et al., KDD 2012
Citation
author    = {Ashton Anderson and Daniel P. Huttenlocher and Jon M. Kleinberg and Jure Leskovec},
title     = {Discovering value from community activity on focused Question Answering Sites: A case study of Stack Overflow},
booktitle = {KDD},
year      = {2012},
pages     = {850-858},
ee        = {http://doi.acm.org/10.1145/2339530.2339665},
crossref  = {DBLP:conf/kdd/2012},
bibsource = {DBLP, http://dblp.uni-trier.de}
Online version
Summary
Question answering websites like Stack Overflow and Quora are growing into large repositories of valuable knowledge through community-driven knowledge creation. In this case study of Stack Overflow, the authors investigate the dynamics of the community activity that shapes the set of answers to a question, both in how answers and voters arrive over time and in how this process influences the final outcome. They therefore take the entire set of answers to a question as their fundamental unit of analysis, rather than analyzing only the best answer. The authors observe significant assortativity in the reputations of co-answerers, a relationship between reputation and answer speed, and a dependence of the probability that an answer is chosen as the best one on the temporal characteristics of answer arrivals. They then apply their analysis to two prediction tasks: first, predicting the long-term value of a question and its answers; second, predicting whether a question has been sufficiently answered.
Dataset Description
The Stack Overflow data used in this paper is publicly available from Stack Overflow under a Creative Commons license. One can download the latest version from here.
Here are some statistics about the data used by the authors (a sketch for reproducing such counts from the data dump follows the list):
- Users: 440K (198K questioners, 71K answerers)
- Questions: 1M (69% with an accepted answer)
- Answers: 2.8M (26% marked as accepted)
- Votes: 7.6M (93% positive)
- Favorites: 775K favorite actions on 318K questions
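The public dump ships as per-table XML files. Below is a minimal sketch of how counts like those above could be reproduced from it; the file names and attribute codes (Posts.xml with PostTypeId 1 for questions and 2 for answers, Votes.xml with VoteTypeId 2, 3, and 5 for upvotes, downvotes, and favorites) are assumptions about the Stack Exchange dump format, not details taken from the paper.

    # Sketch: tallying basic corpus statistics from a Stack Exchange data dump.
    # Assumes the public dump layout described above; adjust the attribute codes
    # if the dump schema differs.
    import xml.etree.ElementTree as ET
    from collections import Counter

    def iter_rows(path):
        """Stream <row .../> elements without loading the whole file into memory."""
        for _, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == "row":
                yield elem.attrib
                elem.clear()

    stats = Counter()
    for row in iter_rows("Posts.xml"):
        if row.get("PostTypeId") == "1":                 # question
            stats["questions"] += 1
            if "AcceptedAnswerId" in row:
                stats["questions_with_accepted_answer"] += 1
        elif row.get("PostTypeId") == "2":               # answer
            stats["answers"] += 1

    for row in iter_rows("Votes.xml"):
        vote_type = row.get("VoteTypeId")
        if vote_type == "2":
            stats["upvotes"] += 1
        elif vote_type == "3":
            stats["downvotes"] += 1
        elif vote_type == "5":
            stats["favorites"] += 1

    print(stats)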
Motivation
The motivation of the paper is to understand the community dynamics at question answering sites like Stack Overflow by considering questions together with their full set of answers, rather than as free-standing question-answer pairs. Complex questions often generate multiple good answers from different experts that bring out different views, and even the best answer viewed in isolation may not capture the knowledge created through community interaction around the question. The authors aim to identify and highlight questions of lasting value as soon as possible after they appear on the site, so that users can be directed to them. For experts who are able to answer difficult questions, there is also potential to identify questions that have not yet been successfully answered and highlight them for increased attention.
Features Used
- Questioner features (SA), 4 features total:
* questioner reputation
* # of questioner's questions
* # of questioner's answers
* questioner's percentage of accepted answers on their previous questions
- Activity and Q/A quality measures (SB), 8 features total:
* # of favorites
* # of page views
* # of positive votes on the question
* # of negative votes on the question
* # of answers
* maximum answerer reputation
* highest answer score
* reputation of the answerer who wrote the highest-scoring answer
- Community process features (SC), 8 features total:
* average answerer reputation
* median answerer reputation
* fraction of the sum of answerer reputations contributed by the maximum answerer reputation
* sum of answerer reputations
* length of the answer by the highest-reputation answerer
* # of comments on the answer by the highest-reputation answerer
* length of the highest-scoring answer
* # of comments on the highest-scoring answer
- Temporal process features (SD), 7 features total (a computation sketch follows this list):
* average time between answers
* median time between answers
* minimum time between answers
* time-rank of the highest-scoring answer
* wall-clock time elapsed between question creation and the highest-scoring answer
* time-rank of the answer by the highest-reputation answerer
* wall-clock time elapsed between question creation and the answer by the highest-reputation answerer
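As an illustration of the last feature set, here is a minimal sketch of how the seven temporal process features could be computed from the answers visible within an observation window. The Answer record fields and the zero-gap fallback for single-answer questions are illustrative assumptions, not the authors' implementation.

    # Sketch: temporal-process features (set SD) for one question, computed from
    # the answers visible at the observation cutoff. Times are measured in
    # seconds since the question was posted (an assumed convention).
    from dataclasses import dataclass
    from statistics import mean, median

    @dataclass
    class Answer:
        created: float              # seconds after the question was posted
        score: int                  # votes on the answer at observation time
        answerer_reputation: int

    def temporal_features(answers):
        answers = sorted(answers, key=lambda a: a.created)
        # Gaps between consecutive answers; fall back to 0.0 for a single answer.
        gaps = [b.created - a.created for a, b in zip(answers, answers[1:])] or [0.0]
        best = max(answers, key=lambda a: a.score)
        top_rep = max(answers, key=lambda a: a.answerer_reputation)
        return {
            "avg_gap": mean(gaps),
            "median_gap": median(gaps),
            "min_gap": min(gaps),
            "timerank_of_best_answer": answers.index(best) + 1,
            "elapsed_to_best_answer": best.created,
            "timerank_of_top_rep_answer": answers.index(top_rep) + 1,
            "elapsed_to_top_rep_answer": top_rep.created,
        }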
Task Description and Evaluation
The authors address two concrete prediction tasks.
- Predicting the long-term value of a question:
* Proxy for long-term value: number of page views of the question and its answers within a given time frame.
* Analysis restricted to questions created in the same month, with predictions evaluated against page views one year later.
* Binary classification (page views in the bottom vs. top quartile, or bottom vs. top half) on a data set of 28,772 examples, using logistic regression with 10-fold cross-validation.
* Predictions made using only the information available 1, 2, 24, and 72 hours after the question was posted.
* Baseline: crowd-sourced features, namely the # of favorites on the question and the # of positive minus negative votes on the question.
* Experimental results: AUC of 0.70 using the top 8 features versus a baseline AUC of 0.56 (a sketch of the evaluation setup appears after this list).
- Predicting whether a question has been sufficiently answered:
* Proxy used: "bounty questions".
* Given the first k answers on a question page, predict whether it is a bounty question or not.
* No crowd-sourced baseline.
* Accuracy of 0.74 and AUC of 0.83 reported using the best 18 features extracted from the first k = 3 answers.
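The evaluation setup described for the first task (logistic regression, 10-fold cross-validation, ROC AUC) can be sketched with scikit-learn as below. X and y stand for the feature matrix and the top-vs-bottom pageview labels, assumed to be built elsewhere; the feature scaling and max_iter setting are added assumptions, since the paper does not describe preprocessing details.

    # Sketch: Task 1 evaluation - logistic regression with 10-fold CV, scored by ROC AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    def mean_auc(X: np.ndarray, y: np.ndarray) -> float:
        """Return the mean ROC AUC over 10 cross-validation folds."""
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        return cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()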
Findings
- The reputation pyramid model: the authors propose the idea of a reputation pyramid, with high-reputation users at the top and low-reputation users at the bottom. A question enters the system at the top and, if unanswered, progressively percolates down the reputation levels.
- Answerer reputation decreases with increasing time-rank of the answer within a question, evidence of a direct relationship between reputation and answer speed.
- High-value questions tend to be answered quickly and by high-reputation users.
- Homophily by reputation: answerers in a given reputation level are attracted to the same sorts of questions, and the source of this attraction is not the reputation of the questioner (a toy assortativity computation is sketched after this list).
- Higher activity produces benefits: The more answers there are, the higher the votes-to-answers ratio. Having more answers increases the number of viewers of the question in the long term, and it does so with no downside in the rate of favoriting — each given user is still equally likely to favorite the question.
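One illustrative way to quantify the co-answerer reputation assortativity mentioned above is to correlate the (log) reputations over all pairs of users who answered the same question. This is a toy measure under that assumption, not necessarily the exact statistic the paper reports.

    # Sketch: Pearson correlation of log-reputations over ordered pairs of
    # co-answerers, as a simple assortativity measure (illustrative only).
    import math
    from itertools import permutations
    import numpy as np

    def coanswerer_assortativity(questions):
        """questions: iterable of lists, each holding the reputations of one question's answerers."""
        xs, ys = [], []
        for reps in questions:
            for a, b in permutations(reps, 2):   # ordered pairs keep the measure symmetric
                xs.append(math.log1p(a))
                ys.append(math.log1p(b))
        return np.corrcoef(xs, ys)[0, 1]

    # Toy usage: reputations cluster within questions, so the correlation is positive.
    print(coanswerer_assortativity([[5000, 8000, 12000], [30, 10, 55], [900, 1200]]))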
Related papers
- An alternative viewpoint that treats question-answer pairs (rather than the whole set of answers) as the unit of analysis:
* Q. Liu, E. Agichtein, G. Dror, E. Gabrilovich, Y. Maarek, D. Pelleg, and I. Szpektor. Predicting web searcher satisfaction with existing community-based answers. SIGIR, 2011.
- A study of the dynamics of answer arrivals at Stack Overflow:
* H. Oktay, B. J. Taylor, and D. Jensen. Causal discovery in social media using quasi-experimental designs. SIGKDD Workshop on Social Media Analytics, 2010.