Difference between revisions of "Anderson et al KDD2012"
Revision as of 05:16, 27 September 2012
Citation
author = {Ashton Anderson and Daniel P. Huttenlocher and Jon M. Kleinberg and Jure Leskovec}, title = {Discovering value from community activity on focused Question Answering Sites: A case study of Stack Overflow}, booktitle = {KDD}, year = {2012}, pages = {850-858}, ee = {http://doi.acm.org/10.1145/2339530.2339665}, crossref = {DBLP:conf/kdd/2012}, bibsource = {DBLP, http://dblp.uni-trier.de}
Online version
Summary
Question answering websites like Stack Overflow and Quora are growing into large repositories of valuable knowledge through a community-driven knowledge creation process. In this case study of Stack Overflow, the authors study that process and investigate the dynamics of the community activity that shapes the set of answers: how answers and voters arrive over time, and how this eventually influences the final outcome. They therefore treat the entire set of answers to a question as their fundamental unit of analysis, instead of analyzing just the best answer. The authors observe significant assortativity in the reputation of co-answerers, relationships between reputation and answer speed, and a dependence of the probability of an answer being chosen as the best one on the temporal characteristics of answer arrivals. They then apply their analysis to two prediction tasks: first, predicting the long-term value of a question and its answers; second, predicting whether a question has been appropriately answered.
Dataset Description
The Stack Overflow data used in this paper is publicly available from Stack Overflow under a Creative Commons license. The latest version can be downloaded from http://blog.stackoverflow.com/category/cc-wiki-dump/.
Here are some of the statistics about the data used by the authors:
- Users 440K (198K questioners, 71K answerers)
- Questions 1M (69% with accepted answer)
- Answers 2.8M (26% marked as accepted)
- Votes 7.6M (93% positive)
- Favorites 775K actions on 318K questions
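As a rough sanity check, the percentages above can be turned into approximate absolute counts. The sketch below is only illustrative arithmetic on the rounded figures listed here, not a computation from the paper; the variable names are hypothetical. Because the inputs are rounded, the two estimates of accepted answers (one derived from questions, one from answers) need not agree exactly.

```python
# Rounded figures copied from the list above (K = thousand, M = million).
questions = 1_000_000
answers = 2_800_000
votes = 7_600_000
favorite_actions = 775_000
favorited_questions = 318_000

# Approximate absolute counts implied by the reported percentages.
questions_with_accepted = round(0.69 * questions)  # 69% of questions
accepted_answers = round(0.26 * answers)           # 26% of answers
positive_votes = round(0.93 * votes)               # 93% of votes

# Simple per-question ratios derived from the same rounded figures.
answers_per_question = answers / questions
favorites_per_favorited_question = favorite_actions / favorited_questions

print("questions with an accepted answer:", questions_with_accepted)
print("answers marked as accepted:", accepted_answers)
print("positive votes:", positive_votes)
print("answers per question:", round(answers_per_question, 2))
print("favorites per favorited question:",
      round(favorites_per_favorited_question, 2))
```

The two accepted-answer estimates land in the same range, which is about as much as the rounded inputs allow one to conclude.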
Stack Overflow's Reputation System: