Difference between revisions of "Anderson et al KDD2012"

From Cohen Courses

Revision as of 06:16, 27 September 2012

Citation

 author    = {Ashton Anderson and
              Daniel P. Huttenlocher and
              Jon M. Kleinberg and
              Jure Leskovec},
 title     = {Discovering value from community activity on focused Question Answering Sites: A case study of Stack Overflow},
 booktitle = {KDD},
 year      = {2012},
 pages     = {850-858},
 ee        = {http://doi.acm.org/10.1145/2339530.2339665},
 crossref  = {DBLP:conf/kdd/2012},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online version

Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow

Summary

Question answering websites like Stack Overflow and Quora are growing into large repositories of valuable knowledge through a community-driven knowledge creation process. In this case study of Stack Overflow, the authors study that process and investigate the dynamics of the community activity that shapes the set of answers: how answers and voters arrive over time, and how this eventually influences the final outcome. They therefore treat the entire set of answers to a question as their fundamental unit of analysis, rather than only the best answer. The authors observe significant assortativity in the reputation of co-answerers, relationships between reputation and answer speed, and a dependence of the probability of an answer being chosen as the best one on the temporal characteristics of answer arrivals. They then apply their analysis to two prediction tasks: first, predicting the long-term value of a question and its answers; second, predicting whether a question has been appropriately answered.
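To make "assortativity in the reputation of co-answerers" concrete, here is a minimal sketch of one common way to quantify it: the Pearson correlation between the reputations of users who answered the same question. This is an illustrative assumption, not necessarily the measure used in the paper; the function names and the input layout (one list of answerer reputations per question) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def coanswerer_assortativity(questions):
    """Correlation of reputations over all pairs of co-answerers.

    `questions` is a hypothetical input format: an iterable of lists,
    each holding the reputations of the users who answered one question.
    """
    xs, ys = [], []
    for reputations in questions:
        for a, b in combinations(reputations, 2):
            # Add each unordered pair in both orders so the measure is symmetric.
            xs += [a, b]
            ys += [b, a]
    return pearson(xs, ys)


# Toy data (invented for illustration): users of similar reputation
# answer the same questions, so the coefficient should be positive.
similar = [[10, 12], [1000, 1100], [5, 8]]
mixed = [[10, 1000], [5, 1100]]
```

A positive value on `similar` and a negative value on `mixed` correspond to assortative and disassortative co-answering, respectively.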

Dataset Description

The Stack Overflow data used in this paper is publicly available from Stack Overflow under a Creative Commons license. The latest version can be downloaded from http://blog.stackoverflow.com/category/cc-wiki-dump/.

Here are some of the statistics about the data used by the authors:

  • Users 440K (198K questioners, 71K answerers)
  • Questions 1M (69% with accepted answer)
  • Answers 2.8M (26% marked as accepted)
  • Votes 7.6M (93% positive)
  • Favorites 775K actions on 318K questions
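Statistics like the ones above can be recomputed from the data dump. The sketch below counts questions, answers, and accepted answers from a tiny inline sample. It assumes the dump's Posts.xml layout of `<row>` elements with `PostTypeId` (1 for questions, 2 for answers) and `AcceptedAnswerId` attributes; verify this against the dump version you download. The sample rows and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed Posts.xml layout (not real data).
SAMPLE_POSTS_XML = """<posts>
  <row Id="1" PostTypeId="1" AcceptedAnswerId="3" />
  <row Id="2" PostTypeId="2" ParentId="1" />
  <row Id="3" PostTypeId="2" ParentId="1" />
  <row Id="4" PostTypeId="1" />
  <row Id="5" PostTypeId="2" ParentId="4" />
</posts>"""


def summarize_posts(xml_text):
    """Count questions, answers, and accepted answers in a posts XML string."""
    root = ET.fromstring(xml_text)
    questions = 0
    answers = 0
    questions_with_accepted = 0
    for row in root.iter("row"):
        post_type = row.get("PostTypeId")
        if post_type == "1":  # assumed: 1 marks a question
            questions += 1
            if row.get("AcceptedAnswerId") is not None:
                questions_with_accepted += 1
        elif post_type == "2":  # assumed: 2 marks an answer
            answers += 1
    return {
        "questions": questions,
        "answers": answers,
        "questions_with_accepted": questions_with_accepted,
        # Each question accepts at most one answer, so the number of
        # accepted answers equals the number of questions with one.
        "frac_questions_accepted": questions_with_accepted / questions,
        "frac_answers_accepted": questions_with_accepted / answers,
    }


stats = summarize_posts(SAMPLE_POSTS_XML)
```

For the full dump, which is far too large to load as a single string, a streaming parser such as `xml.etree.ElementTree.iterparse` would be the more practical choice.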

Stack Overflow's Reputation System:




Evaluation

Discussion

Related papers

Study plan