Analyzing Community driven Question Answering Sites
Revision as of 08:32, 9 October 2012
Team Members
Abstract
Question answering communities such as Yahoo! Answers and StackOverflow have emerged as popular and effective information resources on the web. The questions, together with the entire set of corresponding answers, are a rich resource for exploring many problems related to question answering. One interesting analysis is to track the lifetime of a question. Lifetimes vary widely: a question may be declared closed by the community, it may be short-lived because an expert answers it sufficiently, or it may generate a lot of interaction among users over a relatively long period. Analyzing such questions and predicting their longevity is one of the goals of our project. In the process, we also aim to predict the best answer among the answers a question receives. We also plan to address the problem of identifying sufficiently answered questions. Finally, given a question, identifying expertise in its domain is an interesting problem that we'll try to address.
Datasets
The Stack Overflow data used in this paper is publicly available from StackOverflow under a Creative Commons license, and the latest version is available for download.
Here are some of the statistics about the data used by the authors:
- Users 440K (198K questioners, 71K answerers)
- Questions 1M (69% with accepted answer)
- Answers 2.8M (26% marked as accepted)
- Votes 7.6M (93% positive)
- Favorites 775K actions on 318K questions
Baseline
Techniques Used
- We plan to use a wide set of features, incorporating textual as well as network attributes.
- To gain initial insights into the data, we'll use standard topic models such as LDA, and SVMs for classification.
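As a rough illustration of what combining textual and network attributes could look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the `Answer` record, the specific features (token count, character count, presence of a `<code>` tag, how many distinct askers an answerer has helped), and the function names are hypothetical and not taken from the project or from the StackOverflow data schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Answer:
    # Hypothetical record layout; the real data dump uses a different schema.
    question_id: int
    asker: str
    answerer: str
    body: str


def text_features(body):
    """Simple surface features computed from the answer text alone."""
    return {
        "num_tokens": len(body.split()),
        "num_chars": len(body),
        "has_code": int("<code>" in body),
    }


def network_features(answers):
    """Per-user features from the asker -> answerer interaction graph."""
    helped = defaultdict(set)       # answerer -> distinct askers they answered
    num_answers = defaultdict(int)  # answerer -> total answers posted
    for a in answers:
        helped[a.answerer].add(a.asker)
        num_answers[a.answerer] += 1
    return {
        user: {"in_degree": len(helped[user]), "num_answers": num_answers[user]}
        for user in helped
    }


def build_feature_rows(answers):
    """One feature dict per answer: text features plus the answerer's network features."""
    net = network_features(answers)
    rows = []
    for a in answers:
        row = dict(text_features(a.body))
        row.update(net[a.answerer])
        rows.append(row)
    return rows


answers = [
    Answer(1, "alice", "bob", "Use <code>sorted(xs)</code> here"),
    Answer(2, "carol", "bob", "Try again"),
    Answer(2, "carol", "dave", "No idea"),
]
rows = build_feature_rows(answers)
```

Each row is a flat dictionary of numeric values, so it could be turned into a vector and passed to a classifier such as an SVM, or extended with topic proportions from a topic model.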
Challenges
- The dataset is relatively unexplored. Most prior work has used the Yahoo! Answers dataset.
Relevant Literature
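- Anderson et al. (http://malt.ml.cmu.edu/mw/index.php/Anderson_et_al_KDD2012)
- Adamic et al. (http://141.213.232.243/bitstream/2027.42/58015/1/fp840-adamic.pdf) study Yahoo! Answers to explore the interactions of users, and include preliminary work on predicting the best answer.
- Jeon et al. (http://ciir.cs.umass.edu/pubfiles/ir-469.pdf) predict the quality of answers using non-textual features.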