Harper et al., CHI 2009


Citation

Harper, F. M., Moy, D., and Konstan, J. A. 2009. "Facts or Friends? Distinguishing Informational and Conversational Questions in Social Q&A Sites." In Proceedings of the 27th ACM Conference on Human Factors in Computing Systems (CHI 2009).

Online Version

[1]

Summary

While social Q&A sites have become quite popular, the quality of the answers they produce is variable, and they have so far failed to become reliable sources of high-quality information. The authors attribute part of this failure to the types of questions frequently asked; specifically, many users post chatty questions ("What are you doing right now?") that may promote discussion but do not encourage informative responses. The authors refer to these kinds of questions as conversational questions, and contrast them with informational questions. They write:

Informational questions are asked with the intent of getting information that the asker hopes to learn or use via fact- or advice-oriented answers. An example: What's the difference between Burma and Myanmar?

Conversational questions are asked with the intent of stimulating discussion. They may be aimed at getting opinions, or they may be acts of self-expression. An example: Do you drink Coke or Pepsi?

The data used in this study were collected from three English-language Q&A sites: Yahoo! Answers, Answerbag, and Ask Metafilter. Because the sites differ greatly in the amount of web traffic they receive, data were collected from each site over a different number of days (49 for Yahoo! Answers, 180 for Answerbag, and 808 for Ask Metafilter). A summary of the data is presented in the table below.

Datasets used in Harper et al., 2009

                      Ask Metafilter   Answerbag   Yahoo! Answers
# Days                           808         180               49
# Users                       11,060      51,357        1,575,633
# Questions                   45,567     142,704        4,317,966
# Questions/Day                   56         793           88,122
# Answers                    657,353     806,426       24,661,775
# Answers/Question             14.43        5.65             5.71
% Questions Answered           99.7%       89.9%            88.2%


Throughout the paper, the authors focus on three research questions:

  1. Can humans reliably distinguish between conversational questions and informational questions?
    Conclusion: The human coders were able to agree on a label 87.1% of the time. The authors suggest that there is a class of questions represented by the 12.9% of questions receiving conflicting labels that contain elements of both question types, and this ambiguity makes it hard to choose an appropriate label.
  2. How do informational and conversational questions differ in terms of writing quality and archival value?
    Conclusion: The human coders were asked to rate both the writing quality and the archival value of each question on a 5-point Likert scale. Conversational questions were rated as having both lower writing quality and lower archival value, even after controlling for the specific site where the question appeared.
  3. What are the structural differences between conversational questions and informational questions?
    Conclusion: Certain site-specific subject categories are indicative of question type; for instance, the "Polls & Surveys" category on Yahoo! Answers is predictive of conversational questions. Certain individual words can also be highly predictive of question type: for example, the word "you" is predictive of conversational questions. Lastly, the ego networks of users who frequently ask conversational questions are larger and more tightly interconnected than those of users who tend to ask informational questions.

With regard to the last of these questions, machine learning techniques were used to attempt to predict the question type automatically. Classifier performance was assessed primarily with three metrics: sensitivity (the proportion of conversational questions identified correctly), specificity (the proportion of informational questions identified correctly), and AUC (the area under the ROC curve, a scalar summary of overall classifier performance). Sensitivity and specificity were chosen over precision and recall because the proportions of conversational and informational questions differed across the three sites. Baseline performance was established using a 0-R classifier, which always predicts the majority class.

0-R Baseline Classifier Performance

            Sensitivity   Specificity    AUC
Yahoo              0.00          1.00   0.50
Answerbag          1.00          0.00   0.50
Metafilter         0.00          1.00   0.50
Overall            0.58          0.79   0.50
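
These metrics are simple to compute directly. The sketch below (Python with scikit-learn, a toolkit choice made for this summary, not taken from the paper) computes all three metrics plus a 0-R baseline, assuming labels encoded as 1 = conversational and 0 = informational.

  from sklearn.metrics import confusion_matrix, roc_auc_score

  def evaluate(y_true, y_pred, y_score):
      # Sensitivity: fraction of conversational (positive) questions caught;
      # specificity: fraction of informational (negative) questions caught.
      tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
      sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
      specificity = tn / (tn + fp) if (tn + fp) else 0.0
      # AUC summarizes ranking quality: 0.5 is chance, 1.0 is perfect.
      auc = roc_auc_score(y_true, y_score)
      return sensitivity, specificity, auc

  def zero_r_predict(y_train, n_test):
      # 0-R ignores all features and always predicts the majority class,
      # which is why its AUC is 0.50 on every site in the table above.
      majority = int(sum(y_train) * 2 >= len(y_train))
      return [majority] * n_test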


The authors first attempted to predict question type using category information. Each of the three sites had between 20 and 26 top-level categories, and Answerbag and Yahoo! Answers also had numerous lower-level categories that could be used to further classify questions. This classifier was implemented with a Bayesian network algorithm, and the results showed an overall improvement over the baseline classifier. However, a closer look shows that the gain in overall sensitivity (from 0.58 to 0.77) came at the expense of a drop in overall specificity (from 0.79 to 0.72).

Category-Based Classifier Performance

            Sensitivity   Specificity    AUC
Yahoo              0.66          0.82   0.81
Answerbag          0.82          0.41   0.71
Metafilter         0.56          0.95   0.82
Overall            0.77          0.72   0.78
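
The paper implements this classifier as a Bayesian network over category features; the scikit-learn sketch below substitutes a naive Bayes model over one-hot-encoded categories as a rough stand-in. The field names and the second example category are illustrative assumptions ("Polls & Surveys" is the one category the paper actually mentions).

  from sklearn.feature_extraction import DictVectorizer
  from sklearn.naive_bayes import BernoulliNB
  from sklearn.pipeline import make_pipeline

  # Toy training data; real input would be one dict per question.
  questions = [
      {"top_category": "Polls & Surveys", "sub_category": "(none)"},
      {"top_category": "Science & Mathematics", "sub_category": "Physics"},
  ]
  labels = [1, 0]  # 1 = conversational, 0 = informational

  # DictVectorizer one-hot encodes the string-valued category fields.
  model = make_pipeline(DictVectorizer(), BernoulliNB())
  model.fit(questions, labels)
  print(model.predict([{"top_category": "Polls & Surveys", "sub_category": "(none)"}]))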


Next, they attempted to predict question type based on text features. The input feature set was constructed using a bag-of-words approach, and included the 500 most frequently occurring words and bigrams in each kind of question. This classifier was trained with the sequential minimal optimization (SMO) algorithm. Its performance improved on the baseline but not on the category-based approach, although it did perform extremely well on the Answerbag questions. Among individual tokens, the authors noted that "I" appeared more frequently in informational questions (68.6%) than in conversational questions (27.4%), while "you" appeared more frequently in conversational questions (54.7%) than in informational questions (25.8%). Additionally, words and phrases such as "can," "is there," "help," and "do I" were predictive of informational questions, while "do you," "would you," "you think," and "is your" were predictive of conversational questions.

Text-Based Classifier Performance

            Sensitivity   Specificity    AUC
Yahoo              0.48          0.71   0.60
Answerbag          0.79          0.70   0.75
Metafilter         0.00          1.00   0.50
Overall            0.70          0.85   0.62
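
SMO is a standard training algorithm for support vector machines, so a linear SVM over frequent unigrams and bigrams is a close scikit-learn equivalent. In the sketch below, the 500-feature cap mirrors the setup described above, and the two training questions are just the paper's running examples.

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC

  texts = [
      "What's the difference between Burma and Myanmar?",  # informational
      "Do you drink Coke or Pepsi?",                       # conversational
  ]
  labels = [0, 1]

  # Unigrams and bigrams, keeping only the 500 most frequent features.
  model = make_pipeline(
      CountVectorizer(ngram_range=(1, 2), max_features=500),
      LinearSVC(),
  )
  model.fit(texts, labels)
  print(model.predict(["Would you rather have tea or coffee?"]))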


Finally, the authors attempted to predict question type using social network metrics. They constructed three features for this purpose: the number of neighbors of the question asker, the asker's number of outbound edges as a percentage of all edges connected to that user, and the clustering coefficient of the asker's ego network. This classifier was implemented with a Bayesian network algorithm, and performed well on both Yahoo and Answerbag, but not as well on the Metafilter data.

Network-Based Classifier Performance

            Sensitivity   Specificity    AUC
Yahoo              0.71          0.87   0.81
Answerbag          0.87          0.61   0.72
Metafilter         0.00          1.00   0.64
Overall            0.69          0.89   0.72
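
The three network features can be sketched with networkx. The toy graph below assumes a directed answer graph in which an edge u -> v means user u answered a question asked by user v; this is one plausible construction, and the paper's exact graph definition may differ. The user names are invented.

  import networkx as nx

  # Toy answer graph: an edge u -> v means u answered v's question.
  G = nx.DiGraph()
  G.add_edges_from([
      ("alice", "bob"), ("bob", "alice"), ("carol", "alice"),
      ("alice", "carol"), ("bob", "carol"),
  ])

  def network_features(G, user):
      # Feature 1: number of distinct neighbors of the asker.
      neighbors = set(G.predecessors(user)) | set(G.successors(user))
      # Feature 2: outbound edges as a fraction of all edges at this user.
      out_frac = G.out_degree(user) / (G.in_degree(user) + G.out_degree(user))
      # Feature 3: clustering coefficient of the asker's undirected ego network.
      clustering = nx.clustering(nx.ego_graph(G.to_undirected(), user), user)
      return len(neighbors), out_frac, clustering

  print(network_features(G, "alice"))  # -> (2, 0.5, 1.0) for this toy graph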