Pal et al CIKM 2010

From Cohen Courses

Revision as of 20:48, 3 October 2012

This is a paper discussed in Social Media Analysis 10-802 in Fall 2012.

Citation

Expert Identification in Community Question Answering: Exploring Question Selection Bias. Aditya Pal, Joseph A. Konstan. In Proceedings of CIKM 2010, pages 1505-1508.

Online version

Expert Identification in Community Question Answering: Exploring Question Selection Bias

Summary

This paper introduces question selection bias as a new measure for studying the expertise of CQA users. The bias captures which questions a user prefers to answer with respect to their completeness, which can be measured by the status (best answer) or the number of votes of a question's existing answers. The basic finding is that experts tend to pick questions with low existing completeness.

A simple mathematical model is proposed to compute the selection bias quantitatively. Using these bias values as features, the authors apply machine learning (classification) methods to distinguish experts from ordinary users. Experiments with the TurboTax dataset show that selection bias values are superior to other types of features derived from Z-score or text analysis. Combining selection bias and text features further improves classification performance. A comparison of the classifiers shows that Gaussian classification performs consistently better than linear regression and logistic regression.
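
To make the idea of a selection bias feature concrete, here is a minimal Python sketch. It is not the model from the paper, whose exact formulation is not reproduced in this summary. It assumes a hypothetical completeness proxy (the number of answers a question already had when the user answered it) and a hypothetical bucketing into four levels, and it returns the share of a user's answers that fall into each bucket.

```python
from collections import Counter


def selection_bias(existing_counts, num_buckets=4):
    """Share of a user's answers that fall into each completeness bucket.

    existing_counts: for each answer the user wrote, how many answers the
    question already had at that moment (an assumed completeness proxy,
    not the measure defined in the paper).
    Counts of num_buckets - 1 or more share the last bucket.
    """
    if not existing_counts:
        return [0.0] * num_buckets
    buckets = Counter(min(c, num_buckets - 1) for c in existing_counts)
    total = len(existing_counts)
    return [buckets.get(b, 0) / total for b in range(num_buckets)]


# Made-up users: one mostly answers unanswered questions, one mostly
# answers questions that already have several answers.
expert_like = selection_bias([0, 0, 1, 0, 0, 2, 0, 0])
ordinary_like = selection_bias([3, 5, 2, 4, 1, 6, 3, 2])
print(expert_like)
print(ordinary_like)
```

A vector like this could serve as the per-user feature that is passed to a classifier. A user whose mass sits in the first bucket matches the paper's description of experts preferring questions with low existing completeness.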

Dataset

The TurboTax dataset used in this paper was collected from TurboTax Live Community, a CQA site about the preparation of tax returns. Some statistics about the dataset are:

 - Questions: 633,112; askers: 525,143
 - Answers: 688,390; answerers: 130,770
 - 83 experts selected by TurboTax employees
 - 1,367 answerers have provided at least 10 answers
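
The ratios implied by these counts give a feel for how sparse the expert labels are. The short Python sketch below only does arithmetic on the numbers listed above. The variable names are illustrative, and it does not assume that the 83 labeled experts are a subset of the 1,367 active answerers, since the summary does not say so.

```python
# Counts copied from the dataset statistics above.
questions, askers = 633112, 525143
answers, answerers = 688390, 130770
labeled_experts = 83
active_answerers = 1367  # answerers with at least 10 answers

questions_per_asker = questions / askers
answers_per_answerer = answers / answerers
active_share = active_answerers / answerers
expert_share = labeled_experts / answerers

print(f"questions per asker:   {questions_per_asker:.2f}")
print(f"answers per answerer:  {answers_per_answerer:.2f}")
print(f"share with 10+ answers: {active_share:.2%}")
print(f"share labeled expert:   {expert_share:.3%}")
```

Only about one answerer in a hundred has provided 10 or more answers, and labeled experts are rarer still, so any classifier trained on these labels faces a strongly imbalanced problem.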

Evaluation

The authors adopt Precision, Recall and F-score as evaluation metrics. The following conclusions arise from their evaluations:

  • CQA experts have the tendency to answer questions with low completeness, which makes their responses more valuable.
  • The selection bias scores modeled in this paper can indicate whether a user is an expert. These bias scores prove to be effective features for the identification of CQA experts.
  • On the task of expert identification, Gaussian classification achieves better results than linear regression and logistic regression.
  • Selection bias is not influenced by dynamics of CQA sites, and can be considered as intrinsic characteristics of CQA users.
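
For reference, the evaluation metrics can be computed as in the Python sketch below. It treats "expert" as the positive class and assumes that the F-score is the balanced F1 (the harmonic mean of precision and recall), which this summary does not state explicitly. The label lists are made-up toy data, not results from the paper.

```python
def precision_recall_f1(true_labels, predicted_labels):
    """Precision, recall and F1 for binary labels (1 = expert, 0 = ordinary)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # experts found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # experts missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: 3 true experts among 8 users; the classifier flags 3 users
# and gets 2 of them right.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(truth, predicted)
print(precision, recall, f1)
```

Because experts are a small minority of users, precision and recall on the expert class are more informative than plain accuracy, which a classifier could maximize by labeling everyone as an ordinary user.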

Discussion

(+) plus points, (-) minus points

  • (+) This paper falls into the area of expert search, which is an important problem in CQA research. The authors present interesting observations on selection bias of expert users in CQA. These findings are useful for question recommendation. For example, we should recommend questions with low completeness (few answers) to experts.
  • (-) The mathematical model for selection bias computation is fairly straightforward. Also, the authors rely on commonly used classifiers for expert identification rather than proposing more sophisticated approaches. Thus, I would take this paper as an empirical study whose emphasis is on the empirical observations about the selection bias concept.
  • (-) Most of the work is based specifically on the TurboTax dataset, which may limit the applicability of the approach. For example, TurboTax has manual expert judgments, which are not available in other datasets. Without such labels, expert identification cannot be cast as a classification problem.

Related papers

Here are two papers related to this work.

 - Give a detailed overview of CQA expert search
 - Simulate asking and answering behaviors using a generative model
 - Perform expert search based on user interests which are represented by latent topics

Probabilistic Question Recommendation for Question Answering Communities (http://dl.acm.org/citation.cfm?id=1526942)
 - Propose a user-word aspect model to deal with user-word sparseness in topic models
 - Show how to perform expert search with the modeled latent topics
 - I can learn how to do user search evaluation in CQA, including datasets and metrics