Riahi et al www 2012

From Cohen Courses
Revision as of 20:49, 3 October 2012 by Ymiao (talk | contribs) (→‎Related papers)

This is a paper discussed in Social Media Analysis 10-802 in Fall 2012.

Citation

Finding Expert Users in Community Question Answering. Fatemeh Riahi, Zainab Zolaktaf, Mahdi Shafiei, Evangelos Milios. In Proceedings of WWW CQA workshop 2012, pages 791-798.

Online version

Finding Expert Users in Community Question Answering

Summary

In this paper, the authors focus on the problem of expert search in community question answering (CQA) services. The goal of CQA expert search is to route newly posted questions to the CQA users who have the expertise to answer them. Solving this problem is expected to draw high-quality answers and encourage user participation in CQA websites.

The authors first investigate traditional IR methods, namely TF-IDF and language models. They then apply two topic-model methods, LDA and the Segmented Topic Model (STM), in which user expertise is modeled as a distribution over latent topics. Compared with LDA, STM has the advantage of taking into account the inner structure of a user profile, which is composed of multiple questions. Experiments on the Stack Overflow dataset compare the expert search performance of these methods. The authors also present the latent topics discovered from the dataset.
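
To make the TF-IDF baseline concrete, here is a minimal sketch of ranking users against a new question. It is not the authors' implementation: the function name `rank_experts`, the whitespace tokenizer, the smoothed IDF formula, and the choice to concatenate each user's questions into one document are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def rank_experts(profiles, question):
    """Rank users by TF-IDF cosine similarity to a question.

    profiles: dict mapping a user id to the list of question texts
    that make up that user's profile (hypothetical input format).
    """
    # Flatten each profile into one bag of words.
    docs = {u: Counter(t for q in qs for t in tokenize(q))
            for u, qs in profiles.items()}
    n = len(docs)

    # Document frequency: in how many profiles does each term occur?
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    def vec(tf):
        # Terms unseen in any profile are dropped.
        return {t: c * idf[t] for t, c in tf.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(Counter(tokenize(question)))
    scores = {u: cosine(qv, vec(tf)) for u, tf in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the profile is flattened into a single bag of words, this baseline ignores which question each word came from, which is the structure STM is said to exploit.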

Dataset

The Stack Overflow data used in this paper is publicly available from [1]. The authors performed data selection on it, resulting in a dataset which can be downloaded here.

Some statistics about the training set are:

 - Questions 369440
 - Askers 186027
 - Answerers 22027
 - Best Answerer Candidates 1845 (answerers who have posted at least 20 best answers)
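
The candidate filter in the last item can be sketched as follows. This is only an illustration of the "at least 20 best answers" rule stated above; the function name and the input format (one user id per question, naming that question's best answerer) are assumptions, not the authors' preprocessing code.

```python
from collections import Counter


def best_answerer_candidates(best_answerers, min_best=20):
    """Return the users who gave at least `min_best` best answers.

    best_answerers: iterable with one user id per question, naming
    the author of that question's best answer (hypothetical format).
    """
    counts = Counter(best_answerers)
    return {user for user, c in counts.items() if c >= min_best}
```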

The testing set contains 5128 questions, which have no overlap with the training set.

Evaluation

The main points in the evaluation part are:

 - quantitatively show that topic model methods are generally better than IR approaches
 - STM performs better than LDA, which indicates that taking advantage of the profile structures is important in expert search
 - qualitatively show that the latent topics discovered by STM have higher quality than the ones by LDA

Discussion

 + plus point - minus point
  • (+) Expert search is an important issue in CQA research, but it has received less attention than other problems such as best answer selection. This paper reads more like an empirical study, which examines the effectiveness of several existing methods on the task of CQA expert search. It is not surprising that topic-model-based methods are superior, since they can extract semantic information from user profiles.
  • (-) Besides questions and answers, CQA services hold further information such as user groups and interaction patterns. The STM method could therefore be extended to incorporate this information and model user interests and preferences more accurately.
  • (-) A caveat in the evaluation: the authors design a metric which is in fact equivalent to mean reciprocal rank (MRR). IR and search evaluations generally report multiple metrics, so the authors should use additional ones, such as mean average precision (MAP), for a more complete evaluation.
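
For reference, the two metrics named in the last point can be computed as in the sketch below. The function names and the input format (a ranked list of user ids plus a set of relevant user ids per test question) are assumptions for illustration; the paper's own metric is only described here as equivalent to MRR.

```python
def reciprocal_rank(ranked, relevant):
    # 1 / rank of the first relevant user, or 0 if none is retrieved.
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    # Mean of precision@k taken at each rank k holding a relevant user.
    hits, total = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_metric(metric, rankings, relevants):
    # Average a per-question metric over all test questions.
    values = [metric(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values) if values else 0.0
```

Calling `mean_metric(reciprocal_rank, rankings, relevants)` gives MRR and `mean_metric(average_precision, rankings, relevants)` gives MAP. When every test question has exactly one relevant user, average precision reduces to the reciprocal rank; the two only differ when several users count as relevant for a question.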

Related papers

Here are two papers related to this work.

 - Uniformly model asking and answering patterns with generative topic models; simulate user interests based on latent topics
 - Present the graphical model and procedures for Gibbs sampling, which gives me an opportunity to learn these aspects and design my own methods. 
 - Propose a user-word aspect model to deal with user-word sparseness in topic models
 - Show how to perform expert search with the modeled latent topics
 - I can learn how to do user search evaluation in CQA, including datasets and metrics