Riahi et al www 2012

From Cohen Courses

Revision as of 16:14, 29 September 2012

This is a paper discussed in Social Media Analysis 10-802 in Fall 2012.

Citation

Finding Expert Users in Community Question Answering. Fatemeh Riahi, Zainab Zolaktaf, Mahdi Shafiei, Evangelos Milios. In Proceedings of WWW CQA workshop 2012, pages 791-798.

Online version

Finding Expert Users in Community Question Answering

Summary

In this paper, the authors focus on the problem of expert search in community question answering (CQA) services. The goal of CQA expert search is to route newly posted questions to users who have the expertise to answer them. Solving this problem is expected to attract high-quality answers and encourage user participation on CQA websites.

Traditional IR methods, namely TF-IDF and language models, are investigated first. The authors then apply two topic-model-based methods, LDA and the Segmented Topic Model (STM), in which user expertise is modeled as a distribution over latent topics. Compared with LDA, STM has the advantage of accounting for the internal structure of a user profile, which is composed of multiple questions. Experiments on the Stackoverflow dataset report expert search performance and also examine the discovered latent topics.
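The two families of scoring compared in the paper can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each user's profile is assumed to be the concatenation of the questions associated with that user; the TF-IDF variant ranks users by cosine similarity between the new question and the profile; and the topic-model variant ranks users by the log-likelihood of the question's words under the user's topic mixture (`theta`) and the topic-word distributions (`phi`), which would normally come from a trained LDA model. The function names and the toy data are hypothetical, and STM's per-question (segment-level) structure is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Toy tokenizer: lowercase + whitespace split (no stemming or stopword removal).
    return text.lower().split()


def build_idf(profiles):
    # Smoothed IDF computed over user profiles (one profile document per user).
    n = len(profiles)
    df = Counter(t for text in profiles.values() for t in set(tokenize(text)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def vectorize(text, idf):
    # Sparse TF-IDF vector; terms never seen in any profile are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_tfidf(question, profiles):
    """IR baseline (hypothetical): rank users by question/profile cosine similarity."""
    idf = build_idf(profiles)
    q = vectorize(question, idf)
    scores = {u: cosine(q, vectorize(text, idf)) for u, text in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_topics(question, theta, phi, eps=1e-12):
    """Topic baseline (hypothetical): rank users by log P(question | user).

    theta[user] is the user's distribution over topics, P(z | user);
    phi[z] maps words to P(w | z).
    """
    def log_likelihood(user):
        total = 0.0
        for w in tokenize(question):
            # P(w | user) = sum over topics of P(z | user) * P(w | z)
            p = sum(t * topic.get(w, 0.0) for t, topic in zip(theta[user], phi))
            total += math.log(p + eps)  # eps avoids log(0) for unseen words
        return total

    scores = {u: log_likelihood(u) for u in theta}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (not from the paper).
profiles = {
    "alice": "python list comprehension python generator",
    "bob": "sql join index query sql",
    "carol": "css flexbox layout html",
}
phi = [
    {"python": 0.5, "generator": 0.3, "list": 0.2},  # topic 0
    {"sql": 0.6, "join": 0.4},                        # topic 1
]
theta = {"alice": [0.9, 0.1], "bob": [0.1, 0.9]}

print(rank_tfidf("how to write a python generator", profiles))
print(rank_topics("python generator", theta, phi))
```

The topic-based score can match a user to a question even when they share few exact terms, as long as both load on the same topics; the TF-IDF score only rewards literal term overlap.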

Dataset

The Stackoverflow data used in this paper is publicly available from http://stackoverflow.com. The authors selected a subset of it; the resulting dataset can be downloaded from http://web.cs.dal.ca/~riahi/.

Some statistics about the training set are:

 - Questions 369440
 - Askers 186027
 - Answerers 22027
 - Best Answerer Candidates 1845 (answerers who have posted at least 20 best answers)
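The candidate filter in the last statistic (answerers with at least 20 best answers) amounts to a simple count. A minimal sketch, assuming the input is a hypothetical list of user ids, one per question, naming the author of that question's best answer; the function name and data format are illustrative only:

```python
from collections import Counter

MIN_BEST_ANSWERS = 20  # threshold given in the statistics above


def best_answerer_candidates(best_answer_authors, min_count=MIN_BEST_ANSWERS):
    """Return the users who authored at least `min_count` best answers.

    best_answer_authors: iterable of user ids, one entry per question,
    identifying who wrote that question's best answer (hypothetical format).
    """
    counts = Counter(best_answer_authors)
    return {user for user, n in counts.items() if n >= min_count}


# Hypothetical example: u1 has 25 best answers, u2 has 19, u3 has exactly 20.
authors = ["u1"] * 25 + ["u2"] * 19 + ["u3"] * 20
print(sorted(best_answerer_candidates(authors)))
```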

The test set contains 5128 questions that do not overlap with the training set.

Evaluation

The main findings of the evaluation are:

 - Quantitatively, the topic-model methods generally outperform the IR approaches.
 - STM performs better than LDA, which indicates that exploiting the structure of user profiles is important for expert search.
 - Qualitatively, the latent topics discovered by STM are of higher quality than those discovered by LDA.

Discussion

  • Expert search is an important problem in CQA research, but it has received less attention than other problems such as best answer selection. This paper is essentially an empirical study that examines the effectiveness of several existing methods on the task of CQA expert search. It is not surprising that the topic-model-based methods perform better, since they capture semantic information in user profiles beyond exact term overlap.
  • Besides questions and answers, CQA sites carry additional information such as user groups and interaction patterns. The STM method could therefore be extended to incorporate this information and model user interests and preferences more accurately.
  • A caveat about the evaluation: the metric the authors design is in fact equivalent to mean reciprocal rank (MRR); additional IR metrics, such as average precision, would give a more complete evaluation.
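For reference, MRR and (mean) average precision can be computed as below. This is a minimal sketch assuming each test question comes with a ranked list of users and a set of users judged relevant (for example, the author of the best answer); the function names and sample rankings are hypothetical.

```python
def reciprocal_rank(ranking, relevant):
    # 1 / (rank of the first relevant user), or 0 if none is retrieved.
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def average_precision(ranking, relevant):
    # Mean of precision@k over the ranks k at which relevant users appear.
    hits, precision_sum = 0, 0.0
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant) if relevant else 0.0


def mean_metric(metric, runs):
    # Average a per-question metric over (ranking, relevant_set) pairs.
    return sum(metric(ranking, rel) for ranking, rel in runs) / len(runs)


# Hypothetical runs: question 1 has one relevant user, question 2 has two.
runs = [
    (["a", "b", "c"], {"b"}),
    (["x", "y", "z"], {"x", "z"}),
]
print(mean_metric(reciprocal_rank, runs))    # MRR
print(mean_metric(average_precision, runs))  # MAP
```

When each question has exactly one relevant user, average precision reduces to the reciprocal rank, so AP only adds information when several answerers per question are treated as relevant.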

Related papers

Here are two papers related to this work.