Riahi et al www 2012
Revision as of 15:12, 29 September 2012
This is a paper discussed in Social Media Analysis 10-802 in Fall 2012.
Citation
Finding Expert Users in Community Question Answering. Fatemeh Riahi, Zainab Zolaktaf, Mahdi Shafiei, Evangelos Milios. In Proceedings of WWW CQA workshop 2012, pages 791-798.
Online version
Finding Expert Users in Community Question Answering
Summary
In this paper, the authors focus on the problem of expert search in community question answering (CQA) services. The goal of CQA expert search is to route newly posted questions to the CQA users who have the expertise to answer them. Solving this problem is expected to draw high-quality answers and encourage user participation in CQA websites.
Traditional IR methods, TF-IDF and language models, are investigated first. The authors then apply two topic-model-based methods, LDA and the Segmented Topic Model (STM), in which user expertise is modeled as a distribution over latent topics. Compared with LDA, STM has the advantage of considering the inner structure of a user profile, which is composed of multiple questions. Experiments on the Stackoverflow dataset report expert search performance and examine the latent topics that the models discover.
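As a rough illustration of the IR baseline described above, the sketch below ranks users by the cosine similarity between a new question and a TF-IDF vector built from each user's profile (the questions that user has answered). It is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy profiles are all hypothetical, and the exact TF-IDF weighting used in the paper is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing may differ.
    return text.lower().split()


def build_index(profiles):
    """profiles maps a user id to the list of question texts that user answered."""
    term_counts = {
        user: Counter(tok for q in questions for tok in tokenize(q))
        for user, questions in profiles.items()
    }
    n_users = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    # Assumption: smoothed IDF, treating each user profile as one document.
    idf = {t: math.log((1 + n_users) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    return term_counts, idf


def tfidf_vector(counts, idf):
    # Terms never seen in any profile are dropped.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_experts(question, profiles):
    """Return user ids sorted from most to least similar to the question."""
    term_counts, idf = build_index(profiles)
    q_vec = tfidf_vector(Counter(tokenize(question)), idf)
    scores = {u: cosine(q_vec, tfidf_vector(c, idf)) for u, c in term_counts.items()}
    return sorted(scores, key=lambda u: (-scores[u], u))


# Toy data, invented for illustration only.
profiles = {
    "alice": ["how to parse json in python", "python list comprehension syntax"],
    "bob": ["css flexbox centering", "javascript closure scope"],
}
ranking = rank_experts("parse json file with python", profiles)
```

A flat bag-of-words profile like this ignores which question each word came from. That per-question structure is what the summary credits STM with exploiting.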
Dataset
The Stackoverflow data used in this paper is publicly available from http://stackoverflow.com. The authors performed data selection on it, resulting in a dataset that can be downloaded from http://web.cs.dal.ca/~riahi.
Some statistics about the training set are:
- Questions: 369440
- Askers: 186027
- Answerers: 22027
- Best answerer candidates: 1845 (answerers who have posted at least 20 best answers)
The testing set contains 5128 questions, which have no overlap with the training set.
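The candidate definition and the train/test separation above can be sketched as follows. This is a minimal sketch, assuming the data is available as one best-answerer id per question; the function names and the toy log are hypothetical and do not reflect the format of the released dataset.

```python
from collections import Counter

MIN_BEST_ANSWERS = 20  # threshold stated in the dataset statistics


def best_answerer_candidates(best_answerers, min_best=MIN_BEST_ANSWERS):
    """best_answerers holds one user id per question: the author of its best answer."""
    counts = Counter(best_answerers)
    return {user for user, n in counts.items() if n >= min_best}


def is_disjoint(train_question_ids, test_question_ids):
    """True when no question id appears in both the training and the test set."""
    return set(train_question_ids).isdisjoint(test_question_ids)


# Toy log, invented for illustration: u2 falls just short of the threshold.
toy_log = ["u1"] * 25 + ["u2"] * 19 + ["u3"] * 20
candidates = best_answerer_candidates(toy_log)
```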
Evaluation
The main points in the evaluation part are:
- quantitatively show that topic model methods are generally better than IR approaches
- STM performs better than LDA, which indicates that taking advantage of the profile structures is important in expert search
- qualitatively show that the latent topics discovered by STM have higher quality than the ones by LDA
Discussion
- Expert search is an important issue in CQA research, but it has received less attention than other problems such as best answer selection. This paper is essentially an empirical study that examines the effectiveness of several existing methods on the task of CQA expert search. It is not surprising that topic-model-based methods are superior, since they can extract semantic information from user profiles.
- Besides questions and answers, CQA services offer more information, such as user groups and interaction patterns. The STM method could therefore be extended to incorporate this information and model user interests and preferences more accurately.
- A caveat in the evaluation: the authors design a metric which is in fact equivalent to mean reciprocal rank (MRR); more IR metrics, such as average precision, should be used for complete evaluations.
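To make the metrics in the last point concrete, here is a minimal sketch of mean reciprocal rank and average precision over ranked user lists. The function names and toy rankings are hypothetical, and the authors' own metric is not reproduced here; only the standard definitions are shown.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant user, or 0.0 if none is retrieved."""
    for position, user in enumerate(ranking, start=1):
        if user in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevants):
    """Average reciprocal rank over all test questions."""
    values = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values) if values else 0.0


def average_precision(ranking, relevant):
    """Mean of precision@k taken at each rank k where a relevant user appears."""
    hits = 0
    total = 0.0
    for position, user in enumerate(ranking, start=1):
        if user in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


# Toy rankings, invented for illustration.
rankings = [["a", "b", "c"], ["b", "c", "a"]]
relevants = [{"a"}, {"a"}]
mrr = mean_reciprocal_rank(rankings, relevants)   # (1/1 + 1/3) / 2
ap = average_precision(["a", "b", "c"], {"a", "c"})  # (1/1 + 2/3) / 2
```

Note that when each test question has exactly one relevant user, average precision reduces to the reciprocal rank, so it adds information only if several answerers per question are treated as relevant.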
Related papers
Here are two papers related to this work.