Difference between revisions of "Riahi et al www 2012"

From Cohen Courses

Latest revision as of 20:49, 3 October 2012

This is a paper discussed in Social Media Analysis 10-802 in Fall 2012.

Citation

Finding Expert Users in Community Question Answering. Fatemeh Riahi, Zainab Zolaktaf, Mahdi Shafiei, Evangelos Milios. In Proceedings of WWW CQA workshop 2012, pages 791-798.

Online version

Finding Expert Users in Community Question Answering

Summary

In this paper, the authors focus on the problem of expert search in community question answering (CQA) services. The goal of CQA expert search is to route newly posted questions to CQA users who have the expertise to answer them. Solving this problem is expected to draw high-quality answers and encourage user participation in CQA websites.

The authors first investigate traditional IR methods, namely TF-IDF and language models. They then apply two topic-model-based methods, LDA and the Segmented Topic Model (STM), in which user expertise is modeled as distributions over latent topics. Compared with LDA, STM has the advantage of considering the inner structure of a user profile, which is composed of multiple questions. Experiments on the Stackoverflow dataset compare the expert search performance of these methods. The authors also present the latent topics discovered from the dataset.
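
To make the TF-IDF baseline concrete, here is a minimal sketch of ranking users for a new question by cosine similarity between the question and each user's profile. The whitespace tokenization, the smoothed IDF formula, and the names tfidf_vectors, cosine and rank_users are assumptions made for illustration, not necessarily the weighting or setup used in the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many profiles each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; other variants are common).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: f * idf[t] for t, f in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_users(profiles, question):
    """Rank users by similarity between their profile and the question."""
    users = list(profiles)
    vectors, idf = tfidf_vectors([profiles[u] for u in users])
    # Terms never seen in any profile carry no evidence and are dropped.
    q_tf = Counter(t for t in question if t in idf)
    q_vec = {t: f * idf[t] for t, f in q_tf.items()}
    scores = {u: cosine(q_vec, v) for u, v in zip(users, vectors)}
    return sorted(users, key=lambda u: scores[u], reverse=True)


# Hypothetical toy profiles: each is the text a user has engaged with.
profiles = {
    "alice": "python list sort python dict loop".split(),
    "bob": "java thread lock java class".split(),
    "carol": "css html layout css margin".split(),
}
ranking = rank_users(profiles, "sort a python list".split())
```

In this toy data only alice's profile shares terms with the question, so she is ranked first; the other two users score zero.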

Dataset

The Stackoverflow data used in this paper is publicly available from http://stackoverflow.com. The authors performed data selection on it, resulting in a dataset that can be downloaded from http://web.cs.dal.ca/~riahi/.

Some statistics about the training set are:

 - Questions 369440
 - Askers 186027
 - Answerers 22027
 - Best Answerer Candidates 1845 (answerers who have posted at least 20 best answers)

The testing set contains 5128 questions, which have no overlap with the training set.
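
The candidate criterion above (answerers with at least 20 best answers) can be sketched as a simple counting filter. The input format, a flat list with one user id per best answer, and the function name best_answerer_candidates are assumptions for illustration; the authors' actual selection script is not described here.

```python
from collections import Counter


def best_answerer_candidates(best_answer_authors, min_best_answers=20):
    """Return the users who wrote at least `min_best_answers` best answers.

    `best_answer_authors` holds one user id per question whose best
    answer that user wrote (a hypothetical input format).
    """
    counts = Counter(best_answer_authors)
    return {user for user, c in counts.items() if c >= min_best_answers}


# Toy data: u1 has 25 best answers, u2 has exactly 20, u3 has 19.
authors = ["u1"] * 25 + ["u2"] * 20 + ["u3"] * 19
candidates = best_answerer_candidates(authors)
```

With the threshold of 20, u1 and u2 qualify and u3 is excluded.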

Evaluation

The main points of the evaluation are:

 - Quantitatively, topic model methods are generally better than IR approaches.
 - STM performs better than LDA, which indicates that taking advantage of the profile structure is important in expert search.
 - Qualitatively, the latent topics discovered by STM are of higher quality than those discovered by LDA.
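
As an illustration of how user expertise modeled as a distribution over latent topics can be used for ranking, the sketch below scores each user by the log-likelihood of the question words under that user's topic mixture. This is the common query-likelihood scoring for topic models and is only an assumption about how such a ranking could be done; the topic-word probabilities, the user mixtures, and the function name question_log_likelihood are made-up toy values, not output of LDA or STM and not the paper's exact formulation.

```python
import math


def question_log_likelihood(question, user_topics, topic_words, floor=1e-12):
    """Sum over words of log sum_k P(topic k | user) * P(word | topic k)."""
    total = 0.0
    for word in question:
        p = sum(
            theta_k * topic_words[k].get(word, 0.0)
            for k, theta_k in enumerate(user_topics)
        )
        # Floor the probability so unseen words do not give log(0).
        total += math.log(max(p, floor))
    return total


# Hypothetical word distributions for two latent topics.
topic_words = [
    {"python": 0.5, "list": 0.3, "thread": 0.2},
    {"java": 0.5, "thread": 0.4, "list": 0.1},
]
# Hypothetical per-user topic mixtures (each sums to 1).
user_topics = {"alice": [0.9, 0.1], "bob": [0.1, 0.9]}

question = ["python", "list"]
scores = {
    user: question_log_likelihood(question, theta, topic_words)
    for user, theta in user_topics.items()
}
best_user = max(scores, key=scores.get)
```

Here alice's mixture puts most weight on the topic that generates "python" and "list", so her score is higher than bob's.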

Discussion

 (+) marks a plus point, (-) a minus point.
  • (+) Expert search is an important issue in CQA research, but it has received less attention than other problems such as best answer selection. The paper reads more like an empirical study, which examines the effectiveness of several existing methods on the task of CQA expert search. It is not surprising that topic-model-based methods are superior, since they can extract semantic information from user profiles.
  • (-) Besides questions and answers, CQA data contains more information, such as user groups and interaction patterns. The STM method could therefore be extended to incorporate this information and model user interests and preferences more accurately.
  • (-) A caveat in the evaluation: the authors design a metric which is in fact equivalent to mean reciprocal rank (MRR). IR and search evaluations generally adopt multiple metrics, so the authors should report more of them, such as mean average precision (MAP), for a complete evaluation.
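
For reference, the two metrics named in the last point can be computed as follows. This is a minimal sketch using the standard definitions of MRR and MAP; the function names and the toy rankings are assumptions for illustration and are not taken from the paper.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Each entry: (ranked users for a question, set of relevant users).
queries = [
    (["u3", "u1", "u2"], {"u1"}),        # RR = 1/2, AP = 1/2
    (["u2", "u4", "u1"], {"u2", "u1"}),  # RR = 1,   AP = (1/1 + 2/3) / 2
]
mrr = mean(reciprocal_rank(r, rel) for r, rel in queries)
map_score = mean(average_precision(r, rel) for r, rel in queries)
```

When every question has exactly one relevant user (for example a single best answerer), average precision reduces to the reciprocal rank, so the two metrics differ only when several users count as relevant for a question.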

Related papers

Here are two papers related to this work.

  • Tapping on the Potential of Q&A Community by Recommending Answer Providers (http://dl.acm.org/citation.cfm?id=1458204)
     - Uniformly models asking and answering patterns with generative topic models; simulates user interests based on latent topics
     - Presents the graphical model and procedures for Gibbs sampling, which gives me an opportunity to learn these aspects and design my own methods
  • Probabilistic Question Recommendation for Question Answering Communities (http://dl.acm.org/citation.cfm?id=1526942)
     - Proposes a user-word aspect model to deal with user-word sparseness in topic models
     - Shows how to perform expert search with the modeled latent topics
     - I can learn how to do user search evaluation in CQA, including datasets and metrics