Predicting web searcher satisfaction with existing community-based answers

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

author    = {Qiaoling Liu and
              Eugene Agichtein and
              Gideon Dror and
              Evgeniy Gabrilovich and
              Yoelle Maarek and
              Dan Pelleg and
              Idan Szpektor},
 title     = {Predicting web searcher satisfaction with existing community-based
              answers},
 booktitle = {SIGIR},
 year      = {2011},
 pages     = {415-424},
 ee        = {http://doi.acm.org/10.1145/2009916.2009974},
 crossref  = {DBLP:conf/sigir/2011},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

Predicting web searcher satisfaction with existing community-based answers (http://doi.acm.org/10.1145/2009916.2009974)

Summary

The paper proposes a solution to the novel problem of predicting and validating the usefulness of Community-based Question Answering (CQA) sites for an external web searcher, rather than for an asker belonging to the community. The work looks at three major components in the pipeline of predicting the query satisfaction of users. They are as follows:

1. query clarity task - Whether a query is unambiguous enough to be interpreted as a question.

2. query-question match task - Measures the similarity between a query and a question.

3. answer quality task - Assesses the satisfaction provided by the answer with respect to the question in CQA, and thus indirectly relates to the satisfaction of the query.

The paper approaches the problem by building a regression model. The evaluation is performed using human-labeled data collected through crowdsourcing.

Methodology

Features

The features used for building the regression model are divided according to the three subtasks mentioned above.

  • Query clarity features (9 total)
    • # of characters in the query.
    • # of words in the query.
    • # of clicks following the query.
    • Overall click entropy of the query.
    • User click entropy of the query.
    • Query clarity score.
    • WH-type of the query - what, why, when, where, which, how, is, are, do.
  • Query-question match features (23 total)
    • Match score between the query and the question title/body/answers using similarity metrics.
    • Jaccard/Dice/Tanimoto coefficient between the query and the question title.
    • Ratio of the number of characters/words in the query to that in the question.
    • # of clicks on the question following this query.
    • # of users who clicked the question following this/any query.
  • Answer quality features (37 total)
    • # of characters/words in the answer.
    • # of unique words in the answer.
    • # of answers received by the asker in the past.

For a full list of features, please refer to the paper. A short sketch of how two representative features could be computed is shown below.
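To make the feature definitions concrete, here is a minimal Python sketch of two representative features: the Jaccard coefficient between a query and a question title, and the click entropy of a query. This is my illustration under assumed input formats, not the paper's implementation.

```python
import math

def jaccard(query, title):
    """Jaccard coefficient between the query's and the question title's word sets."""
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0

def click_entropy(click_counts):
    """Entropy of the click distribution over results clicked for a query.

    click_counts: one click count per distinct clicked result.
    Lower entropy suggests a clearer, less ambiguous query.
    """
    total = sum(click_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in click_counts if c > 0)

# Clicks concentrated on one result (clear query) vs. spread out (ambiguous query)
print(click_entropy([95, 3, 2]))        # low entropy
print(click_entropy([30, 25, 25, 20]))  # high entropy
```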

Models

  • Direct Logistic Regression

All of the features listed above are used together in a single logistic regression model.

  • Composite Logistic Regression

A separate model is trained for each of the three subtasks mentioned above: the features pertaining to each subtask are used to train its own regression model. The three models are then combined by a final logistic regression that assigns a weight to each individual model.

[Figure: Model.jpg, illustrating the composite model]
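The composite approach amounts to model stacking. Below is a minimal sketch, assuming scikit-learn, of how such a two-level model could be wired up. The function names are mine, and the assumption that each sub-model is trained on its own human-judged subtask labels follows the observations below rather than any code from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_composite(X_subtasks, y_subtasks, y_satisfaction):
    """X_subtasks: three feature matrices (clarity, match, quality);
    y_subtasks: three per-subtask label vectors from human judgments;
    y_satisfaction: the overall searcher-satisfaction labels."""
    # Level 1: one logistic regression per subtask, on its own features and labels
    sub_models = [LogisticRegression(max_iter=1000).fit(X, y)
                  for X, y in zip(X_subtasks, y_subtasks)]
    # Level 2: each sub-model's predicted probability becomes a meta-feature,
    # and the combiner learns a weight for each sub-model's output
    meta = np.column_stack([m.predict_proba(X)[:, 1]
                            for m, X in zip(sub_models, X_subtasks)])
    combiner = LogisticRegression().fit(meta, y_satisfaction)
    return sub_models, combiner

def predict_composite(sub_models, combiner, X_subtasks):
    """Predicted probability of searcher satisfaction for each example."""
    meta = np.column_stack([m.predict_proba(X)[:, 1]
                            for m, X in zip(sub_models, X_subtasks)])
    return combiner.predict_proba(meta)[:, 1]
```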

Evaluation

The evaluation is performed on a click dataset of Google searches leading to Yahoo! Answers. The evaluation measures are root mean square error (RMSE) and Pearson correlation between the predictions and human judgments of query-answer satisfaction.
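Both evaluation measures are standard; for reference, here is a generic sketch (not taken from the paper):

```python
import numpy as np

def rmse(pred, gold):
    """Root mean square error between predictions and human judgments."""
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    return float(np.sqrt(np.mean((pred - gold) ** 2)))

def pearson(pred, gold):
    """Pearson correlation between predictions and human judgments."""
    pred, gold = np.asarray(pred, dtype=float), np.asarray(gold, dtype=float)
    return float(np.corrcoef(pred, gold)[0, 1])
```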

  • Direct vs. Composite Comparison

The following table compares the direct logistic regression approach with composite logistic regression.

[Table: Table3.png]

  • Answer ranking for queries

The following table compares the answer ranking produced by the proposed method to that of Google's ranking.

[Table: Table2.png]
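Ranking candidate answers for a query then reduces to sorting them by the model's predicted satisfaction probability. A sketch of that step, reusing the hypothetical predict_composite helper from the models section:

```python
import numpy as np

def rank_answers(sub_models, combiner, candidates):
    """candidates: list of (answer_id, rows) pairs, where rows holds one
    feature vector per subtask (clarity, match, quality).
    Returns answer ids sorted by predicted satisfaction, best first."""
    scored = []
    for answer_id, rows in candidates:
        X_subtasks = [np.atleast_2d(row) for row in rows]
        score = float(predict_composite(sub_models, combiner, X_subtasks)[0])
        scored.append((score, answer_id))
    return [aid for _, aid in sorted(scored, reverse=True)]
```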

Observations

  • This work presents a novel task of predicting the satisfaction of a web searcher using the answers discussed on Community-based Question Answering (CQA) sites.
  • The modularization of the task into three subtasks is a main feature of the paper, and I liked this approach to solving the problem. In this case, the subtasks were query clarity, query-question match, and answer satisfaction.
  • The results show that the composite approach performs better than the direct approach. This is due to the additional information provided by the human judgments for each of the subtasks. Moreover, the performance of each individual component can be improved independently.
  • The ranking of the CQA answers generated by the work in the paper outperforms the ranking generated by the Google web search engine with respect to the ground truth.
  • I like the problem described in the paper. However, the approach could also incorporate state-of-the-art language modeling techniques as well as IR techniques; it would be an interesting analysis to show the change in performance due to them.
  • Second, since an exhaustive list of features is used, it would be interesting to assess the relative importance of the features towards web searcher satisfaction.


Study Plan

  • The features used are inspired by:
    • Predicting query performance (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.4506&rep=rep1&type=pdf)
    • To personalize or not to personalize: modeling queries with variation in user intent (http://dl.acm.org/citation.cfm?id=1390364)
    • Query ambiguity revisited: clickthrough measures for distinguishing informational and ambiguous queries (http://www.aclweb.org/anthology-new/N/N10/N10-1055.pdf)

Related Work

A similar work on retrieving content from social communities: Finding high-quality content in social media (http://www.mathcs.emory.edu/~eugene/papers/wsdm2008quality.pdf).