Predicting web searcher satisfaction with existing community-based answers
This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.
Citation
author = {Qiaoling Liu and Eugene Agichtein and Gideon Dror and Evgeniy Gabrilovich and Yoelle Maarek and Dan Pelleg and Idan Szpektor}, title = {Predicting web searcher satisfaction with existing community-based answers}, booktitle = {SIGIR}, year = {2011}, pages = {415-424}, ee = {http://doi.acm.org/10.1145/2009916.2009974}, crossref = {DBLP:conf/sigir/2011}, bibsource = {DBLP, http://dblp.uni-trier.de}
Online Version
Predicting web searcher satisfaction with existing community-based answers
Summary
The paper addresses a novel problem: predicting and validating how useful Community-based Question Answering (CQA) sites are for an external web searcher, rather than for an asker who belongs to the community. The work examines three major components in the pipeline for predicting searcher satisfaction:
1. query clarity task - Whether a query is unambiguous enough to be interpreted as a question.
2. query-question match task - Measures the similarity between a query and a question.
3. answer quality - Assesses how well the answer satisfies the question in CQA, which indirectly relates to how well it satisfies the query.
Methodology
- Features - The features employed in classifying the sentiments can be broadly divided into 4 distinct types.
- Single-word features - They are considered as binary features with weight equal to the inverted count in the corpus.
- n-gram features - Sequences of 2-5 consecutive words are considered as binary features, with the same weighting as for the single-word features.
- Pattern-based features - The words are classified as high-frequency words (HFW) and content words (CW).
- A pattern is defined as an ordered sequence of HFW and slots for CW based on the frequency threshold.
- In this paper, a pattern contains 2-6 HFWs and 1-5 slots for CWs. It is based on the works of
- The weight for the pattern is assigned according to the degree of match of the pattern: exact match, sparse match, incomplete match, or no match.
- Punctuation features - The weight assigned is the average weight of the respective features.
- Length of a sentence.
- Number of exclamations
- Number of question marks
- Number of quotes
- Number of capital words.
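The word, n-gram, and punctuation features listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names and the `corpus_counts` dictionary are hypothetical, "length" is assumed to mean the number of whitespace-separated tokens, "capital words" is assumed to mean words starting with an uppercase letter, and only the straight double quote is counted as a quote.

```python
import string


def ngrams(tokens, n_min=2, n_max=5):
    """Return all runs of n_min..n_max consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def punctuation_features(sentence):
    """Raw counts behind the five punctuation features (assumed definitions)."""
    tokens = sentence.split()
    return {
        "length": len(tokens),
        "exclamations": sentence.count("!"),
        "question_marks": sentence.count("?"),
        "quotes": sentence.count('"'),
        # Strip surrounding punctuation before checking the first letter.
        "capital_words": sum(
            1 for t in tokens if t.strip(string.punctuation)[:1].isupper()
        ),
    }


def weighted_features(tokens, corpus_counts):
    """Binary word and n-gram features, each weighted by 1 / corpus count.

    corpus_counts is a hypothetical dict mapping words (str) and
    n-grams (tuple of str) to their counts in the training corpus.
    """
    feats = {}
    for item in set(tokens) | set(ngrams(tokens)):
        count = corpus_counts.get(item, 0)
        if count > 0:
            feats[item] = 1.0 / count
    return feats
```

Items that never occur in `corpus_counts` are simply dropped here; how unseen words are handled in the paper is not stated in this review.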
- Classification Algorithm
The algorithm used to assign the sentiment label to test examples is a slight modification of the k-NN algorithm.
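The modification the paper makes to k-NN is not described in this review, so the sketch below shows only plain k-NN with majority voting as a point of reference. The sparse dictionary representation, the Euclidean distance, and the names `distance` and `knn_predict` are assumptions made for illustration.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two sparse feature dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training examples.

    train is a list of (feature_dict, label) pairs.
    """
    nearest = sorted(train, key=lambda ex: distance(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with a single made-up feature "x".
train = [
    ({"x": 0.0}, "happy"),
    ({"x": 0.1}, "happy"),
    ({"x": 1.0}, "sad"),
    ({"x": 1.1}, "sad"),
]
label = knn_predict(train, {"x": 0.05}, k=3)
```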
Evaluation
Evaluation using cross-validation
The sentiment classification is evaluated using 10-fold cross-validation over the training set. The performance of the algorithm was tested under different feature settings.
- Multi-class classification - There are 51 hashtag-based and 16 smiley-based labels. The evaluation metric is the average F-score over the 10 folds. The F-score for the random baseline is 0.02. The result is shown in the following table.
The result is significantly better than the random baseline.
- Binary classification - The label is 1 if the sentence contains a particular label and 0 if the sentence does not bear any sentiment. Binary classification is performed for each of the 50 hashtag-based and 15 smiley-based labels. The result is shown in the following table.
The results show that binary classification performs better than multi-class classification, with high precision.
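As a reminder of how an average F-score over cross-validation folds can be computed, here is a small sketch. It assumes the balanced F1 form of the F-score and per-fold counts of true positives, false positives, and false negatives; the counts in the usage lines are made up and are not results from the paper.

```python
def f_score(tp, fp, fn):
    """Balanced F-score (F1) from true positive, false positive, false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_score(fold_counts):
    """Mean F-score over folds; fold_counts is a list of (tp, fp, fn) tuples."""
    scores = [f_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)


# Illustrative counts for two folds (not taken from the paper).
example = average_f_score([(8, 2, 2), (0, 1, 1)])
```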
Evaluation with human judges
Amazon Mechanical Turk (AMT) was used to evaluate the performance of the classifier on test data. A prediction was considered correct if one of the tags selected by a human judge for a sentence was among the 5 tags predicted by the algorithm. The correlation score for this task was .
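The correctness criterion used with the human judges can be written down compactly. The sketch below assumes the predicted tags arrive as a ranked list and the human tags as any collection of strings; the function names are hypothetical.

```python
def is_correct(human_tags, predicted_tags, top_n=5):
    """True if any tag chosen by the human judge is among the top_n predicted tags."""
    return bool(set(human_tags) & set(predicted_tags[:top_n]))


def fraction_correct(judgments, top_n=5):
    """Share of sentences judged correct; judgments is a list of
    (human_tags, predicted_tags) pairs."""
    hits = sum(is_correct(h, p, top_n) for h, p in judgments)
    return hits / len(judgments)
```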
Observations
- This work presents a supervised classification framework that uses Twitter hashtags and smileys as labels, treating them as proxies for different sentiment types. This avoids the need for labor-intensive manual annotation and allows identification and classification of diverse sentiment types in short texts.
- Binary classification of sentiments yields better results than multi-class classification.
- Punctuation, word, and pattern features contribute more towards classification performance than the n-gram features, which provide only a small marginal boost. Pattern features provide better performance than the combined effect of the rest of the features.
- The paper includes a preliminary exploration of inter-sentiment overlap and dependency using two simple techniques: tag occurrence and feature overlap.
- In addition to the features used in the algorithm, features representing short-term and long-term distance in the tweets could also be added.
- The evaluation could also be performed on data other than tweets, such as blogs, to validate the use of the semantic labels in other text documents.
Study Plan
- Davidov and Rappoport (2008) - for understanding the automated pattern-based approach for extracting sentiments.
- Mishne (2005) - study on mood classification.
Related Work
Similar work on extracting sentiment types from blogs was carried out by McDonald et al. (2007).