Yue Lu, WWW 2010
Citation
author = {Lu, Yue and Tsaparas, Panayiotis and Ntoulas, Alexandros and Polanyi, Livia}, title = {Exploiting social context for review quality prediction}, booktitle = {Proceedings of the 19th international conference on World wide web}, series = {WWW '10}, year = {2010}, isbn = {978-1-60558-799-8}, pages = {691--700}, numpages = {10},
Online version
http://sifaka.cs.uiuc.edu/~yuelu2/pub/www10-reviewQuality.pdf
Abstract from the paper
Online reviews in which users publish detailed commentary about their experiences and opinions with products, services, or events are extremely valuable to users who rely on them to make informed decisions. However, reviews vary greatly in quality and are constantly increasing in number, therefore, automatic assessment of review helpfulness is of growing importance. Previous work has addressed the problem by treating a review as a stand-alone document, extracting features from the review text, and learning a function based on these features for predicting the review quality. In this work, we exploit contextual information about authors’ identities and social networks for improving review quality prediction. We propose a generic framework for incorporating social context information by adding regularization constraints to the text-based predictor. Our approach can effectively use the social context information available for large quantities of unlabeled reviews. It also has the advantage that the resulting predictor is usable even when social context is unavailable. We validate our framework within a real commerce portal and experimentally demonstrate that using social context information can help improve the accuracy of review quality prediction especially when the available training data is sparse.
Summary
Automatic review quality prediction can be very useful to sift through spam and bogus reviews in sites like Yelp.com or Amazon.com. Most automatic review quality predictors make use of the review text to predict the review quality. In this paper, the authors describe a method of incorporating social context information in a text-based review quality predictor.
First, the authors describe their text-based quality prediction system. This system uses the following types of features extracted from the text:
- Text-statistics features that are based on aggregate statistics over the text, such as length of the review, average length of sentences, or the richness of the vocabulary.
- Syntactic features that take into account various statistics related to the Part-Of-Speech (POS) tags of the words in the text, such as the percentage of nouns, adjectives, punctuation, etc.
- Conformity features that measure how much a review conforms to the average by looking at the KL-divergence between the unigram language model of a review and the unigram model of an “average” review that contains the text of all reviews for the same item.
- Sentiment features that take into account the presence of positive and negative sentiment of words in the review.
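As an illustration of the conformity feature described above, the following is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization and add-one smoothing over a shared vocabulary, neither of which is specified in this summary; the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    The smoothing constant alpha is an assumption made for this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[word] * math.log(p[word] / q[word]) for word in p)


def review_divergence(review, item_reviews):
    """Divergence of one review from the 'average' review of its item.

    The 'average' review is the concatenation of all reviews for the item.
    A lower value means the review conforms more closely to the average.
    """
    review_tokens = review.lower().split()
    average_tokens = [t for r in item_reviews for t in r.lower().split()]
    vocab = set(review_tokens) | set(average_tokens)
    p = unigram_model(review_tokens, vocab)
    q = unigram_model(average_tokens, vocab)
    return kl_divergence(p, q)
```

A review that uses the same words in the same proportions as the rest of the item's reviews gets a divergence close to zero, while a review with an unusual vocabulary gets a larger value.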
Next, the authors describe two different ways of incorporating social context information into the review quality predictor. The first, more straightforward method is to extract additional features from the social context information and use them in the quality prediction function. These features include ReviewNum (the engagement of the author), AvgRating (the historical quality of the reviewer), and the In/Out-Degree or PageRank. This approach has two problems. First, the social context information may not be available for all reviews. Second, increasing the dimension of the feature vector increases the amount of labelled data needed to learn a good predictor function.
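The simpler social context features can be sketched as follows. This is only an illustration under stated assumptions: the input layout (a list of `(author, rating)` pairs and a list of `(truster, trusted)` edges) and the function name are hypothetical, AvgRating is interpreted here as the mean rating received by an author's past reviews, and PageRank is left out for brevity.

```python
from collections import defaultdict


def social_context_features(reviews, trust_edges):
    """Compute per-author social context features.

    reviews:     list of (author, rating) pairs, one per past review
                 (hypothetical layout for this sketch).
    trust_edges: list of (truster, trusted) pairs from the trust network.
    """
    ratings = defaultdict(list)
    for author, rating in reviews:
        ratings[author].append(rating)

    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for truster, trusted in trust_edges:
        out_degree[truster] += 1
        in_degree[trusted] += 1

    authors = set(ratings) | set(in_degree) | set(out_degree)
    features = {}
    for author in authors:
        past = ratings.get(author, [])
        features[author] = {
            "ReviewNum": len(past),
            # None marks authors with no rated reviews (missing context).
            "AvgRating": sum(past) / len(past) if past else None,
            "InDegree": in_degree.get(author, 0),
            "OutDegree": out_degree.get(author, 0),
        }
    return features
```

The `None` value for authors without rated reviews shows the availability problem mentioned above: a predictor that depends on these features has nothing to work with for such authors.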
The other way to incorporate social context information into the predictor function is to extract constraints from the social context and use them on the text-based predictor. The authors describe a number of hypotheses that they use in designing regularizing constraints to add into their text-based linear regression system:
- Author Consistency Hypothesis: Reviews from the same author will be of similar quality.
- Trust Consistency Hypothesis: A link from one user to another is an explicit or implicit statement of trust, and reviewers trust other reviewers only if the quality of those reviewers' reviews is at least as high as that of their own.
- Co-Citation Consistency Hypothesis: If two reviewers are trusted by the same third reviewer, then their quality should be comparable.
- Link Consistency Hypothesis: If two people are connected in the social network (not necessarily direct links), then their quality should be similar.
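To make the idea of a regularizing constraint concrete, here is a minimal sketch for the Author Consistency Hypothesis only. It is not the paper's exact formulation: it assumes a squared-error loss on labeled reviews, an L2 penalty on the weights, and a penalty on the squared difference between predicted qualities of reviews by the same author, minimized by plain gradient descent. The parameter names `alpha`, `beta`, `lr`, and `steps` are choices made for this sketch.

```python
def predict(w, x):
    """Linear text-based predictor: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit(features, labeled, same_author_pairs,
        alpha=0.01, beta=1.0, lr=0.05, steps=2000):
    """Minimize, by gradient descent (an assumption of this sketch):

        sum over labeled (i, y):      (w.x_i - y)^2
      + alpha * ||w||^2
      + beta * sum over pairs (i, j): (w.x_i - w.x_j)^2

    features:          text feature vectors for ALL reviews, labeled or not
    labeled:           list of (review_index, quality) pairs
    same_author_pairs: list of (i, j) review indices sharing an author
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [2.0 * alpha * wi for wi in w]
        for idx, y in labeled:
            x = features[idx]
            err = predict(w, x) - y
            for k in range(dim):
                grad[k] += 2.0 * err * x[k]
        for i, j in same_author_pairs:
            xi, xj = features[i], features[j]
            diff = predict(w, xi) - predict(w, xj)
            for k in range(dim):
                grad[k] += 2.0 * beta * diff * (xi[k] - xj[k])
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w
```

Two points are worth noting. The regularization term needs no quality labels, so pairs that include unlabeled reviews still shape the weights. The learned `w` operates on text features alone, so the resulting predictor can score a review whose author has no social context at all.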
Results:
The authors used data from Ciao UK, a community review web site where people write reviews for various products and also rate the reviews written by others. In addition, people can add members to their network of trusted members. From their experiments, the authors found that:
- Adding social context features proves effective only when there is enough training data.
- Regularization methods (using the constraints based on social context information) work best when there is little labelled training data and a large amount of unlabeled data.
- These regularization techniques provide improvements even when applied to data without any social context.
Related Papers
- O. Tsur and A. Rappoport. Revrank: a fully unsupervised algorithm for selecting the most helpful book reviews. In ICWSM, 2009.