Dave et al., WWW 2003
This is a summary of a research paper, written as part of Social Media Analysis 10-802, Fall 2012.
Citation
Dave, K., Lawrence, S., and Pennock, D.M. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. WWW 2003.
Online Version
Abstract from the paper
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.
Summary
Overview
This paper proposes techniques for opinion mining and for classifying opinions as positive or negative. It also discusses contemporary sentiment classification methods and how well they suit different tasks.
Proposed Techniques
The system trains a classifier on self-tagged product reviews from websites such as amazon.com and c|net.com. On c|net.com, a user can give each review a "thumbs up" or a "thumbs down" to mark it as positive or negative, respectively. Similarly, on amazon.com a customer can give a review a scalar rating of one to five stars, with one star the lowest rating and five stars the highest.
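As a rough illustration of how such self-tagged ratings could be turned into training labels, the sketch below maps both rating schemes to "positive" and "negative". The function names and the star thresholds (two or fewer stars negative, four or more positive, three stars skipped) are assumptions made for this example; the summary does not state which cut-offs the paper uses.

```python
from typing import Optional


def label_from_thumbs(thumbs_up: bool) -> str:
    """Binary c|net.com-style rating: thumbs up is positive, thumbs down is negative."""
    return "positive" if thumbs_up else "negative"


def label_from_stars(stars: int,
                     negative_max: int = 2,
                     positive_min: int = 4) -> Optional[str]:
    """Scalar amazon.com-style rating (1 to 5 stars).

    The thresholds are assumed defaults for illustration only.
    Ratings between the two thresholds return None and would be
    left out of the training set.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    if stars <= negative_max:
        return "negative"
    if stars >= positive_min:
        return "positive"
    return None
```

Leaving out the middle ratings is one possible design choice: it keeps the two classes clearly separated at the cost of discarding some reviews.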
Feature Selection
For feature selection, the paper proposes replacing certain words, such as numbers, product names, product type-specific words and low-frequency words, with common tokens so that the features generalize. It also discusses adding features based on the WordNet synsets of a word for its part of speech in the sentence, but notes that synsets cause an explosion in the size of the feature set and add more noise than signal. It further proposes collocation features, especially to capture noun-adjective relationships, and tries stemming and negation handling to cope with language variation. Once the substitutions are done, n-gram features are extracted. The paper experiments with bigram and trigram features, and with using lower-order n-grams for smoothing.
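The substitution and n-gram steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder tokens (_NUMBER, _PRODUCTNAME, _RARE), the negation word list, and the rule of marking only the single word after a negation are all assumptions chosen for the example.

```python
import re
from typing import List, Set

# Hypothetical pattern and word list, chosen only for this sketch.
NUMBER = re.compile(r"^\d+(\.\d+)?$")
NEGATIONS = {"not", "no", "never"}


def substitute(tokens: List[str],
               product_names: Set[str],
               rare_words: Set[str]) -> List[str]:
    """Replace numbers, product names and rare words with common tokens."""
    out = []
    for tok in tokens:
        low = tok.lower()
        if NUMBER.match(low):
            out.append("_NUMBER")
        elif low in product_names:
            out.append("_PRODUCTNAME")
        elif low in rare_words:
            out.append("_RARE")
        else:
            out.append(low)
    return out


def mark_negation(tokens: List[str]) -> List[str]:
    """Drop each negation word and prefix the next token with NOT_."""
    out = []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The Nikon 990 is not good".split()
generalized = mark_negation(substitute(tokens, {"nikon"}, set()))
bigrams = ngrams(generalized, 2)
# generalized: ['the', '_PRODUCTNAME', '_NUMBER', 'is', 'NOT_good']
# bigrams: ['the _PRODUCTNAME', '_PRODUCTNAME _NUMBER', '_NUMBER is', 'is NOT_good']
```

Running the substitutions before n-gram extraction means that reviews of different products share features such as "_PRODUCTNAME _NUMBER", which is the generalization effect the substitutions aim for.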
Sentiment Classification
Mining
Evaluation
The authors discuss the results of the two tests they carried out.
Discussion
Related Papers
Study Plan
Resources useful for understanding this paper
- Article: Opinion Mining