Dave et. al., WWW 2003

From Cohen Courses

Revision as of 22:14, 1 October 2012

This is a summary of a research paper, written as part of Social Media Analysis 10-802, Fall 2012.

Citation

Dave, K., Lawrence, S., and Pennock, D.M. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. WWW 2003.

Online Version

Direct PDF link

Abstract from the paper

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.

Summary

Overview

This paper proposes techniques for opinion mining and for classifying opinions as positive or negative. It also discusses contemporary methods used for sentiment classification and how they suit different tasks.

Proposed Techniques

The system uses various approaches to obtain features from the given documents and to score those features. The authors also experiment with training various machine learning classifiers using self-tagged product reviews from websites such as amazon.com and c|net.com. On c|net.com, each review is tagged by the user with a "thumbs up" or a "thumbs down" to mark it as positive or negative respectively. Similarly, on amazon.com a customer gives each review a scalar rating of one to five stars, one star being the lowest rating and five stars the highest.

Feature Selection

To generalize the features, the paper proposes substituting common tokens for certain words: numbers, product names, product-type-specific words, and low-frequency words. It also discusses adding features based on the WordNet synset of a word for its part of speech in the sentence, but notes that synsets lead to an explosion in the size of the feature set and add more noise than signal. It further proposes collocation features, especially to capture noun-adjective relationships, and tries stemming and negation handling to cope with language variation.
Once the substitutions are done, n-gram features are extracted. The paper experiments with bigram and trigram features, and with using lower-order n-grams for smoothing. More features are obtained from substrings using Church's suffix array algorithm.
Various thresholds and techniques, such as frequency counts and smoothing, are used to restrict the number of features, both to ease computation and to improve the relevance of the remaining features.
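The substitution and n-gram steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the placeholder tokens (`_NUMBER`, `_PRODUCTNAME`, `_UNIQUE`), the `min_count` cutoff, and the sample reviews are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical placeholder tokens; the paper's actual token names may differ.
NUMBER_TOKEN = "_NUMBER"
PRODUCT_TOKEN = "_PRODUCTNAME"
RARE_TOKEN = "_UNIQUE"


def substitute(tokens, product_names, counts, min_count=2):
    """Replace numbers, product names, and rare words with common tokens."""
    out = []
    for tok in tokens:
        low = tok.lower()
        if re.fullmatch(r"\d+(\.\d+)?", low):
            out.append(NUMBER_TOKEN)
        elif low in product_names:
            out.append(PRODUCT_TOKEN)
        elif counts[low] < min_count:  # assumed low-frequency cutoff
            out.append(RARE_TOKEN)
        else:
            out.append(low)
    return out


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up sample data, for illustration only.
reviews = [
    "the Nikon 990 takes great pictures".split(),
    "the battery life is great".split(),
]
counts = Counter(tok.lower() for review in reviews for tok in review)
generalized = substitute(reviews[0], {"nikon"}, counts)
bigrams = ngrams(generalized, 2)
```

Here `generalized` keeps only the words that occur at least twice in the toy corpus ("the", "great") and maps everything else to a placeholder, so that n-grams built on top of it generalize across products.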

Feature Scoring

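The baseline method for scoring a feature $f_i$ is $\mathrm{score}(f_i) = \frac{p(f_i \mid C) - p(f_i \mid C')}{p(f_i \mid C) + p(f_i \mid C')}$, where C and C' are the sets of positive and negative reviews respectively, and $p(f_i \mid C)$ is the normalized frequency of $f_i$ in C.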
Dave et al. also tried other scoring methods based on information gain, odds ratios, and Jaccard's measure of similarity, but these did not show significant improvements over the baseline. They also experimented with different weighting schemes, such as a log transform, Gaussian weighting, and residual inverse document frequency, to see how they affect classification.
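A baseline score of this kind can be sketched in a few lines. The sketch assumes the score is the normalized difference of class-conditional frequencies, (p(f|C) - p(f|C')) / (p(f|C) + p(f|C')), with p(f|C) taken as the count of f in class C divided by the total feature count in C. The function name and the toy documents are hypothetical.

```python
from collections import Counter


def feature_scores(pos_docs, neg_docs):
    """Score each feature in [-1, 1]; positive values lean toward class C."""
    pos_counts = Counter(f for doc in pos_docs for f in doc)
    neg_counts = Counter(f for doc in neg_docs for f in doc)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())

    scores = {}
    for f in set(pos_counts) | set(neg_counts):
        # Normalized frequency of f in each class (assumed definition of p).
        p_pos = pos_counts[f] / pos_total if pos_total else 0.0
        p_neg = neg_counts[f] / neg_total if neg_total else 0.0
        # f occurs in at least one class, so the denominator is non-zero.
        scores[f] = (p_pos - p_neg) / (p_pos + p_neg)
    return scores


# Made-up documents, each already reduced to a list of features.
pos = [["great", "camera"], ["great", "lens"]]
neg = [["poor", "camera"]]
scores = feature_scores(pos, neg)
```

A feature seen only in positive reviews scores 1, one seen only in negative reviews scores -1, and a feature such as "camera" that appears in both falls in between.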

Sentiment Classification

A document is classified as a positive or negative review based on the sum of the scores of the features present in it: positive if the sum is positive, negative if the sum is negative. To compare against their own approach, the authors also experimented with Naive Bayes, SVM, maximum entropy, and EM classifiers using the Rainbow text classification package. They also crawled search engine results for a given product name to obtain more reviews and analyze them.
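The sum-of-scores decision rule can be sketched as below. The score table is made up for illustration, features without a score are assumed to contribute zero, and a sum of exactly zero is assigned to the negative class here purely as an assumption, since the summary does not say how ties are handled.

```python
def classify(features, scores):
    """Label a document by the sign of the summed scores of its features."""
    # Unseen features contribute nothing (an assumption of this sketch).
    total = sum(scores.get(f, 0.0) for f in features)
    # Tie-breaking toward "negative" is an arbitrary choice for illustration.
    return "positive" if total > 0 else "negative"


# Hypothetical feature scores in [-1, 1].
example_scores = {"great": 1.0, "poor": -1.0, "camera": -0.2}

label_a = classify(["great", "camera"], example_scores)
label_b = classify(["poor", "camera"], example_scores)
```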

Evaluation

The paper reports results from two tests, Test 1 and Test 2. Test 1 tests on each of the 7 C|net product categories in turn, using the other 6 as the training set. Test 2 uses randomly selected sets of positive and negative reviews from the 4 largest C|net product categories; one set is used for testing and the remaining sets for training. The product reviews are obtained from amazon.com and C|net.com.
The authors compare the classification accuracy of the different approaches proposed:

  • WordNet, stemming, collocation, and negation did not improve the results over the unigram baseline.
  • The trigram model performed best, followed by the bigram model. Using lower-order n-grams for smoothing did not improve the results.
  • The bigram baseline model also outperformed the SVM, EM, maximum entropy, and Naive Bayes classifiers. Among those other classifiers, Naive Bayes with Laplace smoothing performed best.
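The Test 1 protocol described above, holding out one product category at a time and training on the rest, can be sketched as a simple generator. The category names and review strings are placeholders, not the actual C|net categories or data.

```python
def leave_one_category_out(reviews_by_category):
    """Yield (held_out, train, test) with each category held out in turn."""
    for held_out in sorted(reviews_by_category):
        train = [
            review
            for category, reviews in reviews_by_category.items()
            if category != held_out
            for review in reviews
        ]
        test = list(reviews_by_category[held_out])
        yield held_out, train, test


# Placeholder data: three hypothetical categories instead of the paper's seven.
data = {
    "cameras": ["c1", "c2"],
    "laptops": ["l1"],
    "phones": ["p1", "p2", "p3"],
}
splits = list(leave_one_category_out(data))
```

Each split trains on every review outside the held-out category, so the classifier is always evaluated on a product category it has not seen during training.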


Discussion

Related Papers

Study Plan

Resources useful for understanding this paper