Product Feature Extraction and Sentiment Analysis in Product Reviews

From Cohen Courses
Revision as of 01:04, 8 October 2012 by Sushantk (talk | contribs)

Team Members

Project Title

Product Feature Extraction and Sentiment Analysis in Product Reviews

Project Abstract

In this project, we plan to analyze product reviews across several product classes to find product features and customers' opinions about those features. Using this analysis, we aim to identify the good and bad aspects of a given product, feature by feature. This can be a practical way to help customers decide how well a product satisfies their needs when they care about only a few important features and not the rest.
Some product classes, such as movies and books, do not have well-defined features. For such classes, we need to identify implicit features based on what customers liked or disliked, such as a movie's plot or a specific actor's performance. We also aim to compare the performance of different feature extraction techniques on different product classes. Opinion mining for a specific product feature requires some level of semantic understanding to separate opinions about that feature from opinions about other features mentioned in the same review. In addition, the same word can express contrasting opinions in different contexts (for example, "long" is positive for battery life but negative for startup time), so assigning each sentiment word a single global polarity leads to incorrect sentiment classification.
Given these challenges, we aim to build a robust solution for extracting features, and the opinions expressed about them, from product reviews.

Datasets

Amazon product review datasets:

  1. http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#datasets
  2. Amazon product review dataset for various classes

In addition, we have built our own web crawlers and collected more product reviews from www.amazon.com to train our system.

Baseline

We suggest the following baselines:

  • Product Feature Extraction: We can use an n-gram model to extract nouns and noun phrases, which are the usual candidate features for a product. We can also use WordNet synset data to expand the list of candidate features, and apply a frequency threshold to discard unimportant features.
  • Sentiment Analysis: We can use a list of sentiment words already labeled as positive or negative, and score each sentence as positive, negative, or neutral based on which of these words it contains.
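
The two baselines above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the stopword and sentiment word lists are tiny hypothetical stand-ins for real lexicons, stopword filtering stands in for proper noun-phrase detection (no part-of-speech tagger is used), the WordNet synset expansion is omitted, and the function names and the threshold value are chosen only for this example.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions, not the project's real resources).
STOPWORDS = {"the", "a", "an", "is", "are", "was", "but", "and", "it", "this",
             "of", "to", "very", "i", "in", "for", "has", "my"}
POSITIVE = {"great", "good", "excellent", "amazing", "love"}
NEGATIVE = {"bad", "poor", "terrible", "awful", "hate"}


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def candidate_features(reviews, max_n=2, min_count=2):
    """Count n-grams (n <= max_n) that contain no stopword or sentiment word,
    and keep those occurring at least min_count times across all reviews."""
    excluded = STOPWORDS | POSITIVE | NEGATIVE
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(tok in excluded for tok in gram):
                    counts[" ".join(gram)] += 1
    # Frequency threshold: discard rare (likely unimportant) candidates.
    return {gram: c for gram, c in counts.items() if c >= min_count}


def sentence_polarity(sentence):
    """Label a sentence by comparing its positive and negative word counts."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


reviews = [
    "The battery life is great but the screen is poor.",
    "Great screen, terrible battery life.",
    "I love the battery life of this phone.",
]
features = candidate_features(reviews)
labels = [sentence_polarity(r) for r in reviews]
```

On these three toy reviews, `features` keeps "battery", "life", "battery life", and "screen", and drops "phone" because it appears only once. The first two reviews are labeled neutral because their positive and negative words cancel out, which illustrates why sentence-level scoring alone cannot separate opinions about different features mentioned together.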

Related Papers