Difference between revisions of "Higher Order Review Rating Sentiment Analysis"

Latest revision as of 05:04, 7 June 2014

Comments

Nice proposal with well-thought-out tasks and challenges.
On Amazon users can specify whether they found some review useful. This can be another interesting dimension for your studies. Is this data available in the dataset you have?
It would be great if you could add a related work section that gives, for each paper, the title, a one-line summary of how it is related, and a link to the PDF.

You might find the following paper interesting:

Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification [1]

--Bbd 01:08, 11 October 2012 (UTC)

Thank you very much for your comment.

About the "review of reviews", that is, the data in which users specify whether they found a review useful: it is certainly an interesting dimension. We found similar data, the "number of helpful feedbacks" for each review, and we are thinking of exploiting this dimension. Thank you very much for your advice!
About the related paper you suggested: we find domain adaptation an interesting topic as well, and we might be able to cover it.
About the related work: we added one paper.

--Nnori

Team members

Vagelis Papalexakis

Project Title

Higher Order Review Rating / Sentiment Analysis

Project Abstract

Given a review, there are many different dimensions that one may exploit in order to improve performance in review rating/sentiment analysis. For example, when was the review written? Does it refer to the book or the movie of a given title (e.g., are we talking about Harry Potter the book, or the movie)?

We propose to exploit the intrinsic high dimensionality of a review, in order to improve performance. Namely, we propose to model such reviews as high dimensional tensors (possibly with more than 3 dimensions/modes) and use tensor decomposition algorithms in order to obtain higher accuracy.

Intuitively, by incorporating all available dimensions, we should be able to do at least as well as when using only a subset of them, provided that the extra information we add is useful and not particularly noisy.

We will show that by exploiting these high-dimensional data, we can achieve higher performance in the review sentiment classification task than when we do not exploit them.

Data

We will use Amazon Product Review Data.

The data size is more than 5.8 million reviews.

This data has been used in Jindal and Liu (WWW-2007, WSDM-2008); Lim et al. (CIKM-2010); Jindal, Liu and Lim (CIKM-2010); Mukherjee et al. (WWW-2011); and Mukherjee, Liu and Glance (WWW-2012).

From this dataset, we can extract various dimensions such as reviewer, product id, product category, date, review text.
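As a rough illustration of how these dimensions could be arranged into a tensor, the sketch below builds a sparse 4-mode count tensor (reviewer x category x date x term) in coordinate form. The record layout, field names, and toy values are assumptions made for the example and are not taken from the actual dataset.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
reviews = [
    {"reviewer": "r1", "category": "Books", "date": "2005-03",
     "text": "great story great characters"},
    {"reviewer": "r2", "category": "DVD", "date": "2005-03",
     "text": "boring story"},
    {"reviewer": "r1", "category": "DVD", "date": "2006-07",
     "text": "great film"},
]

MODES = ("reviewer", "category", "date", "term")


def build_index(values):
    """Map each distinct value to a consecutive integer, in first-seen order."""
    index = {}
    for value in values:
        index.setdefault(value, len(index))
    return index


def build_tensor(records):
    """Return (entries, indexes) for a sparse reviewer x category x date x term tensor.

    entries maps a coordinate tuple (i, j, k, l) to a term count.
    """
    indexes = {
        "reviewer": build_index(r["reviewer"] for r in records),
        "category": build_index(r["category"] for r in records),
        "date": build_index(r["date"] for r in records),
        "term": build_index(t for r in records for t in r["text"].split()),
    }
    entries = defaultdict(int)
    for r in records:
        for term in r["text"].split():
            key = (
                indexes["reviewer"][r["reviewer"]],
                indexes["category"][r["category"]],
                indexes["date"][r["date"]],
                indexes["term"][term],
            )
            entries[key] += 1
    return dict(entries), indexes


entries, indexes = build_tensor(reviews)
shape = tuple(len(indexes[m]) for m in MODES)
print("tensor shape:", shape)
print("non-zero entries:", len(entries))
```

Storing only the non-zero coordinates matters at this scale, since a dense tensor over millions of reviews would be almost entirely zeros.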

Task

The task is classification of reviews (from Amazon) as positive or negative.

We have only ratings (from 1 to 5) for reviews, and we do not have explicit positive/negative labels. Since the data is so large, we do not manually create labels for reviews. Instead, we will count ratings 1 and 2 as negative, and ratings 4 and 5 as positive.
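The labeling rule above can be sketched as a small function. The function name is hypothetical, and skipping 3-star reviews is an assumption: the text does not say how a rating of 3 is handled.

```python
def rating_to_label(rating):
    """Map a 1-5 star rating to a sentiment label.

    Ratings 1-2 become "negative" and 4-5 become "positive", as described
    in the text. Returning None for a rating of 3 is an assumption.
    """
    if rating in (1, 2):
        return "negative"
    if rating in (4, 5):
        return "positive"
    if rating == 3:
        return None  # assumption: neutral reviews are left unlabeled
    raise ValueError("rating must be an integer from 1 to 5")


ratings = [5, 1, 3, 4, 2]
# Keep only the reviews that receive a label.
labeled = [(r, rating_to_label(r)) for r in ratings
           if rating_to_label(r) is not None]
print(labeled)
```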

Baseline

As a baseline, we propose to "ignore" the inherent high dimensionality of the data and instead use matrix approaches (which are two dimensional). For the example that we mentioned earlier, in the case of a review, we may only take into account the terms that appear in that review but ignore the date or the product category. Examples of these baselines could be the Singular Value Decomposition or some other Matrix Factorization methods, like the Non-negative Matrix Factorization (NMF), which is particularly popular in many data mining applications.
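To make the two-dimensional baseline concrete, the sketch below builds a review-by-term count matrix from toy texts (dropping date, reviewer, and category) and extracts its leading singular triplet by power iteration. This is only a minimal stand-in for a full SVD; the toy documents and the function name are assumptions, and in practice a library routine for truncated SVD or NMF would be the natural choice.

```python
import math

# Hypothetical toy review texts; only the terms are used.
docs = [
    "great story great characters",
    "boring story",
    "great film",
    "boring film boring plot",
]

vocab = sorted({t for d in docs for t in d.split()})
# Review-by-term count matrix: the 2-D view that ignores the other dimensions.
X = [[d.split().count(t) for t in vocab] for d in docs]


def top_singular_triplet(A, iters=200):
    """Leading singular value and vectors of A via power iteration on A^T A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        # u = A v, then w = A^T u, then renormalize w to get the next v.
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


sigma, u, v = top_singular_triplet(X)
print("leading singular value:", round(sigma, 4))
```

The entries of `u` give a one-dimensional embedding of the reviews, and the entries of `v` give the corresponding term weights; a rank-k factorization would yield k such coordinates per review.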

Evaluation

As performance evaluation, we may consider both quantitative and qualitative approaches.

Quantitative

We may classify the reviews using, e.g., an SVM or k-NN classifier and argue that our approach yields better classification accuracy than the chosen baselines.
We may also conduct analysis on how each dimension contributes to the performance. For example, we may compare a situation where we ignore "time" information, and a situation where we ignore "reviewers".
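The comparison described above could be run along the following lines: classify with a k-NN classifier, measure leave-one-out accuracy, then repeat with one dimension removed. Everything here is a toy sketch; the feature values, the grouping of features by dimension, and the function names are hypothetical, and the numbers it prints say nothing about the real data.

```python
from collections import Counter

# Hypothetical toy data: numeric features grouped by dimension, plus a label.
reviews = [
    ({"text": [1.0, 0.0], "time": [0.1], "reviewer": [1.0, 0.0]}, "positive"),
    ({"text": [0.9, 0.1], "time": [0.8], "reviewer": [1.0, 0.0]}, "positive"),
    ({"text": [0.8, 0.3], "time": [0.2], "reviewer": [0.0, 1.0]}, "positive"),
    ({"text": [0.1, 1.0], "time": [0.9], "reviewer": [0.0, 1.0]}, "negative"),
    ({"text": [0.2, 0.8], "time": [0.3], "reviewer": [0.0, 1.0]}, "negative"),
    ({"text": [0.0, 0.9], "time": [0.7], "reviewer": [1.0, 0.0]}, "negative"),
]


def flatten(features, dims):
    """Concatenate the feature groups of the selected dimensions."""
    return [x for d in dims for x in features[d]]


def knn_predict(train, query, k):
    """Majority label among the k training points closest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, dims, k=1):
    """Leave-one-out accuracy of k-NN using only the given dimensions."""
    vectors = [(flatten(f, dims), y) for f, y in data]
    hits = 0
    for i, (x, y) in enumerate(vectors):
        train = vectors[:i] + vectors[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(vectors)


all_dims = ["text", "time", "reviewer"]
results = {"all": loo_accuracy(reviews, all_dims)}
for dropped in ("time", "reviewer"):
    kept = [d for d in all_dims if d != dropped]
    results["without " + dropped] = loo_accuracy(reviews, kept)

for name, acc in results.items():
    print(name, acc)
```

Comparing the "all" entry with each "without ..." entry gives a per-dimension estimate of how much that dimension helps or hurts.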

Qualitative

We may consider visualizing the reviews in a lower-dimensional (possibly 2-dimensional) space and argue that the visualization produced by our approach, which takes more views of the data into account, succeeds in visually separating, e.g., good from bad reviews.

Key technical challenges

We will need to deal with large data (the original dataset contains more than 5.8 million reviews).
We may need to deal with features for each object (such as a product's price), in addition to the relational data.
We may need to deal with multi-relational data (such as a reviewer-reviewer trust network) if such data is available, though we have not found any so far.

What we hope to learn

We would like to learn how each dimension actually contributes to the performance in a specific task.

Related Work

Incorporating reviewer and product information for review rating prediction
- This is one of the few studies that investigated the high dimensionality of review data using tensors. It dealt with reviewers, review contents, and product information. Analyzing this paper will help us design how to examine the dependencies among multiple dimensions.

@@ Line 23: / Line 23: @@
 * [[User:Epapalex|Vagelis Papalexakis]]
-* [[User:Nnori|Nozomi Nori]] (dropped)
 == Project Title ==
