Sentiment Analysis in Multiple Domains

From Cohen Courses
Revision as of 21:42, 7 October 2012 by Zeyuz

Team members

  • Zeyu Zheng
  • Mahaveer Jain

Project Title

Sentiment Analysis in Multiple Domains

Project Abstract

Data

We will use the benchmark dataset of Amazon reviews collected by Blitzer et al. (2007). This dataset contains more than 340,000 reviews from 22 different product types, each of which can be regarded as a separate domain.


Task

Given labeled reviews from some source product types and unlabeled reviews from the target product types, we want to classify the reviews of the target product types as positive or negative.
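
To make the setup concrete, here is a minimal sketch of the task in Python: a classifier is trained only on labeled source-domain reviews and then applied to unlabeled target-domain reviews. The Naive Bayes model, the function names (tokenize, train_naive_bayes, classify), and the toy reviews are all illustrative assumptions, not part of the proposal or of the Blitzer et al. dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; any tokenizer would do)."""
    return text.lower().split()


def train_naive_bayes(labeled_reviews):
    """Collect per-class word and document counts from (text, label) pairs."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the higher add-one smoothed log probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        # Smoothed log prior, so a class with no training documents is still defined.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in any source domain are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: two labeled source domains, one unlabeled target domain.
source_reviews = {
    "books": [("a great and moving story", "pos"),
              ("boring plot and poor writing", "neg")],
    "dvd": [("great acting and a great script", "pos"),
            ("poor picture quality", "neg")],
}
target_unlabeled = ["great blender", "poor battery"]

merged = [example for reviews in source_reviews.values() for example in reviews]
model = train_naive_bayes(merged)
predictions = [classify(model, text) for text in target_unlabeled]
```

Note that target-specific words such as "blender" carry no signal here because they never appear in the source domains; this vocabulary mismatch is what makes the cross-domain setting harder than ordinary sentiment classification.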


Baseline

Firstly, the most naïve approach for this task is to simply merge all examples from the multiple source product types and leverage the algorithm proposed in [11] to automatically add unlabeled target-domain data in a bootstrapping way. We refer to this algorithm as “All-data (AD)” hereafter. Then, we performed the semi-supervised multiple classifier system (MCS) [19]. Finally, in order to examine the effectiveness of the Contrast Classifier, we performed our framework without filtering out uninformative examples at the beginning; this algorithm is referred to as “No-CC”.
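
The All-data idea can be sketched roughly as follows. Since the details of the algorithm in [11] are not given on this page, the code below substitutes plain self-training as a stand-in: all source examples are merged, and target reviews that the current model labels confidently are added to the training set round by round. The word-count scorer, the function names, the threshold value, and the toy reviews are all hypothetical.

```python
import math
from collections import Counter


def train(labeled):
    """Count words per class; labeled is a list of (text, label), label +1 or -1."""
    counts = {+1: Counter(), -1: Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts


def score(counts, text):
    """Add-one smoothed log-odds: the sign is the label, the magnitude the confidence."""
    vocab_size = len(set(counts[+1]) | set(counts[-1])) or 1
    n_pos = sum(counts[+1].values())
    n_neg = sum(counts[-1].values())
    total = 0.0
    for word in text.lower().split():
        p = (counts[+1][word] + 1) / (n_pos + vocab_size)
        q = (counts[-1][word] + 1) / (n_neg + vocab_size)
        total += math.log(p / q)
    return total


def all_data_self_training(source_domains, target_unlabeled,
                           threshold=0.5, max_rounds=5):
    """Merge all source domains, then bootstrap with confident target predictions."""
    labeled = [example for domain in source_domains for example in domain]
    pool = list(target_unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        counts = train(labeled)
        confident, remaining = [], []
        for text in pool:
            s = score(counts, text)
            if abs(s) >= threshold:
                confident.append((text, 1 if s > 0 else -1))
            else:
                remaining.append(text)
        if not confident:  # nothing new to add, so further rounds cannot help
            break
        labeled.extend(confident)
        pool = remaining
    return train(labeled)


# Hypothetical toy data.
books = [("great story", +1), ("boring plot", -1)]
dvd = [("great acting", +1), ("poor picture", -1)]
kitchen_unlabeled = ["great sturdy blender", "poor flimsy blender",
                     "sturdy mixer", "flimsy handle"]

source_only_counts = train(books + dvd)
final_counts = all_data_self_training([books, dvd], kitchen_unlabeled)
```

In this toy run the source-only model has no opinion about the target-specific word "sturdy", whereas the bootstrapped model picks it up through target reviews that also contain the source word "great". The same mechanism can also reinforce early mistakes, which is presumably one motivation for filtering examples before bootstrapping.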

Challenges

  • We will need to handle a large volume of data (the original dataset contains more than 5.8 million reviews).
  • We may need to deal with features of each object (such as a product's price), in addition to the relational data.
  • We may need to deal with multi-relational data (such as a reviewer-reviewer trust network) if such data is available, though we have not found any so far.


What we hope to learn

  • We would like to learn how each dimension actually contributes to the performance in a specific task.