Comparative Study: Sentiment Analysis Using an Automated Pattern-Based Approach vs. a Single Structured Model

From Cohen Courses
Revision as of 05:47, 6 November 2012 by Ydalal (talk | contribs) (→‎Dataset)

Papers Compared

  1. Enhanced sentiment learning using Twitter hashtags and smileys Davidov ...
  2. Structured Models for Fine-to-Coarse Sentiment Analysis Ryan ...

Comparison

Both papers address the same problem, sentiment classification, but their differences outweigh their similarities.

Problem

Davidov and team approach sentiment classification by leveraging a ready-to-use corpus, Twitter: they use the hashtags and smileys in tweets as labels to train a k-nearest-neighbor (KNN) model, so the process requires no manual labeling of training data. They also use an interesting feature type called "automatic patterns", which is language independent and gives the most significant improvement over the rest of the features. The approach is limited to the document level (treating each tweet as a document).
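The idea of using hashtags and smileys as free training labels for a KNN classifier can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the paper: the tag lexicons, the regex tokenizer, the bag-of-words features, and the overlap-count similarity are all hypothetical stand-ins (the paper's pattern features and its actual KNN distance function are not reproduced here).

```python
import re
from collections import Counter

# Hypothetical label lexicons, chosen only for illustration.
POSITIVE_TAGS = {"#happy", "#love", ":)", ":-)"}
NEGATIVE_TAGS = {"#sad", "#angry", ":(", ":-("}
LABEL_TOKENS = POSITIVE_TAGS | NEGATIVE_TAGS

# Matches hashtags, a few simple smileys, and plain words.
TOKEN_RE = re.compile(r"#\w+|[:;]-?[()]|\w+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def distant_label(tweet):
    """Return (label, features), or None if the tweet has no unambiguous tag."""
    tokens = tokenize(tweet)
    has_pos = any(t in POSITIVE_TAGS for t in tokens)
    has_neg = any(t in NEGATIVE_TAGS for t in tokens)
    if has_pos == has_neg:  # neither kind of tag, or conflicting tags
        return None
    label = "positive" if has_pos else "negative"
    # Drop the label-bearing tokens so they are not also used as features.
    features = Counter(t for t in tokens if t not in LABEL_TOKENS)
    return label, features


def overlap(a, b):
    """Assumed similarity: number of shared tokens (with multiplicity)."""
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


def knn_predict(train, tweet, k=3):
    """Majority vote among the k training tweets most similar to the query."""
    query = Counter(tokenize(tweet))
    ranked = sorted(train, key=lambda ex: overlap(query, ex[1]), reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy corpus: no manual labels, only the tags already present in the tweets.
tweets = [
    "Great battery life on this phone :)",
    "This phone is great #happy",
    "Terrible screen, the phone broke :(",
    "Broke after a day, terrible #sad",
    "No tag here",
]
train = [ex for ex in map(distant_label, tweets) if ex is not None]
```

Calling `knn_predict(train, "great phone")` classifies an untagged tweet from its nearest tagged neighbors. Tweets without a usable tag, like the last one in the toy corpus, are simply left out of the training set.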

Ryan and team approach sentiment classification from a different perspective altogether. Rather than leveraging a particular dataset, they propose a new modeling approach that improves sentiment classification accuracy at different levels of granularity.

Dataset

Ryan and team use a customer reviews dataset, whereas Davidov and team use a Twitter dataset.

  • Customer reviews differ from tweets in that they contain no hashtags and very few smileys. The review dataset could still be evaluated with Davidov's model using the same set of features, but the analysis would be limited to the document level.
  • Conversely, Ryan and team's model could be used to evaluate the Twitter corpus, since the model is capable of taking multiple classes into consideration.

Discussion

Additional Questions

  1. How much time did you spend reading the (new, non-wikified) paper you summarized?
    • 2.5 hours
  2. How much time did you spend reading the old wikified paper?
    • 1 hour
  3. How much time did you spend reading the summary of the old paper?
    • 15 minutes
  4. How much time did you spend reading background material?
    • None
  5. Was there a study plan for the old paper?
    • Yes
    1. If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them?
      • Yes, I glanced over 2/3 papers to understand the key concepts. It was a good starting point.
      • 45 minutes
  6. Give us any additional feedback you might have about this assignment.
    1. The wikified paper's summary was quite useful to start with, as it helped in understanding the big picture immediately and in noting down the key areas to look for in the paper.
      • For example, the binary classification was not immediately clear from the summary, and the evaluation with human judges was something new I encountered only when I read the paper. I also had a doubt about overlapping hashtags and labels, which was explained in the paper.
    2. Some additional key features that I had to look for in the paper: the KNN distance function, the neighbor selection criteria, and the feature selection process.
    3. I think it is useful to have a good summary, and it is unavoidable that a summary leaves out many details. But in the current wikified summary, some important features were missing, as was a good discussion of the pros and cons of the approach.