Comparison: Andreevskaia et al., ICWSM 2007 and M. Hurst and K. Nigam, Retrieving Topical Sentiments from Online Document Collections

Papers

  1. Andreevskaia, Alina, Sabine Bergler, and Monica Urseanu. "All Blogs are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs." ICWSM 2007.
  2. Hurst, Matthew F., and Kamal Nigam. "Retrieving Topical Sentiments from Online Document Collections." Proceedings of SPIE, Vol. 5296, 2004.

Problem

Andreevskaia et al. (2007) perform sentiment classification (binary and ternary) on a per-sentence basis. They study the differences between "personal diary" and "journalistic" styled web blogs using manually annotated data, and evaluate two systems: one based on sentiment word counts, and an improved version that also handles valence shifters.
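As a rough illustration of the word-count approach, the sketch below scores a sentence by lexicon hits and flips polarity when a valence shifter (e.g., negation) appears shortly before a sentiment word. The tiny lexicons and the three-token shifter window are illustrative assumptions, not the authors' actual resources.

  # Minimal sketch of sentence-level sentiment classification via
  # lexicon counts plus valence shifters. Lexicons here are toy examples.

  POSITIVE = {"good", "great", "love", "excellent"}
  NEGATIVE = {"bad", "awful", "hate", "terrible"}
  SHIFTERS = {"not", "never", "hardly", "no"}

  def classify(sentence: str, window: int = 3) -> str:
      tokens = sentence.lower().split()
      score = 0
      for i, tok in enumerate(tokens):
          if tok in POSITIVE:
              polarity = 1
          elif tok in NEGATIVE:
              polarity = -1
          else:
              continue
          # Flip polarity if a shifter occurs within the preceding window.
          if any(t in SHIFTERS for t in tokens[max(0, i - window):i]):
              polarity = -polarity
          score += polarity
      if score > 0:
          return "positive"
      if score < 0:
          return "negative"
      return "neutral"

  print(classify("This blog is not bad at all"))  # -> positive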

Hurst and Nigam (2004) had previously performed the similar task of identifying polarity on a per-sentence basis in order to discover polar sentences about a topic. They used a linear classifier (Winnow) for topic classification and a rule-based grammatical model for polarity identification.
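For reference, here is a minimal sketch of the Winnow update rule over binary bag-of-words features. The hyperparameters, threshold choice, and feature representation are assumptions for illustration, not details taken from the paper.

  # Winnow: a linear threshold classifier with multiplicative updates.
  # Weights start at 1 and are promoted/demoted only on mistakes.

  class Winnow:
      def __init__(self, n_features: int, alpha: float = 2.0):
          self.w = [1.0] * n_features      # initial weights
          self.theta = float(n_features)   # a common threshold choice
          self.alpha = alpha               # promotion/demotion factor

      def predict(self, x) -> int:
          # x is a list of 0/1 feature indicators (e.g., word presence)
          return 1 if sum(w * xi for w, xi in zip(self.w, x)) >= self.theta else 0

      def update(self, x, y: int) -> None:
          if self.predict(x) == y:
              return
          # Promote active weights on a false negative, demote on a false positive.
          factor = self.alpha if y == 1 else 1.0 / self.alpha
          for i, xi in enumerate(x):
              if xi:
                  self.w[i] *= factor

  clf = Winnow(n_features=4)
  clf.update([1, 1, 0, 0], y=1)     # promotes weights of active features
  print(clf.predict([1, 1, 0, 0]))  # -> 1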

Big Idea

Both papers perform sentiment or polarity classification on a per-sentence rather than a per-document or per-message basis, which is beneficial for fine-grained identification of sentiment toward a specific entity or topic. Both take a largely rule-based approach built on sentiment word lists: Hurst and Nigam use a restricted sentiment word list tailored to a single topic, whereas Andreevskaia et al. use the much larger HM word list (Hatzivassiloglou and McKeown) further expanded using WordNet. Similarly, where Hurst and Nigam use a grammatical approach to assign polarity to topics, Andreevskaia et al. rely on sentiment word counts alone to assign sentiment labels.
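The sketch below shows one simple way a seed sentiment word list can be grown through WordNet synonyms. The exact expansion procedure in Andreevskaia et al. may differ; this only illustrates the general idea. It assumes NLTK with the WordNet corpus installed (pip install nltk; nltk.download('wordnet')).

  # Expand a seed word list with WordNet synonyms (illustrative sketch).

  from nltk.corpus import wordnet as wn

  def expand(seed_words):
      expanded = set(seed_words)
      for word in seed_words:
          for synset in wn.synsets(word):
              for lemma in synset.lemmas():
                  expanded.add(lemma.name().replace("_", " "))
      return expanded

  print(sorted(expand({"good"}))[:10])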

Method

Datasets Used

Andreevskaia et al. tested their system on two datasets, one drawn from personal-diary-style blogs and one from journalistic-style blogs.

Each dataset contained 600 sentences, manually annotated as 200 positive, 200 negative, and 200 neutral.

Hurst and Nigam used a dataset of 16,616 sentences from 982 messages extracted from online sources (Usenet, online message boards, etc.) in selected domains. They manually annotated 250 randomly selected sentences with the following labels:

  • Polarity identification: positive, negative
  • Topic identification: topical, out-of-topic
  • Polarity and topic identification: positive-correlated, negative-correlated, positive-uncorrelated, negative-uncorrelated (combination sketched below)
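The joint labels in the last item pair each polarity with whether the sentence is on-topic. One straightforward way to derive them from separate polarity and topic predictions is sketched below; the combination logic is an assumption based on the label names, not a procedure stated in the paper.

  # Combine per-sentence polarity and topic predictions into the four
  # joint labels (assumed naming convention from the list above).

  def joint_label(polarity: str, topical: bool) -> str:
      # polarity is "positive" or "negative"; topical marks on-topic sentences
      return f"{polarity}-{'correlated' if topical else 'uncorrelated'}"

  print(joint_label("positive", True))   # -> positive-correlated
  print(joint_label("negative", False))  # -> negative-uncorrelated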

Other Discussions

Other Questions