Comparison of Andreevskaia et al., ICWSM 2007 and Hurst & Nigam, "Retrieving Topical Sentiments from Online Document Collections"

From Cohen Courses

Papers

  1. All Blogs are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs, Alina Andreevskaia, Sabine Bergler, and Monica Urseanu, ICWSM 2007
  2. Hurst, Matthew F., and Kamal Nigam. "Retrieving topical sentiments from online document collections." Proceedings of SPIE. Vol. 5296. 2004.

Problem

Andreevskaia_2007 perform sentiment classification (binary and ternary) on a per-sentence basis. For their analysis they study the differences between "personal diary" and "journalistic" styled web blogs using manually annotated data. They evaluate two systems: a system based on sentiment word counts, and an improved version that uses valence shifters.
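To make the word-count-plus-valence-shifter idea concrete, here is a minimal illustrative sketch (not the authors' code); the tiny word lists below are hypothetical stand-ins for the much larger lists used in the paper, and the shifter handling is a simplified flip-the-next-sentiment-word rule:

```python
# Illustrative sketch of per-sentence sentiment scoring by sentiment word
# counts, with negators as simple valence shifters. Word lists are
# hypothetical placeholders, not the paper's actual lists.

POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "sad"}
NEGATORS = {"not", "never", "no"}  # valence shifters that flip polarity


def classify_sentence(sentence: str) -> str:
    """Assign a ternary sentiment label from signed word counts."""
    score = 0
    negated = False
    for token in sentence.lower().split():
        word = token.strip(".,!?")
        if word in NEGATORS:
            negated = True  # flip the polarity of the next sentiment word
            continue
        polarity = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if polarity:
            score += -polarity if negated else polarity
            negated = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentence("This is a great movie"))     # positive
print(classify_sentence("This is not a good movie"))  # negative
```

The word-count-only baseline in the paper corresponds to dropping the `NEGATORS` branch; the valence-shifter variant keeps it.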

Hurst_Nigam_2004 had previously performed a similar task of identifying polarity on a per-sentence basis to discover polar sentences about a topic. Hurst and Nigam used a linear classifier ([Winnow_Algorithm]) for topic classification and a rule-based grammatical model for polarity identification.
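A minimal sketch of the Winnow update they build on, assuming binary bag-of-words features represented as lists of active feature indices; the promotion factor `alpha` and the threshold are conventional textbook settings, not the paper's exact parameters:

```python
# Winnow: a mistake-driven linear classifier with multiplicative updates.
# Weights start at 1; on a false negative the active weights are promoted
# (multiplied by alpha), on a false positive they are demoted (divided).

def train_winnow(examples, n_features, alpha=2.0, epochs=10):
    """examples: list of (active_feature_indices, label), label in {0, 1}."""
    weights = [1.0] * n_features
    threshold = float(n_features)  # standard Winnow threshold
    for _ in range(epochs):
        for active, label in examples:
            prediction = 1 if sum(weights[i] for i in active) > threshold else 0
            if prediction == 1 and label == 0:    # false positive: demote
                for i in active:
                    weights[i] /= alpha
            elif prediction == 0 and label == 1:  # false negative: promote
                for i in active:
                    weights[i] *= alpha
    return weights, threshold


def predict(weights, threshold, active):
    return 1 if sum(weights[i] for i in active) > threshold else 0


# Toy usage: features 0-1 indicate topical sentences, 2-3 off-topic ones.
examples = [([0, 1], 1), ([2, 3], 0)]
w, th = train_winnow(examples, n_features=4)
print(predict(w, th, [0, 1]))  # 1 (topical)
print(predict(w, th, [2, 3]))  # 0 (off-topic)
```

The multiplicative update is what makes Winnow attractive for text: with very high-dimensional sparse features, weights of irrelevant words stay small while relevant ones grow quickly.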

Big Idea

Both papers perform sentiment or polarity classification on a per-sentence basis rather than at the document or message level. This is beneficial for fine-grained identification of sentiments pertaining to a specific entity or topic. Both approaches rely on sentiment word lists for identifying sentiments. While Hurst_et_al use a restricted sentiment word list pertaining to a single topic, Andreevskaia_et_al use a much larger HM word list, further expanded using WordNet. Similarly, where Hurst_et_al use a grammatical approach to assign polarity to topics, Andreevskaia_et_al restrict themselves to sentiment word counts for assigning sentiment labels.

Method

Dataset Used

Andreevskaia_et_al tested their system on two datasets.

Each dataset contained 600 sentences, manually annotated as 200 positive, 200 negative, and 200 neutral sentences.

Hurst_Nigam_et_al used a dataset containing 16,616 sentences from 982 messages extracted from online resources (Usenet, online message boards, etc.) about certain domains. They manually annotated 250 randomly selected sentences with the following labels:

  • Polarity Identification: positive, negative
  • Topic Identification: Topical, Out-of-Topic
  • Polarity and Topic Identification: positive-correlated, negative-correlated, positive-uncorrelated, negative-uncorrelated.

Other Discussions

Other Questions

  1. How much time did you spend reading the (new, non-wikified) paper you summarized? 2 hours 30 minutes
  2. How much time did you spend reading the old wikified paper? 1 hour
  3. How much time did you spend reading the summary of the old paper? 20 minutes
  4. How much time did you spend reading background material? 1 hour
  5. Was there a study plan for the old paper? Yes
    1. If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them? The wikified paper didn't require much background knowledge. Furthermore, it pointed to WordNet, which is well known, and to the Hurst and Nigam paper referred to above, which was being reviewed anyway. As such, no further reading was required.
  6. Give us any additional feedback you might have about this assignment.

I think this is a nice way to ensure that previously wikified papers are reviewed by other people, while also helping the reader write new summaries of related papers with similar concepts much faster.