Andreevskaia et al., ICWSM 2007
This is a paper discussed in Social Media Analysis 10-802 in Spring 2010.
Citation
All Blogs are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs, Alina Andreevskaia, Sabine Bergler, and Monica Urseanu, ICWSM 2007
Online version
All Blogs Are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs
Summary
This paper explores how blog genre affects sentiment classification. The authors examined "personal diary" style blogs and "journalistic" style blogs, and measured differences in sentiment classifier performance across the two genres, for a binary sentiment task (positive vs. negative) as well as a ternary sentiment task (positive vs. negative vs. neutral).
The authors measured both human annotator performance on the task and the performance of automated keyword-count-based classification methods, using human performance as a baseline for comparison. The automated method used counts of sentiment words and the sentiment scores of those words; a more advanced system also took valence-shifting words into account. To form a more comprehensive sentiment word list, the authors started from the HM word list of Hatzivassiloglou and McKeown ACL 1997 and expanded it using WordNet.
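The keyword-count approach described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny lexicon, the list of negators, the two-token look-back window, and the names `LEXICON`, `NEGATORS`, `sentence_score`, and `classify` are all assumptions made for illustration. The actual system used the WordNet-expanded HM list and its own treatment of valence shifters.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy lexicon (word -> sentiment score); the paper's list was
# the HM list expanded with WordNet, which is not reproduced here.
LEXICON: Dict[str, float] = {
    "good": 1.0, "great": 1.0, "happy": 1.0,
    "bad": -1.0, "terrible": -1.0, "sad": -1.0,
}

# Hypothetical valence shifters; only simple negators are modeled here.
NEGATORS: Set[str] = {"not", "never", "no", "hardly"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_score(sentence: str, use_shifters: bool = True, window: int = 2) -> float:
    """Sum the scores of sentiment words, optionally flipping the sign of a
    word when a negator appears within `window` tokens before it."""
    tokens = tokenize(sentence)
    total = 0.0
    for i, token in enumerate(tokens):
        score = LEXICON.get(token)
        if score is None:
            continue  # not a sentiment word
        if use_shifters:
            preceding = tokens[max(0, i - window):i]
            if any(t in NEGATORS for t in preceding):
                score = -score  # assumed shifter behavior: reverse polarity
        total += score
    return total


def classify(sentence: str, use_shifters: bool = True) -> str:
    """Map the summed score to positive, negative, or neutral."""
    score = sentence_score(sentence, use_shifters=use_shifters)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this sketch, a sentence such as "This is not good" is tagged positive when shifters are ignored and negative when they are applied, which is the kind of case the more advanced system is meant to handle.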
They tested their system on two datasets, one representing journalistic-style blogs (a self-gathered cyberjournalist.net dataset) and one representing personal-journal-style blogs (the 20060501.xml dataset provided by the conference organizers). Sentiment was classified at the level of individual sentences. Each dataset contained 600 sentences, balanced to have 200 sentences of each kind (positive, negative and neutral).
Results
The main results are as follows:
- Human inter-annotator agreement is high for binary sentiment classification (95-99%), but considerably lower for ternary sentiment classification (80%).
- For human annotators, there is no significant difference in classifying sentiment across the two genres of blog documents, but the ternary task is more difficult than the binary task.
- The automated methods had binary classification accuracies of 73% (diary) and 64% (journalistic) without valence shifters, and 77% (diary) and 67% (journalistic) with valence-shifting words taken into account.
- The automated methods had ternary classification accuracies of 51% (diary) and 48% (journalistic) without valence shifters, and 53% (diary) and 50% (journalistic) with valence-shifting words taken into account.
- Therefore, the automated classification methods were sensitive to genre differences, whereas human annotators were not.
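The comparisons above rest on two simple quantities: how often two annotators assign the same label, and how often a classifier matches the gold label within each genre. The sketch below shows one way to compute both. It assumes plain percent agreement, which may differ from the exact agreement measure used in the paper, and the function names and the toy labels in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def percent_match(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Percentage of positions where the two label sequences agree.

    Works both for inter-annotator agreement (two annotators) and for
    accuracy (predictions vs. gold labels).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def accuracy_by_genre(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy separately per genre.

    Each record is (genre, gold_label, predicted_label).
    """
    gold = defaultdict(list)
    predicted = defaultdict(list)
    for genre, gold_label, predicted_label in records:
        gold[genre].append(gold_label)
        predicted[genre].append(predicted_label)
    return {genre: percent_match(predicted[genre], gold[genre]) for genre in gold}
```

For example, with made-up labels, `percent_match(["positive", "negative", "neutral", "positive"], ["positive", "negative", "positive", "positive"])` gives 75.0, since the annotators disagree on one sentence out of four. A gap between the per-genre values returned by `accuracy_by_genre` is the kind of genre sensitivity reported for the automated methods.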
Discussion
This paper tackled the problem of sentiment classification for blog text across different blog genres. The automated classification method was simple, based on sentiment word lists, yet it achieved reasonably high accuracy on the binary classification task (up to 77% on diary-style blogs). It had much less success on the ternary classification task, on which human annotators also agreed less often. This is evidence that deciding whether a sentence is neutral or polar is harder than deciding its polarity.
Related papers
- Hatzivassiloglou and McKeown ACL 1997 describes early work in using sentiment word lists to classify text sentiment.
- Hurst and Nigam AAAI-EAAT 2004 proposes techniques for taking valence shifting words into account for sentiment analysis.