This is a [[Category::Paper]] for Social Media Analysis 10-802 in Spring 2012.

== Citation ==

All Blogs are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs, Alina Andreevskaia, Sabine Bergler, and Monica Urseanu, ICWSM 2007

== Online version ==

All Blogs Are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs
== Summary ==
This paper explores genre differences in weblogs with respect to [[AddressesProblem::opinion mining]]. The authors examined "personal diary" style blogs and "journalistic" style blogs, and measured differences in sentiment classifier performance across the two genres, for both binary sentiment tasks (positive vs. negative) and ternary sentiment tasks (positive vs. negative vs. neutral).
  
The authors measured both [[UsesMethod::human computation]] performance on the task and the performance of automated keyword-count-based [[UsesMethod::sentiment analysis]] methods, using human performance as a baseline for comparison. The automated method used counts of sentiment words and the sentiment scores of those words, with a more advanced system also taking valence-shifting words into account. The authors used the [[UsesDataset::HM word list]] from [[RelatedPaper::Hatzivassiloglou and McKeown ACL 1997]] and expanded it using [[UsesDataset::WordNet]] to form a more comprehensive sentiment word list.
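
To make the keyword-count approach concrete, the following is a minimal sketch of such a classifier in Python. It illustrates the general technique rather than the authors' actual system: the lexicon entries and scores, the shifter list, the context window, and the neutral threshold are all toy assumptions.

<pre>
# Minimal keyword-count sentiment classifier sketch (toy lexicon and
# parameters; not the authors' HM-based word list or exact system).

# Toy sentiment lexicon: word -> score in [-1, 1].
LEXICON = {
    "good": 0.8, "great": 0.9, "happy": 0.7,
    "bad": -0.8, "awful": -0.9, "sad": -0.6,
}

# Toy valence shifters: words that flip the polarity of a nearby sentiment word.
SHIFTERS = {"not", "never", "hardly", "no"}

def score_sentence(sentence, use_shifters=True, window=2):
    """Sum lexicon scores over the tokens, flipping a sentiment word's
    score when a valence shifter appears within `window` preceding tokens."""
    tokens = sentence.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        if use_shifters and any(t in SHIFTERS for t in tokens[max(0, i - window):i]):
            score = -score  # shifter flips polarity
        total += score
    return total

def classify(sentence, ternary=False, neutral_band=0.25):
    """Binary: sign of the total score. Ternary: totals near zero are neutral."""
    total = score_sentence(sentence)
    if ternary and abs(total) < neutral_band:
        return "neutral"
    return "positive" if total >= 0 else "negative"

print(classify("this movie was not good"))                    # negative ("not" flips "good")
print(classify("the meeting starts at noon", ternary=True))   # neutral (no lexicon hits)
</pre>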

They tested their system on two datasets, one representing journalistic-style blogs (a self-gathered [[UsesDataset::cyberjournalist.net dataset]]) and one representing personal-journal-style blogs (the [[UsesDataset::20060501.xml dataset]] provided by the conference organizers). They classified sentiment at the individual sentence level, with each dataset containing 600 sentences, balanced to include 200 sentences of each kind (positive, negative, and neutral).
== Results ==

The main results are as follows:
* Human inter-annotator agreement on sentiment classification is high for binary [[AddressesProblem::sentiment analysis]] (95-99%), but significantly lower for ternary [[AddressesProblem::sentiment analysis]] (80%); a sketch of how such agreement figures can be computed follows this list.
* For human annotators, there is no significant difference in classifying sentiment between the two genres of blog documents, but the ternary task is more difficult than the binary task.
* The automated methods had binary classification accuracies of 73% on diary blogs and 64% on journal blogs without valence shifters, rising to 77% and 67% respectively when valence-shifting words were taken into account.
* The automated methods had ternary classification accuracies of 51% on diary blogs and 48% on journal blogs without valence shifters, and 53% and 50% respectively with valence-shifting words taken into account.
* Therefore, the automated classification methods were sensitive to genre differences, whereas the human annotators were not.
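
For readers unfamiliar with agreement figures like those above, here is a small sketch showing how pairwise percent agreement and a chance-corrected variant (Cohen's kappa) can be computed. The label sequences are toy data, not the paper's annotations, and plain percent agreement is assumed to be the statistic behind the reported numbers.

<pre>
# Sketch of pairwise inter-annotator agreement on toy labels (not the
# paper's annotations).
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

ann1 = ["pos", "neg", "neu", "pos", "neg", "neu"]
ann2 = ["pos", "neg", "pos", "pos", "neg", "neu"]
print(percent_agreement(ann1, ann2))  # 0.833...
print(cohens_kappa(ann1, ann2))       # 0.75
</pre>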

== Discussion ==

This paper tackled the problem of sentiment classification for blog text across different blog genres. The authors' automated method was simple, based on sentiment word lists, yet it achieved high accuracy on the binary classification task. It was much less successful on the ternary task, on which human annotators also agreed less often. This is evidence that deciding whether a sentence is neutral or polar is harder than deciding which polarity it carries.

== Related papers ==

== Study plan ==
* [http://en.wikipedia.org/wiki/WordNet WordNet]
** A description of the WordNet lexical resource (see the expansion sketch after this list).
* [http://wordnet.princeton.edu/ WordNet Home]
* Retrieving topical sentiments from online document collection (Hurst and Nigam, 2004): [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.9337]
** Work dealing with extracting sentiment from web documents where valence-shifting terms are taken into account.
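
As a concrete illustration of the kind of WordNet-based expansion the paper applies to the HM word list, here is a sketch using NLTK's WordNet interface. The seed words and the synonyms-only expansion rule are assumptions for illustration, not the authors' exact procedure; it requires NLTK with the WordNet data downloaded.

<pre>
# Sketch: expand a seed sentiment word list with WordNet synonyms.
# Setup (assumed): pip install nltk && python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn

def expand_with_synonyms(seed_words):
    """Return the seed words plus all single-word WordNet synonyms."""
    expanded = set(seed_words)
    for word in seed_words:
        for synset in wn.synsets(word):
            for lemma in synset.lemmas():
                name = lemma.name().lower()
                if "_" not in name:  # skip multi-word lemmas
                    expanded.add(name)
    return expanded

positive_seeds = {"good", "happy"}
print(sorted(expand_with_synonyms(positive_seeds)))
</pre>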
