Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena


Citation

Johan Bollen, Huina Mao, Alberto Pepe. Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena. ICWSM 2011

Summary

This paper presents an approach to global mood detection, a sub-problem of sentiment analysis. The authors ask whether major political, cultural, socioeconomic, or natural events correlate with the global mood of tweets published on the same day. The dataset is a corpus of 9.6 million tweets published between Aug 1 and Dec 20, 2008. To evaluate the mood of tweets quantitatively, they use an extended version of a well-known psychometric instrument, the Profile of Mood States (POMS), which defines 6 dimensions of mood (tension, anger, depression, vigour, fatigue, and confusion).

The tweets are tokenized, stripped of stop words, and stemmed. Each day's worth of tweets is then scored with the POMS scoring function, which simply counts occurrences of the predefined adjectives for each mood dimension. Because tweet volume varies widely over the collection period, the authors convert the mood values to z-scores, normalizing each value with respect to a local mean and standard deviation, where "local" means within a sliding window.
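The scoring and normalization steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the term lists in `MOOD_TERMS` and the `STOPWORDS` set are tiny hypothetical stand-ins (the actual extended POMS lexicon is not reproduced here), stemming is omitted for brevity, and the window is assumed to be centered on each day with an arbitrary half-width, since the summary does not state the window's shape or size.

```python
import re
import statistics
from collections import Counter

# Hypothetical stand-in lexicon; NOT the real POMS adjective lists.
MOOD_TERMS = {
    "tension": {"tense", "nervous", "anxious"},
    "anger": {"angry", "furious", "annoyed"},
    "depression": {"sad", "hopeless", "gloomy"},
    "vigour": {"lively", "energetic", "cheerful"},
    "fatigue": {"tired", "exhausted", "weary"},
    "confusion": {"confused", "bewildered", "muddled"},
}

# Hypothetical, deliberately short stop-word list.
STOPWORDS = {"i", "am", "so", "and", "the", "a", "is", "today"}


def tokenize(text):
    """Lowercase, split into word tokens, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def score_day(tweets):
    """Count mood-term occurrences per dimension over one day's tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            for mood, terms in MOOD_TERMS.items():
                if token in terms:
                    counts[mood] += 1
    return {mood: counts[mood] for mood in MOOD_TERMS}


def local_z_scores(series, half_window):
    """Z-score each value against the mean and standard deviation of a
    window centered on it (assumed shape; truncated at the series ends)."""
    z = []
    for i, x in enumerate(series):
        lo = max(0, i - half_window)
        hi = min(len(series), i + half_window + 1)
        window = series[lo:hi]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        # A flat window has no spread, so report 0 rather than divide by zero.
        z.append((x - mean) / sd if sd > 0 else 0.0)
    return z
```

In use, `score_day` would be called once per day to build one time series per mood dimension, and `local_z_scores` would then be applied to each series separately. Because each value is judged only against its neighbors, a day stands out when it deviates from nearby days, regardless of how many tweets were collected overall in that part of the period.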

Experimental Results

The authors compare the variation in the scores of the 6 mood dimensions against a timeline of major events. They observe daily spikes in the corresponding moods, such as tension on election day. They did not observe a long-term effect on mood from the series of economic events that indicated a downturn.

Comments and Criticisms

As a poster paper, this work lacks a strong evaluation. It provides neither a gold standard nor any quantitative metric for measuring the efficacy of the method. However, the method for detecting global mood is useful in that the underlying instrument has long been validated in psychology and it requires no training data.

One complaint I have about the paper is the odd disparity in tweet volume over the collection period. I feel this could easily have been controlled for, eliminating the need for score normalization. Normalizing the score by the standard deviation may mask actual events where there may be a large split in consensus.

I also question the applicability of POMS to Twitter. The adjective set identified in POMS may not be the best indicator of mood on Twitter, where the language is significantly looser and contains various acronyms ("lol") and emoticons such as :) and :(.

Related Papers & Links