Capturing Global Mood Levels using Blog Posts
Gilad Mishne and Maarten de Rijke. Capturing Global Mood Levels using Blog Posts, AAAI 2006.
This paper presents a technique for modeling aggregate mood levels from a large number of blog posts using sentiment analysis. The authors note that the problem differs from text classification in that mood levels are transient and change quickly. They use a two-stage process: 1) identifying the textual features that help estimate mood prevalence and 2) learning linear regression models on these features.
This work uses the LiveJournal dataset. The authors take the top n-grams of each mood subcorpus and determine the terms that are most 'discriminating'. These terms are used as features along with non-textual features such as the time of posting and the total number of blog posts in the hour. Linear regression coefficients are then estimated, and the models are evaluated using correlation and relative error from K-fold cross validation.
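The regression and evaluation step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data is synthetic, the three features (hour of day, total posts in the hour, count of one discriminating term) and all function names are assumptions made for the example, and the folds are taken by simple striding.

```python
import math
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(n):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def k_fold_eval(X, y, k=5):
    """Return (correlation, mean relative error) of held-out predictions."""
    preds = [0.0] * len(y)
    for fold in range(k):
        test_idx = list(range(fold, len(y), k))  # every k-th hour is held out
        held = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held]
        coef = fit_ols([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i, p in zip(test_idx, predict(coef, [X[i] for i in test_idx])):
            preds[i] = p
    rel_err = sum(abs(p - t) / t for p, t in zip(preds, y)) / len(y)
    return pearson(preds, y), rel_err

# Synthetic hourly data (hypothetical): the target is the fraction of posts
# tagged with one mood, driven mostly by the discriminating-term count.
rng = random.Random(0)
X, y = [], []
for h in range(200):
    hour, total, term = h % 24, rng.randint(500, 1500), rng.randint(0, 50)
    X.append([hour, total, term])
    y.append(0.02 + 0.001 * term + 0.0005 * hour + rng.gauss(0, 0.002))

corr, rel_err = k_fold_eval(X, y)
print(f"correlation={corr:.3f}  relative error={rel_err:.3f}")
```

One model of this kind would be fit per mood, with the held-out correlation and relative error indicating how well that mood's level can be estimated from the chosen features.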
Case studies illustrate the method by predicting the overall 'sad'ness on the day of the London bombings and the usual 'excite'ment on weekends.
Related recent works in aggregate mood prediction include -
- Dodds et al., Measuring the Happiness of Large-Scale Written Expression: Songs, Blogs, and Presidents, Journal of Happiness Studies '09
- Bollen et al., Modeling public mood and emotion: Twitter Sentiment and Socio-Economic Phenomena, WWW '10
- Abbasi et al., Affect analysis of web forums and blogs using correlation ensembles, IEEE Trans. on Knowledge and Data Engineering '08
- Eric Gilbert, Karrie Karahalios, Widespread Worry and the Stock Market, ICWSM '10
- Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, Noah A. Smith, From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series, ICWSM '10
- A related survey: Pang & Lee.
A good summary of aggregate sentiment research from Abbasi et al. -