OConnor et. al., ICWSM 2010

From Cohen Courses

Revision as of 16:43, 26 September 2012

Citation

Brendan O’Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010a. From tweets to polls: Linking text sentiment to public opinion time series. In Proc. of ICWSM.

Online version

From tweets to polls: Linking text sentiment to public opinion time series

Summary

This paper attempts to correlate the results of several surveys on consumer confidence and political opinion with the frequencies of sentiment words in Twitter messages. The main motivation is that mining opinions on Twitter could serve as an alternative to conducting surveys, which can be time-consuming and comparatively expensive.

This task can be divided into two steps: first, retrieve the relevant tweets from the Twitter corpus; second, determine whether each tweet expresses a positive or a negative opinion.

Polls

The public opinion polls considered in this work are obtained by telephone surveys and are publicly available.

For consumer confidence, the survey from the University of Michigan was used (available here). Another poll that was used is the Gallup Organization's "Economic Confidence" index (available here).

For political opinion, two polls were used. The first is Gallup's presidential job approval rating for Obama over the year 2009, available here. The other was obtained from Pollster.com, where people were asked whether they were voting for Obama or McCain.

Aggregate Sentiment

This paper shows that simple methods that are subject to noise perform relatively well when used to estimate aggregate sentiment.

Related tweets are identified simply by searching for tweets that contain one of a set of pre-selected keywords, such as Obama and McCain for tweets about the presidential election. The sentiment of a tweet is then determined by looking for words with positive or negative polarity: a tweet is counted as positive if it contains a positive-polarity word and as negative if it contains a negative-polarity word, so a tweet containing both kinds of words is counted as both. The aggregate opinion is the ratio of the number of positive tweets to the number of negative ones.

Both steps leave a large margin of error. First, there is no guarantee that every message containing a keyword is actually related to the topic of interest. Second, classifying tweets as positive or negative without looking at the context is among the most basic methods in the literature.

However, since the goal is to estimate the aggregate opinion, the noise in individual tweets can be amortized over a sufficiently large sample of tweets.
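
The two steps described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the tokenizer, and the function name sentiment_ratio are hypothetical choices made for illustration, and the summary does not say which lexicon or tokenization the paper uses.

```python
import re

# Hypothetical toy lexicons, chosen only for illustration.
POSITIVE_WORDS = {"good", "great", "hope"}
NEGATIVE_WORDS = {"bad", "terrible", "fear"}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_ratio(tweets, topic_keywords):
    """Return (positive_count, negative_count, ratio) over on-topic tweets."""
    keywords = {k.lower() for k in topic_keywords}
    positive = negative = 0
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        # Step 1: keep only tweets that mention at least one topic keyword.
        if not tokens & keywords:
            continue
        # Step 2: a tweet with words of both polarities counts on both sides.
        if tokens & POSITIVE_WORDS:
            positive += 1
        if tokens & NEGATIVE_WORDS:
            negative += 1
    # The ratio is undefined when no on-topic tweet is negative.
    ratio = positive / negative if negative else float("nan")
    return positive, negative, ratio
```

Calling sentiment_ratio(tweets, ["obama", "mccain"]) on a list of tweet strings returns the two counts and their ratio. Computing it separately for the tweets of each day would give a time series that can be compared with a poll.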

Correlation Analysis

For the consumer confidence trends from both surveys, the sentiment from tweets is shown to capture the broad trends in the survey data, especially for the Gallup data. The same holds for the Obama job approval poll in 2009.

The only case in which the aggregate sentiment did not correlate well with the polls is the 2008 election polls.
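
As a rough illustration of what comparing the two kinds of series involves, the sketch below computes a Pearson correlation coefficient between a sentiment series and a poll series. This is an assumption about the comparison: the summary only says the series are correlated and does not give the exact statistic or any preprocessing, and the function name pearson is introduced here for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

Here xs would hold one aggregate sentiment value per time step and ys the poll value for the same time steps. A result close to 1 means the two series move together, and a result close to 0 means no linear relationship.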

Related papers

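This work uses aggregate sentiment, which has been used before in:

* The paper by Gilbert and Karahalios ICWSM 2010 analyses stock behavior based on the text present in blogs.
* The same is done using news articles in Lavrenko et al SIGKDD 2000 and Koppel and Shtrimberg AAAI 2004.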

Study plan