Widespread Worry and the Stock Market
This is a paper for Social Media Analysis 10-802 in Fall 2012.
Citation
Gilbert, E. and Karahalios, K. (2010). Widespread worry and the stock market. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM 10).
Online version
Widespread worry and the stock market
Summary
This paper attempts to find correlations between LiveJournal blog post sentiment and S&P 500 stock market changes, using opinion mining. The authors used supervised learning techniques to classify each post as either *anxious* or *not anxious*. The classifiers used had low recall (28% and 32%), but high precision, with false positive rates of 3% and 6%. Thus, the classifiers were conservative in assigning posts to the anxious class.
Using the classifier-labeled posts, an **Anxiety index** was created by taking the fraction of anxious posts per day and then taking the difference of the logarithms of these fractions for two consecutive days. This anxiety time series was compared to the S&P 500 time series; for the S&P data the log-return was used, namely the difference of the logarithms of the index level on two consecutive trading days.
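The two transformations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the daily counts, and the closing prices are all hypothetical and do not come from the paper.

```python
import math


def anxious_fractions(daily_counts):
    """Turn (anxious_posts, total_posts) pairs into per-day fractions."""
    return [anxious / total for anxious, total in daily_counts]


def log_differences(values):
    """Return log(v[t]) - log(v[t-1]) for each pair of consecutive values.

    All values must be positive; a day with zero anxious posts would need
    special handling that this sketch does not attempt.
    """
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(values, values[1:])]


# Hypothetical inputs, for illustration only.
daily_counts = [(50, 1000), (60, 1000), (45, 900)]
sp500_close = [1100.0, 1089.0, 1095.5]

anxiety_index = log_differences(anxious_fractions(daily_counts))
market_returns = log_differences(sp500_close)
```

Both series end up one element shorter than their inputs, and both are expressed as day-over-day changes, which is what makes them comparable.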
To evaluate whether the anxiety index was related to the S&P 500 data, the authors turned to Granger causality, a statistical hypothesis test for determining whether one time series is useful for forecasting another. They compared the variance explained by two linear models: one containing the anxiety index data along with the S&P data, and the other using the S&P data alone.
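The model comparison behind a Granger test can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. The lag count, the function names, and the synthetic data are assumptions, and it reports only the F statistic, since converting it to a p-value needs an F distribution routine from a statistics library.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def residual_sum_of_squares(rows, targets):
    """Fit least squares via the normal equations and return the RSS."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f_statistic(target, other, lags):
    """F statistic: do lags of `other` improve a model of `target` on its own lags?"""
    ys, restricted, unrestricted = [], [], []
    for t in range(lags, len(target)):
        own = [target[t - i] for i in range(1, lags + 1)]
        extra = [other[t - i] for i in range(1, lags + 1)]
        ys.append(target[t])
        restricted.append([1.0] + own)            # target's own history only
        unrestricted.append([1.0] + own + extra)  # plus the other series
    rss_r = residual_sum_of_squares(restricted, ys)
    rss_u = residual_sum_of_squares(unrestricted, ys)
    dof = len(ys) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic data (not from the paper): market follows yesterday's anxiety.
rng = random.Random(0)
anxiety = [rng.gauss(0, 1) for _ in range(200)]
market = [0.0] + [0.8 * anxiety[t - 1] + 0.1 * rng.gauss(0, 1)
                  for t in range(1, 200)]

f_forward = granger_f_statistic(market, anxiety, lags=1)  # anxiety -> market
f_reverse = granger_f_statistic(anxiety, market, lags=1)  # market -> anxiety
```

On this synthetic data the forward statistic is large because the dependence was built in, while the reverse one is not. Real data would rarely be this clear-cut.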
Results
The authors found, in one instance, a statistically significant improvement in forecasting accuracy when using the linear model that included the anxiety data. However, the authors allude to trying many different time lags and window widths, so this finding may be an artifact of multiple hypothesis testing: they do not appear to apply a correction for performing multiple tests. Other results given in the paper have large p-values (> .10, and in some cases > .20), so their significance is questionable.
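To illustrate why the multiple-testing concern matters, here is a small sketch of a Bonferroni correction, one common (and conservative) way to adjust for several tests. The p-values below are hypothetical placeholders, not figures from the paper, and the function name is made up for this example.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag a p-value as significant only if it is below alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values from four lag/window variations (illustration only).
p_values = [0.03, 0.12, 0.21, 0.40]

uncorrected = [p < 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
```

Here the single result that passes at alpha = 0.05 no longer passes once the threshold is divided by the number of tests (0.05 / 4 = 0.0125).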
The authors also performed a Monte Carlo simulation in an attempt to verify their main result, and found that the p-value grew, coming close to not being significant for small alpha values.
Lastly, the authors tested whether market trends could predict anxiety data, and didn't find a significant correlation.
Discussion
The results in this paper are borderline significant, in the statistical sense, and may not be so if multiple hypothesis tests were performed without correcting for such effects in the significance tests. The conclusion reached by the authors, namely that measuring anxiety in blog data is useful for predicting stock market changes, is one that would need further research to definitively support. Therefore, the paper is interesting, but its significance is not immediately obvious.
Related papers
- Tetlock, JOF 2007 argued that pessimism in a Wall Street Journal column carried novel information about Dow returns from 1984 to 1999.
- O'Connor et al., ICWSM 2010 analyzes Twitter data to measure the correlation between Twitter sentiment and public opinion polls.
Study plan
Some concepts which may aid in understanding this paper:
- Term Frequency, Inverse Document Frequency (tf * idf)
  - A tutorial slideshow discussing the particular method the authors used to vectorize and weight synset glosses.
- WordNet
  - A description of the WordNet lexical resource.
  - WordNet Home
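As a quick refresher on the tf * idf weighting listed above, the following sketch computes one common variant (relative term frequency times the log of inverse document frequency). It is not necessarily the exact weighting used in the paper, and the tiny tokenized documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented example documents.
docs = [
    ["nervous", "about", "exam"],
    ["happy", "about", "trip"],
    ["nervous", "and", "worried"],
]
vectors = tf_idf(docs)
```

Terms that appear in fewer documents (such as "exam") receive a higher weight than terms shared across documents (such as "about"), and a term present in every document would get weight zero.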