Widespread Worry and the Stock Market

This is a paper for Social Media Analysis 10-802 in Fall 2012.

Citation

Gilbert, E. and Karahalios, K., Widespread worry and the stock market, 2010, In Proceedings of the international conference on weblogs and social media (ICWSM 10).

Online version

Widespread worry and the stock market

Summary

This paper attempts to find correlations between the sentiment of LiveJournal blog posts and changes in the S&P 500 stock index, using opinion mining. The authors used supervised learning techniques to classify posts in a binary fashion as either anxious or not anxious. The classifiers had low recall (28% and 32%) but high precision, with false positive rates of 3% and 6%. Thus, the classifiers were conservative in assigning posts to the anxious class.

Using the classifier-labeled posts, an Anxiety Index was created by taking the fraction of anxious posts per day and then taking the difference of the logarithms of these values for two consecutive days. This anxiety time series was compared to the S&P 500 time series, where for the S&P data the difference of consecutive log-returns was used.
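The transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and all of the numbers are hypothetical, and it assumes the S&P series is transformed in the same way as the anxiety series (the difference of the logarithms of consecutive daily closing values), since the original formula is not reproduced on this page.

```python
import math

def log_diff(series):
    """Return log(x[t+1]) - log(x[t]) for each pair of consecutive values."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

def anxious_fraction(anxious_counts, total_counts):
    """Fraction of posts labeled anxious on each day."""
    return [a / n for a, n in zip(anxious_counts, total_counts)]

# Hypothetical daily counts and closing values, for illustration only.
anxious_posts = [30, 36, 27, 33]
total_posts = [1000, 1000, 900, 1100]
sp_close = [1100.0, 1089.0, 1094.0, 1102.0]

# Assumption: both series are differenced logs of consecutive daily values.
anxiety_index = log_diff(anxious_fraction(anxious_posts, total_posts))
market_change = log_diff(sp_close)
```

Differencing the logs turns each series into day-over-day relative changes, which removes slow trends and is a common preprocessing step before fitting linear time series models.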

To evaluate whether the anxiety index was related to the S&P 500 data, the authors turned to Granger causality, a statistical hypothesis test for determining whether one time series is useful for forecasting another. They compared the variance explained by two linear models: one containing lagged anxiety index data along with lagged S&P data, and the other using the lagged S&P data alone.
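The model comparison behind a Granger test can be sketched as follows. This is a generic textbook version under stated assumptions, not the authors' implementation: the helper names (`solve`, `rss`, `granger_f`), the lag count, and the synthetic data are all hypothetical. It fits a restricted model (market lags only) and a full model (market lags plus anxiety lags) by ordinary least squares and forms the usual F statistic from the two residual sums of squares.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the given design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, y))

def granger_f(target, other, lags):
    """F statistic for 'lags of other improve the forecast of target'."""
    y = target[lags:]
    restricted, full = [], []
    for t in range(lags, len(target)):
        own = [target[t - i] for i in range(1, lags + 1)]
        extra = [other[t - i] for i in range(1, lags + 1)]
        restricted.append([1.0] + own)        # intercept + own lags
        full.append([1.0] + own + extra)      # ... plus lags of the other series
    rss_r, rss_f = rss(restricted, y), rss(full, y)
    df_num = lags
    df_den = len(y) - (2 * lags + 1)
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, df_num, df_den

# Synthetic data in which anxiety leads the market by one day (illustration only).
rng = random.Random(0)
anxiety = [rng.gauss(0, 1) for _ in range(200)]
market = [0.0] + [-0.5 * anxiety[t - 1] + rng.gauss(0, 0.5) for t in range(1, 200)]

f_forward, df1, df2 = granger_f(market, anxiety, lags=2)
f_reverse, _, _ = granger_f(anxiety, market, lags=2)
```

The F statistic would then be compared against an F distribution with `df1` and `df2` degrees of freedom to obtain a p-value. Running the test in both directions, as in the last two lines, mirrors the check of whether one series forecasts the other or vice versa.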

Results

The authors found, in one instance, a statistically significant improvement in forecasting accuracy when using the linear model that included the anxiety data. However, the authors allude to trying many different time lags and variations on window width, so the finding may be an artifact of multiple hypothesis testing, since they do not appear to include a correction factor for performing multiple tests. Other results given in the paper have large p-values (> .10, and in some cases > .20), so the significance of those results is questionable.
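To make the multiple-testing concern concrete, here is a small sketch of two standard corrections, Bonferroni and Holm. The paper does not appear to apply either one; the function names and the p-values below are hypothetical and chosen only to show how a single nominally significant result can fail to survive correction.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only those hypotheses with p <= alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down procedure: test sorted p-values against alpha / (m - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical p-values from trying four lag/window settings (not from the paper).
hypothetical_p = [0.03, 0.12, 0.21, 0.40]

uncorrected = [p <= 0.05 for p in hypothetical_p]
bonferroni_result = bonferroni(hypothetical_p)
holm_result = holm(hypothetical_p)
```

With these made-up values, one test passes the uncorrected 0.05 threshold, but none passes after either correction, which is the situation the paragraph above warns about.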

The authors also performed Monte Carlo simulation in an attempt to verify their main result, and found that the p-value grew, coming close to not being significant for small alpha values.
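The exact simulation procedure is not described on this page, so the following is only a generic illustration of how a Monte Carlo p-value can be obtained. As an assumption, it shuffles the anxiety series to build a null distribution and uses the absolute lagged Pearson correlation as the test statistic instead of the Granger F statistic. All names and data are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def lagged_stat(anxiety, market, lag):
    """Absolute correlation between anxiety at day t and market at day t + lag (lag >= 1)."""
    return abs(pearson(anxiety[:-lag], market[lag:]))

def monte_carlo_p(anxiety, market, lag, n_sims=500, seed=0):
    """Share of shuffled anxiety series whose statistic matches or beats the observed one."""
    rng = random.Random(seed)
    observed = lagged_stat(anxiety, market, lag)
    shuffled = anxiety[:]
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if lagged_stat(shuffled, market, lag) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)

# Synthetic data in which anxiety leads the market by two days (illustration only).
rng = random.Random(1)
anxiety = [rng.gauss(0, 1) for _ in range(120)]
market = [0.0, 0.0] + [-0.6 * a + rng.gauss(0, 0.5) for a in anxiety[:-2]]

p_value = monte_carlo_p(anxiety, market, lag=2)
```

One caveat of this sketch is that plain shuffling destroys any autocorrelation in the anxiety series, so it is only appropriate when the series is close to serially independent.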

Lastly, the authors tested whether market trends could predict the anxiety data, and did not find a significant relationship.

Figure

Anxiety correlation.png

This figure purports to show an inverse correlation between the time-lagged (2 days) anxiety index data and the S&P 500 data.

Discussion

The results in this paper are borderline significant in the statistical sense, and may not be significant at all if multiple hypothesis tests were performed without a corresponding correction. The authors' conclusion, namely that measuring anxiety in blog data is useful for predicting stock market changes, would need further research to support definitively. The paper is therefore interesting, but its significance is not immediately obvious.

Related papers

  • Tetlock, JOF 2007 argued that pessimism in a Wall Street Journal column contained novel information about Dow returns from 1984 to 1999.
  • O'Connor et al., ICWSM 2010 analyzes Twitter data to measure the correlation between Twitter sentiment and public opinion polls.

Study plan

Some concepts which may aid in understanding this paper:

  • Granger causality
    • An econometric method of testing whether one time series improves prediction of a second time series.