Project - Second Draft Proposal - Yuzhou

From Cohen Courses

Sentiment in Blogging Community and Wall Street

Second Draft

Team Members

Yuzhou Xin

Introduction

Our emotions affect our actions. In this project, I would like to apply sentiment analysis techniques to study the general emotional state of Twitter, a virtual community, and use the result as a predictor for the financial market. Specifically, I would like to see whether a non-neutral emotional state of the community is a good predictor of volatility in the US equity market. I would also like to see whether the result extends to global markets: for example, can the mood of the U.S. community affect markets in Europe or Asia? This will be an extension of the research done by Eric Gilbert and Karrie Karahalios.

Dataset

  • Initially, I'll reuse the LiveJournal dataset gathered by Gilbert:

Gilbert's data [1]; LiveJournal's website [2] (I was unable to get the raw data)

  • All blog posts from 2005/07/04 to 2005/07/24, from the Nielsen BuzzMetrics dataset

(this time period might be too short for training and evaluation)

  • Tweets with the tags $AAPL, iphone, and $MSFT, which I gathered myself using Archivist
  • S&P 500 index (2000-2011), from Yahoo! Finance
  • NASDAQ-100 Technology Sector Index (2000-2011), from Yahoo! Finance

Proposed Work

We first need to build emotion indicators (an emotion index) for the Twitter community using sentiment analysis techniques, which means finding ways to classify the community's emotion as either neutral or non-neutral (happy/sad). We then need to apply a learning algorithm to these indicators so that we can use them to predict market volatility. Finally, we want to check their predictive power on the US and global markets. If possible, we would also like to use data from another community to see whether it still supports our hypothesis.

For simplicity, we will start by looking at tweets about a particular company, Apple.

  • Step 1: Gather all tweets containing the keyword $AAPL for a time period, along with Apple's stock price over the same period
  • Step 2: Build a basic bag-of-words classifier on this data (as a benchmark)
  • Step 3: Train it on 2 months of data and check its prediction accuracy
  • Step 4: Do the same thing using sentiment orientation (SO) instead of a simple bag-of-words classifier:
      Step 4-1: Apply a part-of-speech tagger to the tweets
      Step 4-2: Use the PMI-IR method to find the SO of each phrase
      Step 4-3: Average the SO of all phrases in a post to get the orientation of the post, then average all posts to get the SO of the day
  • Step 5: Build a predictor of the future stock market based on the SO of the day
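Step 2's benchmark could look like the following minimal sketch. The proposal does not say which bag-of-words classifier to use; multinomial Naive Bayes with add-one smoothing is one common choice and is assumed here. The labeling is also an assumption: each example is one day's $AAPL tweets concatenated into a single string, labeled with a hypothetical class "up" or "down" for the next day's price movement. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenization; a real pipeline would also handle punctuation, URLs, etc."""
    return text.lower().split()


def train_nb(examples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(k / len(examples)) for c, k in label_counts.items()}
    log_lik = {}
    for c in label_counts:
        # Add-one (Laplace) smoothing so unseen word/label pairs get nonzero probability.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, text):
    """Return the label with the highest log posterior; out-of-vocabulary words are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokenize(text) if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def accuracy(model, examples):
    """Fraction of (text, label) pairs whose label is predicted correctly."""
    return sum(predict_nb(model, text) == label for text, label in examples) / len(examples)


# Hypothetical training days: concatenated tweets -> next-day price movement.
train = [
    ("great iphone sales", "up"),
    ("love new iphone", "up"),
    ("weak sales worry", "down"),
    ("lawsuit worry sell", "down"),
]
model = train_nb(train)
print(predict_nb(model, "great sales love"))  # up
```

For Step 3, training on the first two months and measuring `accuracy` only on later days (a chronological split) keeps information from the test period out of training.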
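Steps 4-1 to 4-3 can be sketched as follows, assuming the tweets have already been run through a part-of-speech tagger that emits Penn Treebank tags. Following Turney's PMI-IR formulation, the SO of a phrase is log2 of [hits(phrase NEAR "excellent") × hits("poor")] / [hits(phrase NEAR "poor") × hits("excellent")]. Turney obtained these counts from search-engine queries with a NEAR operator; here, as an assumption, co-occurrence within the same tweet in a local collection stands in for those hits. Only two of Turney's phrase patterns are kept (adjective + noun, adverb + adjective, without the third-word constraints), and the smoothing constant, seed words, and function names are illustrative.

```python
import math

# Simplified extraction patterns loosely adapted from Turney's method:
# JJ followed by NN/NNS, and RB/RBR/RBS followed by JJ. The third-word
# constraints and the remaining patterns are omitted in this sketch.
NOUNS = {"NN", "NNS"}
ADVERBS = {"RB", "RBR", "RBS"}


def extract_phrases(tagged):
    """Turn Step 4-1 output into two-word phrases; `tagged` is a list of (word, Penn tag) pairs."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1 == "JJ" and t2 in NOUNS) or (t1 in ADVERBS and t2 == "JJ"):
            phrases.append(f"{w1.lower()} {w2.lower()}")
    return phrases


def hits(docs, *terms):
    """Count documents containing every term as a whole space-delimited word or phrase.
    Assumed stand-in for search-engine hit counts; docs are lowercase strings."""
    return sum(1 for d in docs if all(f" {t} " in f" {d} " for t in terms))


def so_pmi(phrase, docs, pos="excellent", neg="poor", smoothing=0.01):
    """Step 4-2: SO-PMI of a phrase, with same-document co-occurrence replacing NEAR.
    The small smoothing constant avoids division by zero and log of zero."""
    numerator = (hits(docs, phrase, pos) + smoothing) * (hits(docs, neg) + smoothing)
    denominator = (hits(docs, phrase, neg) + smoothing) * (hits(docs, pos) + smoothing)
    return math.log2(numerator / denominator)


def daily_so(tagged_posts_by_day, docs):
    """Step 4-3: average phrase SO within each post, then average post SO within each day.
    Posts with no extracted phrases are skipped; a day with none scores 0.0 (neutral)."""
    result = {}
    for day, posts in tagged_posts_by_day.items():
        post_scores = []
        for tagged in posts:
            phrases = extract_phrases(tagged)
            if phrases:
                post_scores.append(sum(so_pmi(p, docs) for p in phrases) / len(phrases))
        result[day] = sum(post_scores) / len(post_scores) if post_scores else 0.0
    return result
```

The resulting day-level series is the input that Step 5's predictor would consume. Because tweets are short, same-tweet co-occurrence with "excellent" or "poor" will be sparse, so a larger reference collection or different seed words may be needed in practice.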

Evaluation

To test whether the emotion of the blogging community has predictive power for the stock market, I'll use Granger causality analysis.

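M1: To predict M_t without using information about sentiment orientation:

$$M_t = \alpha + \sum_{i=1}^{n} \beta_i M_{t-i} + \epsilon_t$$

M2: To predict M_t using information about sentiment orientation:

$$M_t = \alpha + \sum_{i=1}^{n} \beta_i M_{t-i} + \sum_{i=1}^{n} \gamma_i SO_{t-i} + \epsilon_t$$

Here M_t is the market measure (e.g., volatility) on day t, SO_t is the community's average sentiment orientation on day t, and n is the number of lagged days. We want to see whether M2 predicts M_t significantly better than M1; if it does, sentiment orientation Granger-causes the market measure.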
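The M1/M2 comparison can be sketched as a standard Granger F-test: fit M1 (intercept plus n lags of M) and M2 (M1 plus n lags of SO) by ordinary least squares on the same days, then compare their residual sums of squares. This minimal sketch uses only the Python standard library (normal equations solved by Gaussian elimination) and assumes `m` and `so` are equal-length daily series for the market measure and SO, with a lag count `n` chosen by the user. It returns the F statistic and its degrees of freedom; a p-value would come from the F(n, N - 2n - 1) distribution, where N is the number of days fitted (e.g., via `scipy.stats.f.sf`). statsmodels' `grangercausalitytests` is a ready-made alternative.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def ols_rss(rows, y):
    """Fit y on the regressor rows by OLS (normal equations); return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(m, so, n):
    """F test of M2 (n lags of M and of SO) against M1 (n lags of M only).

    m, so: equal-length daily series of the market measure and sentiment orientation.
    Returns (F statistic, numerator df, denominator df)."""
    days = range(n, len(m))
    y = [m[t] for t in days]
    # The leading 1.0 is the intercept alpha; the rest are the lagged M values.
    m1_rows = [[1.0] + [m[t - i] for i in range(1, n + 1)] for t in days]
    # M2 adds the lagged SO values to each M1 row.
    m2_rows = [row + [so[t - i] for i in range(1, n + 1)] for row, t in zip(m1_rows, days)]
    rss1, rss2 = ols_rss(m1_rows, y), ols_rss(m2_rows, y)
    df_num, df_den = n, len(y) - (2 * n + 1)
    return ((rss1 - rss2) / df_num) / (rss2 / df_den), df_num, df_den
```

A large F (small p-value) means the lagged SO terms reduce prediction error beyond what past market values alone explain, which is what "M2 is better than M1" means in the Granger sense.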

Related Work

Similar work has been done by Eric Gilbert and Karrie Karahalios in their paper Widespread Worry and the Stock Market. They combined a boosted decision tree and a Naive Bayes classifier to build an anxiety predictor, then used it to predict downward movements of the stock market.

References

Eric Gilbert and Karrie Karahalios, Widespread Worry and the Stock Market [3]

Johan Bollen, Huina Mao, and Xiaojun Zeng, Twitter mood predicts the stock market [4]

Kogan et al., Predicting Risk from Financial Reports with Regression [5]