Difference between revisions of "Project - Second Draft Proposal - Yuzhou"

Revision as of 16:12, 15 February 2011

Sentiment in Blogging Community and Wall Street

Second Draft

Team Members

Yuzhou Xin

Introduction

Our emotions affect our actions. In this project, I would like to apply sentiment analysis techniques to study the general emotional state of LiveJournal, a virtual community, and use the result as a predictor for the financial market. Specifically, I would like to see whether a non-neutral emotional state of the community is a good predictor of volatility in the US equity market. I would also like to see whether the result extends to global markets: for example, can anxiety in the U.S. community affect markets in Europe or Asia? This will be an extension of the research done by Eric Gilbert and Karrie Karahalios.

Dataset

In the beginning, I'll reuse the LiveJournal dataset gathered by Gilbert.

  • Gilbert's data [http://social.cs.uiuc.edu/people/gilbert/38]
  • LiveJournal's website [http://www.livejournal.com/]

(unable to get raw data)

All blog posts from 2005/07/04 to 2005/07/24, from the Nielsen BuzzMetrics dataset (the time period might be a little short for training/evaluating)

Tweets tagged $AAPL, gathered by a stream reader I built myself (still being collected)

S&P 500 index (2000-2011), gathered from Yahoo! Finance

NASDAQ-100 Technology Sector Index (2000-2011), gathered from Yahoo! Finance

Proposed Work

First, we need to build emotion indicators (an index) for the LiveJournal community using sentiment analysis techniques, which requires a way to classify the community's emotion as either neutral or non-neutral (happy/sad). Next, we need to apply a learning algorithm to the indicators so that we can use them to predict market volatility. Finally, we want to check their predictive power on the US and global markets. If possible, we would also like to use data from another community to see whether it still supports our hypothesis.
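
As a rough illustration of the first two steps, the sketch below turns per-post labels into a daily index and pairs it with a simple volatility proxy. Everything in it is an assumption made for illustration: the labels are taken as given (the classifier itself is not shown), the index is defined as the share of non-neutral posts per day, volatility is approximated by the absolute daily log return, and the posts and closing prices are made-up toy values. The function names are hypothetical.

```python
import math
from collections import defaultdict


def non_neutral_share(labelled_posts):
    """Map each day to the fraction of its posts labelled non-neutral."""
    totals = defaultdict(int)
    non_neutral = defaultdict(int)
    for day, label in labelled_posts:
        totals[day] += 1
        if label != "neutral":
            non_neutral[day] += 1
    return {day: non_neutral[day] / totals[day] for day in sorted(totals)}


def abs_log_returns(closes):
    """Map each day (except the first) to |log(close_t / close_{t-1})|."""
    days = sorted(closes)  # ISO date strings sort chronologically
    return {
        today: abs(math.log(closes[today] / closes[prev]))
        for prev, today in zip(days, days[1:])
    }


# Hypothetical toy inputs; real labels would come from the sentiment classifier.
posts = [
    ("2005-07-05", "neutral"), ("2005-07-05", "sad"),
    ("2005-07-06", "happy"), ("2005-07-06", "sad"),
    ("2005-07-06", "neutral"), ("2005-07-06", "sad"),
]
closes = {"2005-07-05": 100.0, "2005-07-06": 102.0, "2005-07-07": 101.0}

index = non_neutral_share(posts)
volatility = abs_log_returns(closes)

# Pair each day's index with the next trading day's volatility proxy,
# giving (feature, target) rows for whatever learning algorithm is chosen.
days = sorted(closes)
pairs = [(index[d], volatility[nxt]) for d, nxt in zip(days, days[1:]) if d in index]
```

Pairing each day's index with the following day's volatility keeps the feature strictly earlier than the target, which is what a predictive claim needs. Other index definitions, such as separate happy and sad shares, would fit the same structure.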

Evaluation

To show whether the emotion of the blogging community has predictive power for the stock market, I'll use Granger causality analysis.
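
The idea behind the test can be sketched as follows: fit one regression of volatility on its own lags, fit a second that also includes lagged emotion values, and use an F-statistic to ask whether the extra lags reduce the residual error by more than chance would. The code below is a minimal standard-library sketch under stated assumptions, not the project's actual evaluation: the helper names are hypothetical, the lag length of 1 is arbitrary, and the two series are synthetic, constructed so that the emotion series leads volatility by one day.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the given design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def granger_f(x, y, lags=1):
    """F-statistic for H0: lagged x adds nothing to predicting y beyond lagged y."""
    target = y[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(y)):
        own = [y[t - k] for k in range(1, lags + 1)]
        other = [x[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + own)            # intercept + lags of y
        unrestricted.append([1.0] + own + other)  # ... plus lags of x
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic data (an assumption, not project data): volatility follows
# the emotion series with a one-day lag plus noise.
rng = random.Random(0)
emotion = [rng.gauss(0, 1) for _ in range(300)]
volatility = [0.0] + [0.8 * emotion[t - 1] + rng.gauss(0, 0.5) for t in range(1, 300)]

f_forward = granger_f(emotion, volatility, lags=1)  # emotion -> volatility
f_reverse = granger_f(volatility, emotion, lags=1)  # volatility -> emotion
```

A large statistic in the emotion-to-volatility direction and a small one in the reverse direction is the pattern the hypothesis predicts. For a p-value, the statistic would be compared against an F distribution with (lags, dof) degrees of freedom, and in practice the lag length would be chosen from the data rather than fixed at 1.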


Related Work

Similar work has been done by Eric Gilbert and Karrie Karahalios in their paper Widespread Worry and the Stock Market. They combined a boosted decision tree and a Naive Bayes (NB) classifier to form a predictor of anxiety, and then used it to predict downward movement in the stock market.

References

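Widespread Worry and the Stock Market by Eric Gilbert and Karrie Karahalios [http://social.cs.uiuc.edu/people/gilbert/38]

Twitter mood predicts the stock market by Johan Bollen, Huina Mao, and Xiaojun Zeng [http://arxiv.org/PS_cache/arxiv/pdf/1010/1010.3003v1.pdf]

Predicting Risk from Financial Reports with Regression by Kogan et al. [http://www.cs.cmu.edu/~nasmith/papers/kogan+levin+routledge+sagi+smith.naacl09.pdf]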