Mining of Political Issues

From Cohen Courses
Revision as of 23:44, 8 October 2012 by Austinma

In this project I will examine the problem of automatically identifying relevant campaign issues from a set of political blog posts and comments. This involves determining the political affiliation (Republican, Democrat, or Independent) of each blog and comment author, and then identifying the topics each author discusses most. I will find the topics that many bloggers discuss, and then use sentiment analysis to estimate each author's stance toward each topic. Topics whose sentiment differs sharply depending on the author's political affiliation are likely to be divisive political issues.

Dataset

I will be using the Political Blog Corpora of Yano, Cohen, and Smith. This data set consists of over 9,000 main blog entries and many more comments attached to them. I will divide the corpus into Republican and Democrat blogs, whose affiliations are known. I will set aside a subset of the data to train classifiers that label the authors of comments as Republicans, Democrats, or Independents. The remaining data will be used to determine topics and their associated sentiments.
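
As a rough illustration of the classifier-training step, the sketch below fits a multinomial Naive Bayes model with add-one smoothing on a handful of labeled comments. Naive Bayes is only a stand-in here, since the proposal does not commit to a particular classifier, and the training comments, function names, and whitespace tokenization are all assumptions made for the example rather than part of the corpus.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-label document and word counts from (label, text) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training comments; the held-out subset of the corpus would go here.
train = [
    ("R", "lower taxes and smaller government"),
    ("R", "cut taxes cut spending"),
    ("D", "expand healthcare coverage for families"),
    ("D", "protect healthcare and raise the minimum wage"),
    ("I", "both parties ignore the deficit"),
]
model = train_naive_bayes(train)
print(classify(model, "we need lower taxes"))
```

On real data, the labeled subset would replace the toy list, and accuracy should be checked on comments not used for training before the labels are relied on downstream.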

Methodology

  • First, I will classify the author of each comment as Republican (R), Democrat (D), or Independent (I). The affiliation of the primary author of each blog is known.
  • I will then extract topics from the data set using standard theme-finding algorithms with a TF-IDF baseline.
  • I will then find each instance of each topic and use sentiment analysis techniques to discover the author's opinion towards that topic.
  • Finally, I will search for topics that seem to have positive sentiment among Republicans but negative sentiment among Democrats, or vice versa.
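
The last three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the tiny `LEXICON`, the example comments, the fixed context window, and the function names `tfidf_scores`, `topic_sentiment`, and `party_gap` are all hypothetical, and a resource such as SentiWordNet would stand in for the toy lexicon in practice.

```python
import math
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy lexicon; a real sentiment resource would replace it.
LEXICON = {"great": 1.0, "good": 1.0, "support": 1.0,
           "bad": -1.0, "terrible": -1.0, "oppose": -1.0}


def tfidf_scores(docs):
    """Sum TF-IDF over tokenized documents; higher means more distinctive."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            scores[tok] += (count / len(doc)) * math.log(n / df[tok])
    return scores


def topic_sentiment(tokens, topic, window=3):
    """Mean lexicon score of words within `window` tokens of a topic mention."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == topic:
            nearby = tokens[max(0, i - window): i + window + 1]
            hits.extend(LEXICON[w] for w in nearby if w in LEXICON)
    return sum(hits) / len(hits) if hits else None


def party_gap(comments, topic):
    """Mean Republican sentiment minus mean Democrat sentiment for a topic."""
    by_party = defaultdict(list)
    for party, text in comments:
        score = topic_sentiment(text.lower().split(), topic)
        if score is not None:
            by_party[party].append(score)
    if not by_party["R"] or not by_party["D"]:
        return None  # topic not discussed with sentiment by both parties
    return mean(by_party["R"]) - mean(by_party["D"])


# Made-up (party, comment) pairs standing in for classified corpus comments.
comments = [
    ("R", "the tax cut is great"),
    ("R", "i support the tax cut"),
    ("D", "the tax cut is terrible"),
    ("D", "i oppose the tax cut"),
    ("R", "the weather is good today"),
    ("D", "the weather is good today"),
]
scores = tfidf_scores([text.split() for _, text in comments])
for topic in ("tax", "weather"):
    print(topic, round(scores[topic], 3), party_gap(comments, topic))
```

A gap far from zero flags a candidate divisive issue, while a gap near zero suggests both parties feel similarly about the topic. Words that occur in every comment receive a TF-IDF score of zero, which is why the baseline filters out terms like "the" before sentiment is considered.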

Study Plan

Because I have no background in IR or sentiment analysis, a fair amount of reading will be required to prepare for this project. Fortunately, I have already summarized the following three sentiment analysis papers on this wiki:

  • Takamura et al., "Extracting Semantic Orientations of Words using Spin Model"
  • Andrea Esuli and Fabrizio Sebastiani, "SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining"
  • Jason S. Kessler and Nicolas Nicolov, "Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations"

Furthermore, I will need to read up on topic identification, drawing on papers such as:

  • Feifan Liu, Deana Pennell, Fei Liu, and Yang Liu, "Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts"
  • H. Vollmer, "Automatic Keyword Extraction for Database Search"

Team Members

References

Tae Yano, William W. Cohen, and Noah A. Smith. "Predicting Response to Political Blog Posts with Topic Models." NAACL-HLT 2009, Boulder, CO, May–June 2009.