Mining of Political Issues

Comments

The Yano et al blogs are all from right-leaning or left-leaning communities, so party affiliation doesn't make much sense here. I don't know if there are more appropriate datasets - Politics.com dataset is one, but it's not an easy dataset, and the documents are not really comments.

I do like the idea of doing sentiment analysis to determine party affiliation, though it's not obvious that it would be needed for news-story comment classification. Jan Wiebe has done some relevant prior work on stance classification: e.g., Swapna Somasundaran and Janyce Wiebe. (2010). Recognizing Stances in Ideological On-Line Debates. NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. This might be a good starting point for a lit search. Another is: Support or oppose?: classifying positions in online debates from reply activities and opinion expressions, an ACL short paper from 2010. --Wcohen 18:53, 10 October 2012 (UTC)

Abstract

In this project I will examine the problem of automatically determining relevant campaign issues from a set of political blog posts and comments. In particular, this will involve discovering the political affiliation (Republican, Democrat, or Neutral) of each blog author, and then determining the topics each author discusses the most. I will determine which topics are discussed by many bloggers, and then use sentiment analysis to determine each blogger's stance towards each topic. Topics whose sentiment values differ sharply with the author's political affiliation are likely to be divisive political issues.

Dataset

I will be using the Political Blog Corpora of Yano, Cohen, and Smith. This data set consists of over 9000 main blog entries and many more comments attached to them. I will divide the corpus into Republican and Democratic blogs, whose affiliations are known. I will extract a subset of this data to train classifiers to determine whether the authors of comments are Republicans, Democrats, or Independents. The remaining data will be used to determine topics and their associated sentiments.

Methodology

  • First, I will classify the author of each comment as either R, D or I. The affiliation of the primary author of each blog is known.
  • I will then extract topics from the data set using standard theme-finding algorithms with a TF-IDF baseline.
  • I will then find each instance of each topic and use sentiment analysis techniques to discover the author's opinion towards that topic.
  • Finally, I will search for topics that seem to have positive sentiment among Republicans, but negative sentiment among Democrats, or vice-versa.
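
As a rough illustration of the second and fourth steps above, here is a minimal Python sketch. It is not the project's implementation: the function names `tfidf_top_terms` and `divisive_topics`, the `min_gap` threshold, and the input format (tokenized documents, and `(affiliation, topic, sentiment)` triples with sentiment scores on a signed scale) are all assumptions made for the example. The affiliation classifier and the sentiment scorer themselves are assumed to exist upstream and are not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf_top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each tokenized document.

    docs is a list of token lists. This is only a baseline for picking
    candidate topic terms, not a full theme-finding algorithm.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    top = []
    for doc in docs:
        if not doc:  # avoid dividing by zero on empty documents
            top.append([])
            continue
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top.append(ranked[:k])
    return top


def divisive_topics(records, min_gap=1.0):
    """Find topics with opposite-sign mean sentiment among R and D authors.

    records is an iterable of (affiliation, topic, sentiment) triples.
    Authors labeled anything other than "R" or "D" are ignored here.
    min_gap is a hypothetical threshold on the difference of the two means.
    """
    by_topic = defaultdict(lambda: {"R": [], "D": []})
    for party, topic, score in records:
        if party in ("R", "D"):
            by_topic[topic][party].append(score)
    found = []
    for topic, scores in by_topic.items():
        if not scores["R"] or not scores["D"]:
            continue  # need opinions from both sides to compare
        mean_r = sum(scores["R"]) / len(scores["R"])
        mean_d = sum(scores["D"]) / len(scores["D"])
        if mean_r * mean_d < 0 and abs(mean_r - mean_d) >= min_gap:
            found.append((topic, mean_r, mean_d))
    # Largest sentiment gap first.
    return sorted(found, key=lambda item: -abs(item[1] - item[2]))
```

In this sketch a topic counts as divisive only when the two party means have opposite signs and differ by at least `min_gap`; both conditions are design choices for the example and would need tuning against real sentiment scores.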

Study Plan

Because I have no background in IR or sentiment analysis, a fair amount of reading will be required to prepare me for this project. Fortunately, I have already summarized the following three sentiment-analysis papers on this Wiki:

  • Takamura et al.'s "Extracting Semantic Orientations of Words using Spin Model"
  • Andrea Esuli and Fabrizio Sebastiani's "SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining"
  • Jason S. Kessler and Nicolas Nicolov's "Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations"

Furthermore, I will need to read up on topic identification, which might come from papers such as:

  • Feifan Liu, Deana Pennell, Fei Liu and Yang Liu's "Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts"
  • H Vollmer's "Automatic Keyword Extraction for Database Search"

Team Members

References

Tae Yano, William W. Cohen, and Noah A. Smith. "Predicting Response to Political Blog Posts with Topic Models." NAACL-HLT 2009, Boulder, CO, May–June 2009.