Project 2nd draft Derry Reyyan

From Cohen Courses
Revision as of 17:36, 14 February 2011 by Dwijaya (talk | contribs) (→‎Motivation)

Social Media Analysis Project Ideas

Team Members

Derry Wijaya [dwijaya@cs.cmu.edu]

Reyyan Yeniterzi [reyyan@cs.cmu.edu]

Project Idea

Understanding change

Given an entity of interest, we would like to model and analyze its change (in terms of words and phrases that co-occur with it) over time.

We propose to construct a social graph in which the nodes are words rather than people, and each edge is weighted by the number of times its two words co-occur. Using this social graph of words, we propose to analyze:

How co-occurrence with other words influences the meaning or the sentiment associated with a word. For example, the word 'BP' frequently co-occurred with negatively associated words during and after the Gulf oil spill.
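As a rough illustration of the word graph described above, the following Python sketch builds one weighted co-occurrence graph per year. It assumes the data has already been reduced to (ngram, year, count) tuples; that record shape, the function names `build_cooccurrence_graphs` and `neighbors`, and the toy counts are all hypothetical and not taken from the actual dataset.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graphs(records):
    """Return {year: {(word_a, word_b): weight}} from (ngram, year, count) records.

    Each unordered pair of distinct words inside an n-gram counts as one
    co-occurrence, weighted by how often that n-gram was observed that year.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for ngram, year, count in records:
        # Sort so that every pair has one canonical (a, b) key.
        words = sorted(set(ngram.lower().split()))
        for a, b in combinations(words, 2):
            graphs[year][(a, b)] += count
    return graphs


def neighbors(graph, word):
    """Return {neighbor: weight} for one word in a single year's graph."""
    result = {}
    for (a, b), weight in graph.items():
        if a == word:
            result[b] = weight
        elif b == word:
            result[a] = weight
    return result


# Made-up toy records, not real n-gram counts.
records = [
    ("BP oil profits", 2009, 5),
    ("BP oil spill", 2010, 12),
    ("BP spill disaster", 2010, 7),
]
graphs = build_cooccurrence_graphs(records)
bp_2010 = neighbors(graphs[2010], "bp")
```

Keeping a separate graph per year is one possible design choice; it makes it easy to compare the neighborhood of an entity such as 'BP' before and after an event.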

Dataset

Google Books Ngram Data.

Motivation

How do the semantics of a word, or the sentiment associated with it, change over time depending on its neighbors (i.e., co-occurring words/phrases)? Does such a change relate to a particular event that happened in the same period? Can we find a natural sequence of events that defines a change of state, semantics, or sentiment for a particular entity?

Techniques

For each of the ideas above, the proposed techniques and related papers are (in order of the ideas):

• Clustering of opinions. Finding when a group of opinions breaks into two over time (to detect the time t at which a change in opinion occurs, followed by the growth of another opinion cluster). Topic modeling of news documents to pinpoint the particular event at time t that may have caused the change. Related recent paper: Identifying Breakpoints in Public Opinion.

• Using centrality and betweenness measures from social network analysis, applied to a network of opinions (Related paper: Betweenness Centrality as an Indicator of the Interdisciplinarity of Scientific Journals). Random walks on the graph to find ring leaders and clusters of opinions. Schelling segregation to measure spatial segregation (we first need to define what 'space' means in the graph of opinions). A related paper on segregation in graphs is The Collective Dynamics of Smoking in a Large Social Network.

• Regression analysis to measure the tendency of a word to become negative in meaning over time when it co-occurs with negative words (Related paper: The Spread of Obesity in a Large Social Network over 32 Years, applied here to measuring the spread of negativity in a network of words).

• Using Bayes' rule to measure the probability of two people having a link on Twitter based on their friends' links, their opinions, and their spatio-temporal overlap. This relates to a recent paper, Inferring social ties from geographic coincidences.
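To make the regression idea in the list above more concrete, here is a minimal Python sketch. It is a deliberate simplification: it scores a word's negativity in each year as the share of its co-occurrence weight that falls on words from a negative lexicon, then fits an ordinary least-squares trend line over the years. This is not the model used in the cited paper, and the lexicon, the neighbor weights, and the function names are all hypothetical.

```python
def negativity(neighbor_weights, negative_words):
    """Share of a word's co-occurrence weight that falls on negative words."""
    total = sum(neighbor_weights.values())
    if total == 0:
        return 0.0
    negative = sum(
        weight for word, weight in neighbor_weights.items()
        if word in negative_words
    )
    return negative / total


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical lexicon and made-up neighbor weights for one target word.
NEGATIVE_WORDS = {"spill", "disaster"}
neighbors_by_year = {
    2008: {"oil": 10, "profits": 10},
    2009: {"oil": 15, "spill": 5},
    2010: {"oil": 10, "spill": 20, "disaster": 10},
}
years = sorted(neighbors_by_year)
scores = [negativity(neighbors_by_year[y], NEGATIVE_WORDS) for y in years]
trend = ols_slope(years, scores)  # positive slope: drifting toward negative neighbors
```

A positive `trend` suggests the word is increasingly surrounded by negative words; whether that reflects a real change in sentiment would still need the manual checks discussed under Evaluation.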

Evaluation

A combination of manual evaluation and cross-validation (splitting the data into training and test sets and evaluating on the held-out portion) may be used.
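The cross-validation split could look roughly like the sketch below. It assumes the units being split are simply items in a list (years are used here as a stand-in); the helper name `k_fold_splits` is hypothetical, and the fold assignment is a simple interleaved scheme rather than a random shuffle.

```python
def k_fold_splits(items, k):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    for fold in range(k):
        # Item i belongs to test fold (i mod k).
        test = items[fold::k]
        train = [item for i, item in enumerate(items) if i % k != fold]
        yield train, test


# Stand-in data: ten consecutive years split into five folds.
years = list(range(2000, 2010))
splits = list(k_fold_splits(years, 5))
```

Since the data is temporal, a split that trains only on earlier years and tests on later ones may be more appropriate than interleaved folds; that choice is left open here.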

Superpowers

• Nothing really at the moment, except for a bag full of ideas and a lot of keenness in pursuing at least one of them well.