Project 2nd draft Derry Reyyan

From Cohen Courses
Revision as of 16:01, 14 February 2011 by Dwijaya (talk | contribs) (Created page with 'Social Media Analysis Project Ideas == Team Members == Derry Wijaya [dwijaya@cs.cmu.edu] Reyyan Yeniterzi [reyyan@cs.cmu.edu] == Project Idea…')

Social Media Analysis Project Ideas

Team Members

Derry Wijaya [dwijaya@cs.cmu.edu]

Reyyan Yeniterzi [reyyan@cs.cmu.edu]

Project Idea

Understanding change

Given an entity of interest, we would like to model and analyze its change (in terms of words and phrases that co-occur with it) over time.

We propose to construct a social graph in which the nodes are words rather than people, and each edge is weighted by the number of times the two words it connects co-occur. Using this social graph of words, we propose to analyze:

How co-occurrence with other words influences the meaning or the sentiment associated with a word. For example, the word 'BP' frequently co-occurred with negatively associated words during and after the Gulf oil spill.
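As a rough illustration of the word graph described above, the sketch below builds one co-occurrence graph per year from short text snippets. It rests on assumptions that are not part of the proposal: the input is a list of hypothetical (year, text) records rather than the actual Google Books Ngram format, two words are taken to co-occur when they appear in the same snippet, and the function names `build_yearly_graphs` and `neighbors` are made up for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_yearly_graphs(records):
    """Map each year to a Counter of {(word_a, word_b): co-occurrence count}.

    Assumption: two words co-occur when they appear in the same snippet,
    and each pair is counted at most once per snippet.
    """
    graphs = defaultdict(Counter)
    for year, text in records:
        # Sorting makes every pair (a, b) satisfy a < b, so each undirected
        # edge has exactly one key.
        words = sorted(set(text.lower().split()))
        for a, b in combinations(words, 2):
            graphs[year][(a, b)] += 1
    return graphs


def neighbors(edge_weights, word):
    """Return {neighbor: weight} for every edge that touches `word`."""
    result = {}
    for (a, b), weight in edge_weights.items():
        if a == word:
            result[b] = weight
        elif b == word:
            result[a] = weight
    return result


# Toy records, invented purely for illustration.
records = [
    (2009, "bp profit report"),
    (2010, "bp oil spill disaster"),
    (2010, "bp spill cleanup"),
]
graphs = build_yearly_graphs(records)
bp_2009 = neighbors(graphs[2009], "bp")
bp_2010 = neighbors(graphs[2010], "bp")
```

Comparing `bp_2009` with `bp_2010` shows the kind of change the project is after: the neighborhood of a word in one year versus the next. A real pipeline would also need tokenization, stop-word handling, and some normalization of raw counts, none of which is attempted here.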

Dataset

Google Books Ngram Data.

Motivation

For each of the ideas above, our motivations are (in order of the ideas):

• It will be interesting to find out how an event reported in a news article can change a blogger's opinion on the related topic. How often do bloggers start writing about a topic for the first time after reading about a related event in the news?

• It will be interesting to find out whether centrality and betweenness apply to a graph of opinions. A graph can be constructed where each node is an opinion and the edges are similarities between the opinions. Can we then find which opinions in the graph are the ringleaders? Are there neutral or indecisive opinions that act as go-betweens for different groups of opinions? How cohesive are the groups of opinions? How does the graph change over time? Is there spatial segregation in the graph, where minority opinions are pushed to the periphery?

• It will be interesting to find out whether homophily occurs in words. If a word starts to 'hang out' (tend to co-occur) with negatively associated words, will its semantics and usage become negative (social contagion)? Do negative words tend to co-occur with one another (associative sorting)? How do the semantics of a word change depending on its neighbors (i.e. co-occurring words)?

• It will be interesting to do opinion mining on Twitter data, to find out whether follower/following links influence the spread of opinions on Twitter, or whether people from the same geographic location tend to hold the same opinions. It would also be interesting to find out whether we can predict that one person will follow, or be followed by, another based on the similarity of their follower/following links, the similarity of their opinions, the temporal coincidence of those opinions, and geographic coincidence: i.e. for some values a, b, c, d, and e, whether two people who have a similar followers, who follow b similar people, who have c degree of opinion similarity, who voice their opinions within d days of each other, and who are located e geographical distance apart are likely to follow or be followed by one another.

Techniques

For each of the ideas above, proposed techniques or related papers are (in order of the ideas):

• Clustering of opinions. Finding the point in time at which a group of opinions splits in two (to detect the time t at which a change in opinion occurs, followed by the growth of another opinion cluster). Topic modeling of news documents to pinpoint the particular event at time t that may have caused the change. Related recent paper: Identifying Breakpoints in Public Opinion.

• Using centrality and betweenness measures from social network analysis, but applied to a network of opinions (Related paper: Betweenness Centrality as an Indicator of the Interdisciplinarity of Scientific Journals). Random walks on the graph to find ringleaders and clusters of opinions. Schelling segregation to measure spatial segregation (we first need to define what 'space' means in the graph of opinions). A related paper on segregation in graphs is The Collective Dynamics of Smoking in a Large Social Network.

• Regression analysis to measure the tendency of a word to become negative in meaning over time when it co-occurs with negative words (Related paper: The Spread of Obesity in a Large Social Network over 32 Years, applied here to measuring the spread of negativity in a network of words).

• Using Bayes' rule to measure the probability of two people having a link on Twitter based on their friends' links, their opinions, and their spatial-temporal overlap. This relates to a recent paper, Inferring social ties from geographic coincidences.
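To make the last technique above concrete, here is a minimal sketch of Bayes' rule applied to link prediction. It assumes the pieces of evidence (shared followers, similar opinions, and so on) are conditionally independent given whether a link exists, which is a naive-Bayes simplification and not something the proposal commits to. The function name and every probability below are invented for illustration.

```python
def posterior_link_probability(prior, given_link, given_no_link):
    """Return P(link | evidence) using Bayes' rule.

    prior          -- P(link) before seeing any evidence
    given_link     -- P(evidence_i | link) for each observed piece of evidence
    given_no_link  -- P(evidence_i | no link) for the same pieces of evidence

    Assumption: the pieces of evidence are conditionally independent.
    """
    if len(given_link) != len(given_no_link):
        raise ValueError("likelihood lists must have the same length")
    joint_link = prior
    joint_no_link = 1.0 - prior
    for p_link, p_no_link in zip(given_link, given_no_link):
        joint_link *= p_link
        joint_no_link *= p_no_link
    total = joint_link + joint_no_link
    if total == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return joint_link / total


# Made-up numbers: 1% of pairs are linked; the two observed pieces of
# evidence are "shares many followers" and "holds a similar opinion".
posterior = posterior_link_probability(
    prior=0.01,
    given_link=[0.6, 0.5],
    given_no_link=[0.05, 0.2],
)
```

With these invented numbers the posterior is 0.003 / (0.003 + 0.0099), or roughly 0.23, so evidence that is individually weak can still raise a 1% prior substantially. In practice the likelihoods would have to be estimated from observed follower data.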

Evaluation

A combination of manual evaluation and cross-validation (splitting the data into training and test sets and evaluating on the held-out portion) may be used.
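The cross-validation step could be set up along the following lines. This is only a sketch: the proposal does not say how many folds to use or how the data would be partitioned, so the choice of k, the fixed random seed, and the helper name `k_fold_splits` are all assumptions.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    The items are shuffled once with a fixed seed, dealt into k folds,
    and each fold serves as the test set exactly once.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Striding by k deals the items into folds whose sizes differ by at most 1.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Ten hypothetical example IDs split into five folds.
splits = list(k_fold_splits(range(10), k=5))
```

Each pair in `splits` holds the training items and the held-out items for one round; a model would be fit on the first and scored on the second, with the scores averaged across rounds.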

Superpowers

• Nothing really at the moment, except for a bag full of ideas and a lot of keenness in pursuing at least one of them well.