Social Media Analysis Project Ideas
Team Members
Derry Wijaya [dwijaya@cs.cmu.edu]
Reyyan Yeniterzi [reyyan@cs.cmu.edu]
Project Idea
Understanding change
Given an entity of interest, we would like to model and analyze its change in terms of words and phrases that co-occur with it over time.
We propose to construct a social graph in which the nodes are words rather than people and the edges are weighted by the number of co-occurrences between the words (a minimal sketch of this construction follows the example below). Using this social graph of words, we propose to analyze:
(1) how co-occurrences with other words change over time
(2) how the change influences the state (semantic or sentiment) associated with the entity
(3) how the change may correspond to events that occur during the same period of time
For example, the entity 'BP' frequently co-occurred with negatively associated words during and after the Gulf oil spill.
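The following is a minimal sketch, in Python, of how such a co-occurrence graph could be built per time slice. The window size, the use of the networkx library as the graph structure, and the tokenized toy data are illustrative assumptions rather than fixed design choices.

 from collections import defaultdict
 import networkx as nx  # assumed graph library; any weighted-graph structure would do

 def build_cooccurrence_graph(sentences, window=5):
     """Build a word graph whose edge weights count co-occurrences
     within a fixed-size token window (the window size is an assumption)."""
     counts = defaultdict(int)
     for tokens in sentences:  # each sentence is a list of lowercased tokens
         for i, w in enumerate(tokens):
             for v in tokens[i + 1:i + 1 + window]:
                 if w != v:
                     counts[tuple(sorted((w, v)))] += 1
     graph = nx.Graph()
     for (u, v), c in counts.items():
         graph.add_edge(u, v, weight=c)
     return graph

 # One graph per time slice, e.g. per month of the corpus (hypothetical toy data).
 slices = {
     "2010-04": [["bp", "oil", "spill", "disaster"], ["bp", "gulf", "leak"]],
     "2010-09": [["bp", "cleanup", "compensation"]],
 }
 graphs = {month: build_cooccurrence_graph(sents) for month, sents in slices.items()}
 print(graphs["2010-04"]["bp"]["oil"]["weight"])  # co-occurrence count for (bp, oil)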
Dataset
Motivation
The co-occurrence of words changes over time
It will be interesting to model this change and analyze:
(1) How the state (semantic or sentiment) of a given entity changes over time depending on its neighbors (i.e. co-occurring words/phrases)
(2) How such changes relate to events that occur in the same period of time
(3) Whether we can find a natural sequence of events that defines a change of state (semantic or sentiment) of a given entity
(4) Whether we can use (3) to predict the change of state of a given entity
Techniques
(1) Linear regression analysis to measure the tendency of a word to become negative/positive in meaning over time when it co-occurs with negative/positive words (a rough sketch follows the related paper below)
(Related paper: The Spread of Obesity in a Large Social Network over 32 Years - techniques applied to the spread of obesity in a network of people).
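As a rough illustration of technique (1), the following sketch fits an ordinary least-squares line to an entity's share of negative co-occurrences across time slices. The negative-fraction values, the monthly granularity, and the use of numpy.polyfit as the regression routine are assumptions made only for the example.

 import numpy as np

 # Hypothetical per-month fractions of the entity's co-occurrences that fall
 # on words from a negative-sentiment lexicon (values are made up).
 months = np.arange(6)  # time index: 0 = first month observed
 neg_fraction = np.array([0.10, 0.12, 0.35, 0.40, 0.38, 0.30])

 # Least-squares fit: a positive slope suggests the entity is drifting
 # toward more negative contexts over this period.
 slope, intercept = np.polyfit(months, neg_fraction, deg=1)
 print(f"slope per month: {slope:+.3f}")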
(2) Identify break points in the states of an entity (based on changes in its co-occurrences) and find events that correspond to the break points (a simple changepoint sketch appears after the related papers below)
(Related paper: Identifying Breakpoints in Public Opinion - technique to identify break points in sentiment found in tweets (Twitter), using a set of manually constructed emotion words together with vector space models)
(Related paper: Quantitative Analysis of Culture Using Millions of Digitized Books - measures the usage frequency over time of a given n-gram (such as "slavery" or "great war") that represents an entity of interest)
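A minimal sketch of the breakpoint idea in technique (2): scan a per-week score series for the split point that best separates the means of the two resulting segments. This mean-shift heuristic and the toy scores are assumptions standing in for whatever breakpoint method is finally adopted.

 import numpy as np

 def best_breakpoint(series):
     """Return the index that maximizes the absolute difference between the
     mean of the left segment and the mean of the right segment."""
     series = np.asarray(series, dtype=float)
     best_i, best_gap = None, -1.0
     for i in range(1, len(series)):
         gap = abs(series[:i].mean() - series[i:].mean())
         if gap > best_gap:
             best_i, best_gap = i, gap
     return best_i, best_gap

 # Hypothetical weekly negative-co-occurrence scores for an entity.
 scores = [0.10, 0.12, 0.11, 0.45, 0.50, 0.48, 0.47]
 idx, gap = best_breakpoint(scores)
 print(f"candidate breakpoint at week {idx} (mean shift {gap:.2f})")
 # The detected week could then be matched against a list of dated events.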
(3) Graph modeling of co-occurrences (where nodes are words and edges are weighted by the number of co-occurrences between words). Use of techniques from dynamic network evolution, link analysis, or clustering to model and analyze changes in this graph over time (see the neighborhood-comparison sketch below)
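To make technique (3) concrete, the sketch below compares an entity's neighborhood between two time-slice graphs using the Jaccard similarity of its neighbor sets. The graphs are assumed to be built as in the earlier co-occurrence sketch, and Jaccard similarity is only one assumed choice among the dynamic-network and clustering options listed above.

 def neighborhood_change(graph_then, graph_now, entity):
     """Jaccard similarity of the entity's neighbor sets in two time slices;
     values near 0 indicate a large change in co-occurring words."""
     then = set(graph_then.neighbors(entity)) if entity in graph_then else set()
     now = set(graph_now.neighbors(entity)) if entity in graph_now else set()
     if not then and not now:
         return 1.0
     return len(then & now) / len(then | now)

 # Using the per-month graphs from the earlier sketch (hypothetical data):
 # print(neighborhood_change(graphs["2010-04"], graphs["2010-09"], "bp"))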
Evaluation
A combination of manual evaluation and cross-validation (splitting the data into training and test portions and evaluating on the held-out portion) may be used; a sketch of a simple chronological split follows below.
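One assumed way to realize the cross-validation part, given that the data is temporal, is a chronological split: train on the earlier time slices and check predicted state changes against the held-out later slices. The 80/20 ratio and the monthly keys below are illustrative only.

 def chronological_split(time_keys, train_ratio=0.8):
     """Split ordered time-slice keys into training and test portions,
     keeping the test slices strictly later than the training ones."""
     keys = sorted(time_keys)
     cut = int(len(keys) * train_ratio)
     return keys[:cut], keys[cut:]

 # Example with hypothetical monthly slices:
 train_months, test_months = chronological_split(
     ["2010-01", "2010-02", "2010-03", "2010-04", "2010-05"])
 print(train_months, test_months)  # first four months for training, last one for testing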