Latest revision as of 20:25, 14 February 2011
Social Media Analysis Project Ideas
Team Members
Derry Wijaya [dwijaya@cs.cmu.edu]
Reyyan Yeniterzi [reyyan@cs.cmu.edu]
Project Idea
Understanding change
Given an entity of interest, we would like to model and analyze its change in terms of words and phrases that co-occur with it over time. In other words, we would like to understand the change in Social Network Attribute over time, where the social network is defined over words.
We propose to construct a social network in which the nodes are words rather than people, and each edge is weighted by the number of co-occurrences between the two words it connects. Using this social network of words, we propose to analyze:
(1) how the entity's co-occurrence with other words changes over time
(2) how the change influences the state (semantic or sentiment) associated with the entity
(3) how the change may correspond to events that occur during the same period of time
For example, the entity 'BP' frequently co-occurred with negatively associated words during and after the Gulf-spill event.
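The word network described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `(period, text)` input format, the whitespace tokenizer, the choice to count co-occurrence once per document, and the function names `cooccurrence_networks` and `neighbors` are all hypothetical and not part of the proposal.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_networks(documents):
    """Build one weighted word network per time period.

    documents: iterable of (period, text) pairs (assumed input format).
    Returns {period: Counter({(word_a, word_b): weight})}. Each edge key is
    an alphabetically ordered word pair, and the weight is the number of
    documents in that period that contain both words.
    """
    networks = defaultdict(Counter)
    for period, text in documents:
        # Assumption: lowercase whitespace tokens, each word counted once per document.
        words = sorted(set(text.lower().split()))
        for pair in combinations(words, 2):
            networks[period][pair] += 1
    return dict(networks)


def neighbors(network, entity):
    """Return {neighbor: weight} for every edge that touches the entity."""
    result = {}
    for (a, b), weight in network.items():
        if a == entity:
            result[b] = weight
        elif b == entity:
            result[a] = weight
    return result


# Toy documents, invented for illustration only.
docs = [
    (2009, "bp announces profit"),
    (2009, "bp profit rises"),
    (2010, "bp oil spill disaster"),
    (2010, "bp spill cleanup"),
]
nets = cooccurrence_networks(docs)
bp_2009 = neighbors(nets[2009], "bp")
bp_2010 = neighbors(nets[2010], "bp")
```

Comparing `bp_2009` with `bp_2010` gives the entity's neighborhood in each period, which is the raw material for questions (1) to (3).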
Dataset
Motivation
The co-occurrence of words changes over time.
It will be interesting to model this change and analyze:
(1) How the state (semantic or sentiment) of a given entity changes over time depending on its neighbors (i.e. co-occurring words/phrases)
(2) How such changes relate to events that occur in the same period of time
(3) Whether we can find a natural sequence of events that define a change of state (semantic or sentiment) of a given entity
(4) Whether we can use (3) to predict the change of state of a given entity
Techniques and Related Works
(1) Linear regression analysis to measure the tendency of a word to become negative/positive in meaning over time when it co-occurs with negative/positive words
Related paper: Nicholas A. Christakis, M.D., Ph.D., M.P.H., and James H. Fowler, Ph.D. (2007) The Spread of Obesity in a Large Social Network over 32 Years (http://www.nejm.org/doi/pdf/10.1056/NEJMsa066082) - techniques to analyze the spread of obesity in a network of people
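One simple way to make technique (1) concrete is to regress, over time, the share of an entity's co-occurrence weight that falls on negative words. The sketch below is an assumption-laden illustration, not the method of the related paper: the sentiment lexicon `NEGATIVE_WORDS`, the toy counts, and the helper names `negative_share` and `ols_slope` are all hypothetical.

```python
def negative_share(neighbor_weights, negative_words):
    """Fraction of an entity's co-occurrence weight that falls on negative words."""
    total = sum(neighbor_weights.values())
    if total == 0:
        return 0.0
    negative = sum(w for word, w in neighbor_weights.items() if word in negative_words)
    return negative / total


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line fitted to the points (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical sentiment lexicon and toy co-occurrence counts for one entity.
NEGATIVE_WORDS = {"spill", "disaster"}
neighbors_by_year = {
    2008: {"profit": 4},
    2009: {"profit": 3, "spill": 1},
    2010: {"profit": 2, "spill": 2},
    2011: {"profit": 1, "spill": 2, "disaster": 1},
}

years = sorted(neighbors_by_year)
shares = [negative_share(neighbors_by_year[y], NEGATIVE_WORDS) for y in years]
slope = ols_slope(years, shares)
```

A positive `slope` indicates that the entity's neighborhood is drifting toward negative words; a fuller analysis would also need a significance test and controls for overall corpus sentiment.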
(2) Identify break points in the states of an entity (based on its co-occurrence changes) and find events that correspond to the break points
Related paper: Akcora et al. (2010) Identifying Breakpoints in Public Opinion (http://snap.stanford.edu/soma2010/papers/soma2010_9.pdf) - technique to identify break points in sentiments found in tweets (Twitter), using a set of manually constructed emotion words (Vector space models)
Related paper: Michel et al. (2010) Quantitative Analysis of Culture Using Millions of Digitized Books (http://www.sciencemag.org/content/early/2010/12/15/science.1199644) - measures the usage frequency over time of a given n-gram (such as "slavery", "great war", etc.) that represents an entity of interest
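As a rough illustration of technique (2), each period's co-occurrence counts can be treated as a vector, and a break point flagged wherever the cosine similarity between consecutive periods drops below a threshold. This is a sketch under assumptions, not the procedure of either related paper: the threshold value, the toy vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {word: weight}."""
    dot = sum(weight * v.get(word, 0) for word, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_break_points(vectors_by_period, threshold=0.5):
    """Return the periods whose vector differs sharply from the previous period.

    threshold is an assumed cut-off; a real study would have to tune it.
    """
    periods = sorted(vectors_by_period)
    breaks = []
    for previous, current in zip(periods, periods[1:]):
        similarity = cosine_similarity(vectors_by_period[previous],
                                       vectors_by_period[current])
        if similarity < threshold:
            breaks.append(current)
    return breaks


# Toy co-occurrence vectors for one entity, invented for illustration.
vectors = {
    2008: {"profit": 3, "oil": 2},
    2009: {"profit": 2, "oil": 3},
    2010: {"spill": 4, "oil": 1},
    2011: {"spill": 3, "cleanup": 2},
}
break_points = find_break_points(vectors)
```

Each flagged period can then be compared against a timeline of external events to look for a corresponding cause.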
(3) Graph modeling of co-occurrences (where nodes are words and edges are weighted by the number of co-occurrences between words). Use of techniques from dynamic network evolution, link analysis or clustering to model and analyze changes in this graph over time
Related paper: Cemal Cagatay Bilgin and Bülent Yener (2010) Dynamic Network Evolution: Models, Clustering, Anomaly Detection (http://www.cs.rpi.edu/research/pdf/08-08.pdf)
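Two very simple measures of change between snapshots of the word graph are the Jaccard overlap of their edge sets and the weighted degree of a node of interest. The sketch below assumes each snapshot is a dictionary from word pairs to weights; that representation, the toy snapshots, and the function names are assumptions made for illustration, and far richer models are surveyed in the related paper.

```python
def edge_jaccard(net_a, net_b):
    """Jaccard overlap between the edge sets of two graph snapshots."""
    edges_a, edges_b = set(net_a), set(net_b)
    union = edges_a | edges_b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(edges_a & edges_b) / len(union)


def weighted_degree(network, word):
    """Sum of the weights of all edges incident to the given word."""
    return sum(weight for edge, weight in network.items() if word in edge)


# Toy snapshots keyed by (word_a, word_b) pairs, invented for illustration.
net_2009 = {("bp", "profit"): 2, ("bp", "oil"): 1, ("oil", "profit"): 1}
net_2010 = {("bp", "oil"): 3, ("bp", "spill"): 4, ("oil", "spill"): 2}

overlap = edge_jaccard(net_2009, net_2010)
degree_2009 = weighted_degree(net_2009, "bp")
degree_2010 = weighted_degree(net_2010, "bp")
```

A low `overlap` together with a jump in weighted degree suggests that the entity's neighborhood was largely rewired between the two periods.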
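(4) As a baseline, we plan to use Bayes' Law to measure the probability that a word will co-occur with an entity, given the entity and the other words that have co-occurred with it:

:<math>p(word_i \vert entity, word_1,\dots,word_{i-1}) = \frac{p(word_i) \ p(entity,word_1,\dots,word_{i-1}\vert word_i)}{p(entity,word_1,\dots,word_{i-1})}</math>

Using the Naive Bayes conditional independence assumption,

:<math>p(word_i \vert entity,word_1,\dots,word_{i-1}) = \frac{1}{Z} \ p(word_i) \ p(entity \vert word_i) \prod_{j=1}^{i-1} p(word_j \vert word_i)</math>

where <math>Z</math> (the evidence) is the normalization factor. We then need to model how this probability changes over time.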
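The Naive Bayes baseline can be sketched from document-level counts as follows. Several details here are assumptions rather than part of the proposal: probabilities are estimated from per-document presence, add-one smoothing is used, the normalization factor is computed over a supplied candidate set only, and the names `train`, `conditional` and `posterior` are hypothetical.

```python
from collections import Counter


def train(documents):
    """Count, per document, which words appear and which ordered pairs co-appear."""
    doc_count = Counter()   # number of documents containing each word
    pair_count = Counter()  # number of documents containing both words of (a, b)
    for text in documents:
        words = set(text.lower().split())
        doc_count.update(words)
        for a in words:
            for b in words:
                if a != b:
                    pair_count[(a, b)] += 1
    return doc_count, pair_count, len(documents)


def conditional(a, b, doc_count, pair_count, alpha=1.0):
    """Smoothed estimate of p(a appears | b appears); alpha is an assumed prior."""
    return (pair_count[(a, b)] + alpha) / (doc_count[b] + 2 * alpha)


def posterior(entity, context, candidates, doc_count, pair_count, n_docs):
    """Score each candidate as p(w) * p(entity | w) * prod_j p(context_j | w),
    then normalize over the candidate set (an assumed stand-in for Z)."""
    scores = {}
    for word in candidates:
        score = doc_count[word] / n_docs
        score *= conditional(entity, word, doc_count, pair_count)
        for seen in context:
            score *= conditional(seen, word, doc_count, pair_count)
        scores[word] = score
    z = sum(scores.values())
    return {word: s / z for word, s in scores.items()} if z > 0 else scores


# Toy corpus, invented for illustration.
corpus = ["bp oil spill", "bp oil spill disaster", "bp profit", "shell oil profit"]
doc_count, pair_count, n_docs = train(corpus)
probs = posterior("bp", ["oil"], ["spill", "profit"], doc_count, pair_count, n_docs)
```

Training one such model per time slice and comparing the resulting `probs` across slices is one way to track how the probability changes over time.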
Evaluation
Our project will be mainly quantitative in nature.