Inferring Networks of Diffusion and Influence
Team Members
Zaid Sheikh
Project Idea
[Project idea taken from: http://www.cs.cmu.edu/afs/.cs.cmu.edu/Web/People/epxing/Class/10701/project.html ]
Information diffusion and virus propagation are fundamental processes that take place in networks. In many applications, the underlying network over which diffusion and propagation occur is hard to observe directly. Inferring this underlying network from MemeTracker data would be an interesting and challenging project. Gomez-Rodriguez et al. (2010) recently published a paper on this topic and made their code publicly available. In this project, we would first like to replicate their results.
Furthermore, the algorithm proposed in that paper (called NETINF) assumes that all connected nodes in the network influence their neighbors with the same probability. We would like to improve on this by observing how meme phrases mutate over time and using this information to estimate the influence probabilities more accurately.
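As a rough illustration of the mutation idea, the sketch below scores how closely the phrases on one page match the phrases on another, using word-level Jaccard similarity as a stand-in for "how much a phrase mutated". The similarity measure and the names `word_jaccard` and `mutation_weight` are assumptions made for this example; they are not part of NETINF, and the resulting score is only a candidate signal for weighting an edge, not an influence probability.

```python
from typing import Iterable


def word_jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two phrases (1.0 = same word set)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def mutation_weight(source_phrases: Iterable[str],
                    target_phrases: Iterable[str]) -> float:
    """Hypothetical edge signal: the best similarity over all phrase pairs.

    Returns 0.0 when either page has no phrases.
    """
    targets = list(target_phrases)
    return max(
        (word_jaccard(s, t) for s in source_phrases for t in targets),
        default=0.0,
    )


if __name__ == "__main__":
    earlier = ["you can put lipstick on a pig"]
    later = ["lipstick on a pig", "that's not change"]
    print(mutation_weight(earlier, later))
```

A page whose phrases are near-verbatim copies of an earlier page's phrases would score close to 1, while an unrelated page would score close to 0. How such a score should enter the influence model is left open here.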
Data
The dataset used by NETINF is called MemeTracker. It can be downloaded from http://memetracker.org/data.html .
MemeTracker contains two datasets. The first is the phrase cluster data: for each phrase cluster, it contains all the phrases in the cluster and a list of URLs where those phrases appeared. The second is the raw MemeTracker phrase data, which contains the phrases and hyperlinks extracted from each article or blog post.
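For orientation, here is a minimal sketch of reading the raw phrase data. It assumes a line-oriented layout in which each record starts with a `P` line (the document URL), followed by a `T` line (timestamp), zero or more `Q` lines (phrases) and zero or more `L` lines (hyperlinks), with records separated by blank lines. That layout, and the names `Document` and `parse_records`, are assumptions for this example and should be checked against the description on the download page.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Document:
    url: str
    timestamp: Optional[str] = None
    phrases: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


def parse_records(lines: Iterable[str]) -> Iterator[Document]:
    """Yield one Document per record from an iterable of text lines."""
    current: Optional[Document] = None
    for raw in lines:
        parts = raw.strip().split(None, 1)  # tag, then the rest of the line
        if not parts:
            # A blank line closes the current record (assumed separator).
            if current is not None:
                yield current
                current = None
            continue
        tag = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if tag == "P":
            # A new P line also closes any record still open.
            if current is not None:
                yield current
            current = Document(url=value)
        elif current is None:
            continue  # ignore stray lines before the first P line
        elif tag == "T":
            current.timestamp = value
        elif tag == "Q":
            current.phrases.append(value)
        elif tag == "L":
            current.links.append(value)
    if current is not None:
        yield current


if __name__ == "__main__":
    sample = [
        "P\thttp://a.example/post1\n",
        "T\t2008-09-09 22:35:24\n",
        "Q\tlipstick on a pig\n",
        "L\thttp://b.example/post0\n",
        "\n",
    ]
    for doc in parse_records(sample):
        print(doc.url, doc.timestamp, doc.phrases, doc.links)
```

Because `parse_records` accepts any iterable of lines, it can be fed an open file handle directly, so a large dump need not be loaded into memory at once.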