Information Diffusion and External Influence in Networks

From Cohen Courses
Revision as of 11:25, 5 October 2012 by Kwmurray (talk | contribs)

This paper is available online [1].

Summary

This paper is unusual in that it investigates external influence as a factor in the spread of information through social networks, rather than only diffusion within the network as most previous work had done. The authors propose a probabilistic model in which information can spread from one node to another along the edges of the graph. The model also allows information to enter from outside the network, again according to a probability distribution. Even when a node has infected neighbors from which the information could have diffused, the model still allows a generative story in which the information came from an external source. They evaluate the model on a Twitter dataset containing every tweet from one month. Internal diffusion occurs when a person tweets the same link as someone he or she follows. External influence covers any other way a piece of information reaches a user: the information may already be present elsewhere in the graph, posted by users the tweeter does not directly follow, or the tweeter may have found the same information as someone they follow, but not through that person's tweets. Anything outside of Twitter is considered external, and the authors give examples such as CNN.com.

I see a major issue with this method: there are "external" sources that could still be part of a larger unobserved network (for example, information obtained from a different social networking site such as Facebook). Their result that 29% of mentions cannot be explained by internal network effects may partly reflect a larger social network that is external to Twitter, in which case attributing that share to external influence understates the role of network effects.
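The internal/external distinction described in this summary can be made concrete with a small sketch. This is not the authors' method (their model is probabilistic and allows an external explanation even when a followed user is already infected); it is only a deterministic illustration. The function name, the input format, and the label strings are all assumptions made for this example.

```python
from collections import defaultdict


def label_mentions(follows, mentions):
    """Label the first mention of each URL by each user.

    follows:  dict mapping a user to the set of users they follow
              (hypothetical input format).
    mentions: list of (timestamp, user, url) tuples.

    A mention is 'possibly_internal' if someone the user follows tweeted
    the same URL earlier, and 'external' otherwise.
    """
    infected = defaultdict(set)  # url -> users who have already tweeted it
    labels = []
    for timestamp, user, url in sorted(mentions):
        if user in infected[url]:
            continue  # only the first tweet of a URL counts as an infection
        followed = follows.get(user, set())
        if followed & infected[url]:
            labels.append((user, url, "possibly_internal"))
        else:
            labels.append((user, url, "external"))
        infected[url].add(user)
    return labels


follows = {"bob": {"alice"}, "carol": {"dave"}}
mentions = [(1, "alice", "u"), (2, "bob", "u"), (3, "carol", "u")]
print(label_mentions(follows, mentions))
```

Note that "carol" is labeled external even though the URL is already circulating in the graph, because she does not follow anyone who tweeted it.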

Datasets

The authors use two datasets: synthetic data (to establish a baseline) and links from tweets. What sets this paper apart from many others that deal with Twitter is that the authors have every tweet from January 2011, totaling over 3 billion.

Methodology

The authors call the amount of influence that external sources have on the network, as a function of time, the event profile. A contagion is any piece of information in the Twitter network, and a node becomes infected when the user tweets it. Contagions are modeled as independent of one another (I believe this is a plausible assumption, though probably not completely correct; it is much easier to work with than the alternative).

Exposure Variables.png

To model internal exposure, the authors use hazard functions from actuarial science. They argue that hazard functions are effective for modeling discrete events that happen over continuous time. For each neighbor a node follows, the model gives the probability that the neighbor has not yet exposed the node to a specific contagion. The expected number of internal exposures is then the sum, over the infected neighbors, of the cumulative probability that each one has exposed the node by a given time.

Exposure Curves.png
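As a rough illustration of the expected number of internal exposures, the sketch below assumes a constant hazard rate, so that the probability an infected neighbor has exposed the node within an elapsed time d is 1 - exp(-rate * d). The constant rate is an assumption chosen for simplicity, not necessarily the form the authors use, and the function and parameter names are hypothetical.

```python
import math


def expected_internal_exposures(infection_times, t, rate=0.5):
    """Expected number of internal exposures a node has received by time t.

    infection_times: times at which the followed users became infected.
    rate: constant hazard rate (an assumption made for this sketch).
    """
    total = 0.0
    for tau in infection_times:
        elapsed = t - tau
        if elapsed > 0:
            # Cumulative probability that this neighbor has exposed the node.
            total += 1.0 - math.exp(-rate * elapsed)
    return total


# Two followed users infected at times 0 and 1, evaluated at a few time points.
for t in (0.5, 2.0, 10.0):
    print(t, expected_internal_exposures([0.0, 1.0], t))
```

The value grows toward the number of infected neighbors as time passes, since each neighbor contributes at most one exposure.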

External exposures are modeled much more simply, since they are not conditioned on other nodes in the network being infected (even if someone a tweeter follows is infected, the tweeter could still have been infected from outside the network). Event profiles have intensities that change over time. The authors discuss news at length and say that as time passes after an event, the event profile's probability should go to zero, with the caveat that new developments can cause a spike. They model this as a Bernoulli random variable and infer it non-parametrically.
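The following sketch simulates external exposures from a time-varying event profile, drawing one Bernoulli trial per discrete time step. It runs the process forward rather than performing the non-parametric inference the authors describe. The geometric decay, the spike, the discretization into steps, and all names and parameter values are assumptions made for illustration only.

```python
import random


def event_profile(step, peak=0.9, decay=0.5, spike_step=None, spike_height=0.8):
    """Hypothetical event profile: exposure probability at a given time step.

    The probability decays toward zero after the event, with an optional
    spike at spike_step standing in for a new development.
    """
    base = peak * decay ** step
    if spike_step is not None and step == spike_step:
        return max(base, spike_height)
    return base


def sample_external_exposures(profile, n_steps, rng):
    """Draw one Bernoulli trial per step; 1 means an external exposure."""
    return [1 if rng.random() < profile(step) else 0 for step in range(n_steps)]


rng = random.Random(0)  # fixed seed so the run is repeatable


def profile(step):
    return event_profile(step, spike_step=6)


exposures = sample_external_exposures(profile, 10, rng)
expected_total = sum(profile(step) for step in range(10))
print(exposures, round(expected_total, 3))
```

Because external exposures do not depend on the state of the node's neighbors, the simulation needs no graph at all, which reflects how much simpler this part of the model is than the internal one.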

DiffusionAndExternal.png

Experimental Results

The experiments on Twitter suggest that only about 71% of URL mentions come from network effects; the rest are explained by external influences. The authors argue that this 29% is statistically significant and cannot be ignored. In other words, a proper treatment of a social network should account for external influences in addition to effects within the network.

Related Papers

Study Plan

The seminal work on Diffusion of Innovations: [2]