Yang et al Modeling Information Diffusion in Implicit Networks

From Cohen Courses
Revision as of 18:07, 1 October 2012 by Mmohta (talk | contribs)

This is a paper summarized for the course Analysis of Social Media 10-802 in Fall 2012.

NOTE: WORK IN PROGRESS!

Citation

Yang, J., and Leskovec, J. 2010. Modeling Information Diffusion in Implicit Networks.

Online version

Modeling Information Diffusion in Implicit Networks

Summary

The paper proposes a Linear Influence Model. Rather than trying to predict which nodes influence which other nodes based on the network structure, the model captures the global influence of each node on the rate at which information diffuses. The authors propose estimating an influence function for each node, which describes the effect of that node on the spread of the contagion at different times after the node adopts it. Since different types of nodes (for example blogs, news websites, etc.) may have different influence functions, the paper uses a non-parametric model for the influence functions. This can capture such varied behaviors, unlike a single functional form shared by all nodes with only the (estimated) parameters differing.

Main Ideas

Motivation

Models that explain influence through the network structure make several assumptions: that complete network data is available, that the structure of the network is sufficient to explain the observed behavior, and that a contagion can spread only along the edges. However, the paper points out that there may be many external and hidden factors for which data is not readily available. Also, in the case of information or virus propagation, the source is often not known. For example, people usually discover new information without explicitly acknowledging where it came from. The authors therefore argue that existing diffusion models based on network characteristics may be too constrained, and that a global influence model is needed.

Linear Influence Model

The main idea of this model is that each node has an influence function associated with it, and that the number of newly infected nodes at time t is a function of the influences of the nodes that were infected before time t. The paper defines the following terms:

  • V(t): Number of nodes that mention the information at time t (Volume at time t).
  • Iu(l): Number of follow-up mentions l time units after node u adopted the information. Instead of being modeled as a parametric function, it is modeled as a vector of length L. This assumes that the influence of node u drops to zero L time units after the node was activated.
  • A(t): The set of active (infected, influenced) nodes u, that is, nodes that were activated before t.
  • Time t: Measured in discrete steps (one hour in most experiments).
  • Mu,k(t): Indicator function, equals 1 if node u got infected by contagion k at time t, and 0 otherwise
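
As an illustration of these definitions, the sketch below builds the indicator Mu,k(t) for a single contagion k from a table of infection times and reads off the volume V(t) as the number of nodes infected at each step. The function name indicator_matrix, the dictionary input and the toy infection times are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def indicator_matrix(infection_times, num_nodes, num_steps):
    """Return M with M[u, t] = 1 if node u was infected at step t, else 0.

    infection_times maps node id -> infection time step for ONE contagion
    (hypothetical input format chosen for this sketch).
    """
    M = np.zeros((num_nodes, num_steps), dtype=int)
    for u, t_u in infection_times.items():
        M[u, t_u] = 1
    return M

# Toy data: nodes 0, 1 and 3 mention the contagion; node 2 never does.
infection_times = {0: 0, 1: 2, 3: 2}
M = indicator_matrix(infection_times, num_nodes=4, num_steps=5)

# V(t): number of nodes that mention the information at time t,
# i.e. the column sums of the indicator matrix.
V = M.sum(axis=0)
```

With one row per node and one column per time step, each contagion gets its own matrix; stacking several of them would give the full Mu,k(t) indexed by k.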

The relation between V(t+1) and the influence functions can be written in the following two ways:

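- $V(t+1) = \sum_{u \in A(t)} I_u(t - t_u)$, where $t_u$ is the time at which node u was infected.
- $V_k(t+1) = \sum_{u=1}^{N} \sum_{l=0}^{L-1} M_{u,k}(t - l) \, I_u(l + 1)$, for contagion k, where N is the number of nodes.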
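
To make the two forms concrete, here is a minimal sketch that evaluates both on toy data and checks that they give the same volume. It rests on several assumptions that are not stated in the summary: influence functions are stored as rows of a 0-indexed array I, so I[u, l] is the influence of node u at lag l (entry l + 1 in 1-based vector notation); a node infected exactly at time t already counts as active when predicting t + 1; and the function names and numbers are invented for illustration.

```python
import numpy as np

def volume_from_active_set(infection_times, I, t):
    """First form: sum the influence of each active node at lag t - t_u."""
    L = I.shape[1]
    total = 0.0
    for u, t_u in infection_times.items():
        lag = t - t_u
        if 0 <= lag < L:  # influence is zero once L steps have passed
            total += I[u, lag]
    return total

def volume_from_indicators(M, I, t):
    """Second form: double sum over nodes u and lags l of M[u, t - l] * I[u, l]."""
    L = I.shape[1]
    total = 0.0
    for l in range(L):
        if t - l < 0:  # no observations before time 0
            break
        total += float(M[:, t - l] @ I[:, l])
    return total

# Toy data (hypothetical): 3 nodes, influence vectors of length L = 3.
infection_times = {0: 0, 1: 2}          # node 2 is never infected
I = np.array([[3.0, 2.0, 1.0],
              [1.0, 0.5, 0.0],
              [4.0, 4.0, 4.0]])
M = np.zeros((3, 5), dtype=int)
for u, t_u in infection_times.items():
    M[u, t_u] = 1

v_active = volume_from_active_set(infection_times, I, t=2)
v_indicator = volume_from_indicators(M, I, t=2)
```

The second form is the more convenient one for estimation, because the volume is linear in the unknown entries of I once the indicators M are observed.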


Experiments

DataSets / Resources

Related papers

Study plan