# Latent Social Network Influence

This is one of the paper written in course Social Media Analysis 10-802 in Spring 2011

## Citation

S. A. Myers & J. Leskovec "On the Convexity of Latent Social Network Inference" In Neural Information Processing Systems (NIPS), 2010.

## Summary

This is an interesting paper of learning the graph structure which is not a typical Bayesian graph learning. The main problem that the authors try to solve is given a set of nodes and network diffusion, Graph Learning, learning the graph over which the contagion spreads. The assumption is information diffusion often happens over an unknown network. Typically we observe the nodes getting infected and the time they get infected, however we cannot know who infected them. The infection can here can be blogs getting posted with a similar news articles, or people buying a new product or it can be a typical disease that is spreading on geographic or people network. This problem is converted to a maximum likelihood based convex programming problem and further they introduce an ${\displaystyle l_{1}}$ penalty to penalize the graph with larger number of nodes.

## Description of the method

Assume that there are ${\displaystyle N}$ nodes in the graph and we present a ${\displaystyle NxN}$ matrix A where

${\displaystyle A_{ij}=Pr(nodeiinfectsnodej|nodeiisinfected)}$

A cascade is based on transmission model ${\displaystyle w(t)}$ and recovery model ${\displaystyle r(t)}$. Transmission model defines how much time it takes for an infection to spread from one node to another and recovery model defines how long a node remains infected before it recovers. Further time when each node ${\displaystyle i}$ is infected is given by ${\displaystyle \tau _{i}}$. With a ${\displaystyle D}$ cascades given and a cascade we try to get MLE of the entries of the matrix A.

The following is likelihood of probability matrix A to given set of cascades. Here ${\displaystyle X_{c}(t)}$ is set of nodes that got infected at the time t and ${\displaystyle \tau _{i}^{c}=\infty }$ if the node i is never infected in the cascade c.

${\displaystyle L(A;D)=\prod _{c\in D}\left\lbrack \left\lbrace \prod _{i;\tau _{i}^{c}<\infty }P(i{\mbox{ infected at }}\tau _{i}^{c}|X_{c}(\tau _{i}^{c}))\right\rbrace \left\lbrace \prod _{i;\tau _{i}^{c}=\infty }P(i{\mbox{ never infected }}|X_{c}(t)\forall t)\right\rbrace \right\rbrack }$

${\displaystyle L(A;D)=\prod _{c\in D}\left\lbrack \left\lbrace \prod _{i;\tau _{i}^{c}<\infty }\left(1-\prod _{j;\tau _{j}\leq \tau _{i}}(1-w(\tau _{i}^{c}-\tau _{j}^{c})A_{ij}\right)\right\rbrace \left\lbrace \prod _{i;\tau _{i}^{c}=\infty }\prod _{j;\tau _{j}^{c}<\infty }(1-A_{ij})\right\rbrace \right\rbrack }$

The problem is ${\displaystyle max_{A}log(L(A;D))}$ which involves finding out at least ${\displaystyle N(N-1)}$ assuming each node is not connected to itself and we learn an undirected graph. Further, assuming that incoming edges of a given node is independent of incoming edges to any other node, so the columns of matrix A are inferred independently so we have linear number of parameters to estimate.

The problem is then converted to a convex programming problem and then the ${\displaystyle l_{1}}$-penality is introduced based on the assumption that each node is connected to constant number of other nodes independent on total number of nodes in the graph.

## Datasets used

They use three different datasets in the paper.

• Synthetic dataset on 512 nodes and 1024 edges, with the Susceptible–Infected (SI) contagion model
• Collaboration network between 379 scientists.
• Email social network of 593 nodes and 2824 edges that was based on the email of a small European research institute.
• Recommendation network of 275 users with 1522 edges and a set of 5,765 recommendations on 625 products.

## Experimental Results

Using synthetic dataset, this method compares with the previous paper NetInf, by same author Jure Leskovec. The comparison is based on precision and recall rates and Mean Squared Error (MSE). The MSE is taken over the union of potential edge positions (node pairs) where there is an edge in the latent network, and the edge positions in which the algorithm has predicted the presence of an edge. It shows that the current method is consistently better than NetInf method both interms of precision, recall and also in terms of MSE.

They show using rest of the datasets that their method is capable of near perfect recovery of the underlying social network with relatively small number of cascades given.

## Related Papers

The same author proposed an approximation algorithm called NetInf which assumes that the weights of the edges in latent network are homogeneous. When this assumption holds, the algorithm is very accurate and is computationally feasible, however does not address the generic problem.