Difference between revisions of "Supervised Random Walks: Predicting and Recommending Links in Social Networks"

Revision as of 00:24, 2 October 2012

Supervised Random Walks: Predicting and Recommending Links in Social Networks

Citation

Backstrom, Lars and Leskovec, Jure Supervised random walks: predicting and recommending links in social networks WSDM '11

Online version

PDF

Summary

This paper tries to solve the dilemma for link prediction between unsupervised methods which are sometimes too heuristic and lead to uncontrollable performance and some supervised machine learning approaches which have great challenge in scalability.

The key idea of this paper is to adopt a random walk framework and train the transition probability to generate a higher visiting frequency on the positive nodes than negative nodes in the training set. From the trained transition probability, new nodes with high visiting frequency is the output as promising new links.

Optimization Function

For each edge $(u,v)$ , there is a corresponding feature vector $\psi _{uv}$ that describes the nodes feature (age, gender, etc.) and the interaction attributes (e.g., when the edge was created). And we have a function $f_{w}(\psi _{uw})$ which as parameterized by $w$ and computes the transition probability for $(u,v)$ . Considering starting from a single node s and $p_{i}$ means the probability of random walk from s to stop at i with all the transition probabilities. And the optimization problem now becomes:

$\min _{w}{F(w)}=||w||^{2}+\lambda \sum _{d\in D,l\in L}{p_{l}-p_{d}}$

where D is the positive training sets and L is negative. $||w||^{2}$ is to overcome the over fitting problem.

Data Sets and Experiments

Synthetic data

The paper generates synthetic graphs to test whether the algorithm can successfully recover $w$ if the graph is generated using the random walk way. And the result shows almost 100% accuracy with noise-free data and a still high accuracy with some noise.

Four co-authorship networks from arXiv e-print archive.

The AUC score is 0.71238 and the best unsupervised method gets 0.638 and best other supervised method gets 0.674.

Facebook data of Iceland

The AUC score is 0.828 and unsupervised random walk(equal weighted) gets an AUC score 0.817 and best supervised method gets 0.817

Related papers

Most link prediction solutions can fall to two directories, one is to use topological features and the other is to use node or edge information.

Link propagation: A fast semi-supervised learning algorithm for link prediction suggests a semi-supervised framework which need a node similarity matrix as input. Many applications like protein-protein interactions and social networks are able to provide this information.

Supervised Random Walks: Predicting and Recommending Links in Social Networks adopts a random walk framework and use edge and node information as features to train a transition rate over each edge to maximize the probability of random walk stopping time on positive nodes.

Study Plan

About Adamic/Adar score
- Adamic, L.A., & Adar, E. (2003). Friends and neighbors on the Web. Social Networks, 25(3), 211–230
To understand which graph information these features represent
- Davidsen, J., Ebel, H., & Bornholdt, S. (2002). Emergence of a small world from local interactions: Modeling acquaintance networks. Physical Review Letters, 88(128701)

@@ Line 17: / Line 17: @@
 == Optimization Function ==
 For each edge <math>(u, v)</math>, there is a corresponding feature vector <math>\psi_{uv}</math> that describes the nodes feature (age, gender, etc.) and the interaction attributes (e.g., when the edge was created). And we have a function <math>f_w(\psi_{uw})</math> which as parameterized by <math>w</math> and computes the transition probability for <math>(u, v)</math>. Considering starting from a single node s and <math>p_i</math> means the probability of random walk from s to stop at i with all the transition probabilities. And the optimization problem now becomes:
 <math>
 \min_w{F(w)} = ||w||^2 + \lambda \sum_{d \in D, l \in L}{p_l - p_d}
 </math>
 where D is the positive training sets and L is negative. <math>||w||^2</math> is to overcome the over fitting problem.

Difference between revisions of "Supervised Random Walks: Predicting and Recommending Links in Social Networks"

Revision as of 00:24, 2 October 2012

Contents

Citation

Online version

Summary

Optimization Function

Data Sets and Experiments

Related papers

Study Plan

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools