Difference between revisions of "Supervised Random Walks: Predicting and Recommending Links in Social Networks"

From Cohen Courses
Jump to navigationJump to search
(Created page with 'Supervised Random Walks: Predicting and Recommending Links in Social Networks == Citation == Backstrom, Lars and Leskovec, Jure Supervised random walks: predicting and recommen…')
 
 
(4 intermediate revisions by the same user not shown)
Line 11: Line 11:
 
== Summary ==
 
== Summary ==
  
This paper tries to solve the dilemma for link prediction between unsupervised methods which are sometimes too heuristic and lead to uncontrollable performance and some supervised machine learning approaches which have great challenge in scalability.
+
This paper tries to solve the dilemma for [[AddressesProblem::link prediction]] between unsupervised methods which are sometimes too heuristic and lead to uncontrollable performance and some supervised machine learning approaches which have great challenge in scalability.
  
 
The key idea of this paper is to adopt a random walk framework and train the transition probability to generate a higher visiting frequency on the positive nodes than negative nodes in the training set. From the trained transition probability, new nodes with high visiting frequency is the output as promising new links.
 
The key idea of this paper is to adopt a random walk framework and train the transition probability to generate a higher visiting frequency on the positive nodes than negative nodes in the training set. From the trained transition probability, new nodes with high visiting frequency is the output as promising new links.
Line 17: Line 17:
 
== Optimization Function ==
 
== Optimization Function ==
 
For each edge <math>(u, v)</math>, there is a corresponding feature vector <math>\psi_{uv}</math> that describes the nodes feature (age, gender, etc.) and the interaction attributes (e.g., when the edge was created). And we have a function <math>f_w(\psi_{uw})</math> which as parameterized by <math>w</math> and computes the transition probability for <math>(u, v)</math>. Considering starting from a single node s and <math>p_i</math> means the probability of random walk from s to stop at i with all the transition probabilities. And the optimization problem now becomes:
 
For each edge <math>(u, v)</math>, there is a corresponding feature vector <math>\psi_{uv}</math> that describes the nodes feature (age, gender, etc.) and the interaction attributes (e.g., when the edge was created). And we have a function <math>f_w(\psi_{uw})</math> which as parameterized by <math>w</math> and computes the transition probability for <math>(u, v)</math>. Considering starting from a single node s and <math>p_i</math> means the probability of random walk from s to stop at i with all the transition probabilities. And the optimization problem now becomes:
 +
 
<math>
 
<math>
 
\min_w{F(w)} = ||w||^2 + \lambda \sum_{d \in D, l \in L}{p_l - p_d}
 
\min_w{F(w)} = ||w||^2 + \lambda \sum_{d \in D, l \in L}{p_l - p_d}
 
</math>
 
</math>
 +
 
where D is the positive training sets and L is negative. <math>||w||^2</math> is to overcome the over fitting problem.
 
where D is the positive training sets and L is negative. <math>||w||^2</math> is to overcome the over fitting problem.
  
Line 31: Line 33:
  
 
== Related papers ==
 
== Related papers ==
Most link prediction solutions can fall to two directories, one is to use topological features and the other is to use node or edge information.
+
[[RelatedPaper::Liben-Nowell Kleinberg J. Am.Soc.Inf.Sci.2007]] studies extensively on the unsupervised topological features for link prediction
 
 
[[RelatedPaper::Link propagation: A fast semi-supervised learning algorithm for link prediction]] suggests a semi-supervised framework which need a node similarity matrix as input. Many applications like protein-protein interactions and social networks are able to provide this information.
 
  
[[RelatedPaper::Supervised Random Walks: Predicting and Recommending Links in Social Networks]] adopts a random walk framework and use edge and node information as features to train a transition rate over each edge to maximize the probability of random walk stopping time on positive nodes.
+
[[RelatedPaper::Fast dynamic reranking in large graphs]] and  [[RelatedPaper::Link precition in relational data]] discuss some supervised frameworks for link prediction.
  
 
== Study Plan ==
 
== Study Plan ==
* About Adamic/Adar score
+
* Slides about personalized random walk
**[[Paper::Adamic, L.A., & Adar, E. (2003). Friends and neighbors on the Web. Social Networks, 25(3), 211–230]]
+
[http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCIQFjAA&url=http%3A%2F%2Fwww.cs.cmu.edu%2F~learning%2Ftalks-2007-fall%2Fslides%2F2007-10-01.sarkar.ppt&ei=tG1qUKjZHrOq0AHIz4GYAQ&usg=AFQjCNGtWS-dG8o13vX27O7KGZJZd5IkVw&sig2=bRfZf474vR00llGkZ63Dcg Random Walks on Graphs: An Overview]
* To understand which graph information these features represent
+
* Gradient descent based method
**[[Paper::Davidsen, J., Ebel, H., & Bornholdt, S. (2002). Emergence of a small world from local interactions: Modeling acquaintance networks. Physical Review Letters, 88(128701)]]
+
** [http://www.math.oregonstate.edu/~gibsonn/optpart2.pdf Slides]
 +
** [http://en.wikipedia.org/wiki/Gradient_descent Wiki]

Latest revision as of 02:37, 4 October 2012

Supervised Random Walks: Predicting and Recommending Links in Social Networks

Citation

Backstrom, Lars and Leskovec, Jure Supervised random walks: predicting and recommending links in social networks WSDM '11

Online version

PDF

Summary

This paper tries to solve the dilemma for link prediction between unsupervised methods which are sometimes too heuristic and lead to uncontrollable performance and some supervised machine learning approaches which have great challenge in scalability.

The key idea of this paper is to adopt a random walk framework and train the transition probability to generate a higher visiting frequency on the positive nodes than negative nodes in the training set. From the trained transition probability, new nodes with high visiting frequency is the output as promising new links.

Optimization Function

For each edge , there is a corresponding feature vector that describes the nodes feature (age, gender, etc.) and the interaction attributes (e.g., when the edge was created). And we have a function which as parameterized by and computes the transition probability for . Considering starting from a single node s and means the probability of random walk from s to stop at i with all the transition probabilities. And the optimization problem now becomes:

where D is the positive training sets and L is negative. is to overcome the over fitting problem.

Data Sets and Experiments

  • Synthetic data

The paper generates synthetic graphs to test whether the algorithm can successfully recover if the graph is generated using the random walk way. And the result shows almost 100% accuracy with noise-free data and a still high accuracy with some noise.

  • Four co-authorship networks from arXiv e-print archive.

The AUC score is 0.71238 and the best unsupervised method gets 0.638 and best other supervised method gets 0.674.

  • Facebook data of Iceland

The AUC score is 0.828 and unsupervised random walk(equal weighted) gets an AUC score 0.817 and best supervised method gets 0.817

Related papers

Liben-Nowell Kleinberg J. Am.Soc.Inf.Sci.2007 studies extensively on the unsupervised topological features for link prediction

Fast dynamic reranking in large graphs and Link precition in relational data discuss some supervised frameworks for link prediction.

Study Plan

  • Slides about personalized random walk

Random Walks on Graphs: An Overview