Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs

From Cohen Courses
Jump to navigationJump to search

Citation

 title={Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs},
 author={Raymond, R. and Kashima, H.},
 journal={Machine Learning and Knowledge Discovery in Databases},
 pages={131--147},
 year={2010},
 publisher={Springer}

Online version

Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs

Summary

This paper addresses the problem of [AddressesProblem::Link Prediction]], i.e. the completion of missing links between nodes in a Graph; essentially, this paper is a followup/extension of a previous work done by a subset the authors ( Link propagation: A fast semi-supervised learning algorithm for link prediction) in which, a technique called "Link Propagation" is introduced as one solution to that problem. The present paper points out that the original algorithm for Link Propagation is expensive both in terms of computational and space complexity, thereby hard to scale to large datasets. To that end, the authors introduce approximate techniques which result in the faster computation of Link Propagation.

The original algorithm of Link Propagation relies on the exact computation of the eigen-decomposition of the underlying Laplacian matrix of the Graph whose links are under completion. In order to avoid the compuational and space overhead induced by the eigen-decomposition, the authors propose to approximate it by low rank decompositions of the Laplacian matrix. As an example, the authors mention the Cholesky decomposition of the positive semi-definite Laplacian but comment that any low rank approximation will perform as well.


Datasets

In order to evaluate the performance of their algorithm, the authors use different portions of the Web Spam Challenge dataset. In particular, they extract Graphs of varying sizes from this dataset

Experimental Evaluation

The authors evaluate their algorithm with respect to

  • The time complexity
  • The accuracy of the actual link prediction

For the accuracy of the prediction, the authors measure the Area Under Curve (AUC) of the ROC curve. In the figure below there is a summary of their experimental results, where "Exact" is the original algorithm and "M = k" is the approximate algorithm using low rank equal to k for the approximation.

Link propagation.png

Discussion

One surprising aspect of this paper (which the authors also point out in the introduction) is the fact that sometimes the approximate algorithm outperforms the exact one, which is impressive. This also gives food for thought, whether the "approximate" definition of Link Propagation is indeed the right one, since by doing the low rank approximation, latent structure of the graph is captured in the process, and the performance should increase.


Study Plan

The paper is pretty much stand-alone, although one could skim through the previous work of the authors ( Link propagation: A fast semi-supervised learning algorithm for link prediction) in order to get a grasp of the original algorithm upon which they improve. However, the present paper covers the introductory material adequately, despite the size limitations of the particular conference.