Difference between revisions of "Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs"

From Cohen Courses
Jump to navigationJump to search
Line 14: Line 14:
 
== Summary ==
 
== Summary ==
  
This paper addresses the problem of [AddressesProblem::Link  Prediction]], i.e. the completion of missing links between nodes in a Graph; essentially, this paper is a followup/extension of a previous work done by a subset the authors ( [[Link propagation: A fast semi-supervised learning algorithm for link prediction]])  in which this
+
This paper addresses the problem of [AddressesProblem::Link  Prediction]], i.e. the completion of missing links between nodes in a Graph; essentially, this paper is a followup/extension of a previous work done by a subset the authors ( [[Link propagation: A fast semi-supervised learning algorithm for link prediction]])  in which, a technique called
"Link Propagation" is introduced as one solution to that problem.
+
"Link Propagation" is introduced as one solution to that problem. The present paper points out that the original algorithm for Link Propagation is expensive both in terms of computational and space complexity, thereby hard to scale to large datasets. To that end, the authors introduce approximate techniques which result in the faster computation of Link Propagation.
 +
 
 +
The original algorithm of Link Propagation relies on the exact computation of the eigen-decomposition of the underlying Laplacian matrix of the Graph whose links are under completion. In order to avoid the compuational and space overhead induced by the eigen-decomposition, the authors propose to approximate it by low rank decompositions of the Laplacian matrix. As an example, the authors mention the Cholesky decomposition of the positive semi-definite Laplacian but comment that any low rank approximation will perform as well.
 +
 
 +
 
 +
== Datasets ==
 +
In order to evaluate the performance of their algorithm, the authors use different portions of the [http://webspam.lip6.fr/wiki/pmwiki.php?n=Main.PhaseIITrainningCorpora Web Spam Challenge] dataset. In particular, they extract Graphs of varying sizes from this dataset
 +
 
 +
== Experimental Evaluation ==
 +
The authors evaluate their algorithm with respect to
 +
* The time complexity
 +
* The accuracy of the actual link prediction
 +
 
 +
For the accuracy of the prediction, the authors measure the Area Under Curve (AUC) of the ROC curve. In the figure below there is a summary of their experimental results, where "Exact" is the original algorithm and "M = k" is the approximate algorithm using low rank equal to k for the approximation.
 +
 
 +
[[File:Link propagation.png]]

Revision as of 00:16, 5 November 2012

Citation

 title={Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs},
 author={Raymond, R. and Kashima, H.},
 journal={Machine Learning and Knowledge Discovery in Databases},
 pages={131--147},
 year={2010},
 publisher={Springer}

Online version

Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs

Summary

This paper addresses the problem of [AddressesProblem::Link Prediction]], i.e. the completion of missing links between nodes in a Graph; essentially, this paper is a followup/extension of a previous work done by a subset the authors ( Link propagation: A fast semi-supervised learning algorithm for link prediction) in which, a technique called "Link Propagation" is introduced as one solution to that problem. The present paper points out that the original algorithm for Link Propagation is expensive both in terms of computational and space complexity, thereby hard to scale to large datasets. To that end, the authors introduce approximate techniques which result in the faster computation of Link Propagation.

The original algorithm of Link Propagation relies on the exact computation of the eigen-decomposition of the underlying Laplacian matrix of the Graph whose links are under completion. In order to avoid the compuational and space overhead induced by the eigen-decomposition, the authors propose to approximate it by low rank decompositions of the Laplacian matrix. As an example, the authors mention the Cholesky decomposition of the positive semi-definite Laplacian but comment that any low rank approximation will perform as well.


Datasets

In order to evaluate the performance of their algorithm, the authors use different portions of the Web Spam Challenge dataset. In particular, they extract Graphs of varying sizes from this dataset

Experimental Evaluation

The authors evaluate their algorithm with respect to

  • The time complexity
  • The accuracy of the actual link prediction

For the accuracy of the prediction, the authors measure the Area Under Curve (AUC) of the ROC curve. In the figure below there is a summary of their experimental results, where "Exact" is the original algorithm and "M = k" is the approximate algorithm using low rank equal to k for the approximation.

Link propagation.png