Link propagation: A fast semi-supervised learning algorithm for link prediction

Citation

Kashima H., Kato T., Yamanishi Y., Sugiyama M. and Tsuda K. (May 2009) Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction. In: SDM 2009, 2009 SIAM International Conference on Data Mining, Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, 1099-1110.

Summary

This paper takes "label propagation", a semi-supervised algorithm originally designed for node labeling, and adapts it to the link prediction task. It also addresses the problem of combining topological data and node information in a single prediction algorithm.

Key Ideas

Mapping between node labeling and link prediction

Label propagation is a semi-supervised algorithm that combines topological features and node information to predict node labels, i.e., to assign a label to each node of a graph. The paper maps link prediction onto node labeling by treating each link as a triplet $(x_i, y_j, z_t)$. Here $x_i$ and $y_j$ are the two nodes and $z_t$ is the link type. Because the method also supports predicting multiple link types, $z_t$ can simply be fixed to 1 when there is only one type. Each triplet is then treated as a single node, and the problem becomes assigning one of two labels to it: 1 if the link exists and 0 otherwise.
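The following is a minimal sketch of this mapping. It enumerates every triplet as one "node" of a link graph and builds the 0/1 label vector that a node-labeling algorithm would then complete. Several details are assumptions made for illustration, not taken from the paper:

- the function name `triplets_as_nodes`;
- the use of NaN for unknown triplets;
- 0-based indexing of link types, so the single-type case uses $z_t = 0$ rather than 1.

```python
import itertools

import numpy as np


def triplets_as_nodes(n_x, n_y, n_types, known_links, known_nonlinks):
    """Treat every (x_i, y_j, z_t) triplet as a single node.

    Hypothetical helper: returns the list of triplets and a label vector
    holding 1.0 for observed links, 0.0 for observed non-links, and NaN
    for unknown triplets whose labels are to be predicted.
    """
    # One "node" per possible link: all node pairs times all link types.
    triplets = list(itertools.product(range(n_x), range(n_y), range(n_types)))
    index = {trip: k for k, trip in enumerate(triplets)}

    labels = np.full(len(triplets), np.nan)
    for trip in known_links:
        labels[index[trip]] = 1.0  # the link exists
    for trip in known_nonlinks:
        labels[index[trip]] = 0.0  # the link is known to be absent
    return triplets, labels


# Three nodes on each side and a single link type (index 0):
# one known link (0, 1) and one known non-link (0, 2).
triplets, labels = triplets_as_nodes(3, 3, 1, [(0, 1, 0)], [(0, 2, 0)])
```

The remaining seven triplets stay NaN; predicting them is exactly the node-labeling problem on the link graph.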

Use the conjugate gradient method to solve the minimization problem
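This is a generic, hedged sketch of how the solver works, not the paper's implementation. Conjugate gradient solves a symmetric positive-definite system $A f = b$ iteratively and accesses $A$ only through matrix-vector products $A v$. That is presumably what makes it attractive here: a system over all triplets can be very large, and CG never has to invert or factorize $A$. The function name `conjugate_gradient` and the `matvec` callback interface are assumptions made for illustration.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A.

    A is never passed in directly; `matvec` is a (hypothetical) callback
    that returns A @ v for a given vector v.
    """
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - matvec(x)   # residual b - A x
    p = r.copy()        # first search direction
    rs_old = r @ r
    if max_iter is None:
        # In exact arithmetic CG needs at most n steps; allow extra for rounding.
        max_iter = 10 * len(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        # Next direction is A-conjugate to the previous ones.
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


A = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 2.0]])
x = conjugate_gradient(lambda v: A @ v, [1.0, 0.0, 0.0, 0.0])  # → [0.8, 0.6, 0.4, 0.2]
```

Passing a callback rather than a matrix is a deliberate design choice. Any routine that computes $A v$ cheaply, such as a sparse product or a structured product that avoids forming $A$, can be plugged in unchanged.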

The Idea of the Original Node Labeling Algorithm

The goal is to derive a score $f(x)$ for every node $x$, indicating how likely the node is to have label 1. $f$ must agree with the training data, i.e., with the nodes whose labels are already known.
The input is a similarity matrix $w_{i,j}$ over the nodes, and the basic assumption is that similar nodes should have similar labels.
A cost function $E(f)$ encodes this assumption, and the goal is to find the $f$ that minimizes it.
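To make the last step concrete, here is one common form of the label propagation cost. It is assumed for illustration; the summary does not state the paper's exact objective. The cost penalizes disagreement with the known labels plus differences between the scores of similar nodes:

$$E(f) = \sum_{i \in \mathcal{L}} (f_i - y_i)^2 + \frac{\sigma}{2} \sum_{i,j} w_{i,j} (f_i - f_j)^2 = (f - y)^\top P (f - y) + \sigma f^\top L f$$

The symbols are:

- $\mathcal{L}$ is the set of labeled nodes.
- $P$ is the diagonal 0/1 matrix selecting those nodes.
- $L = D - W$ is the graph Laplacian, with $D_{ii} = \sum_j w_{i,j}$.
- $\sigma > 0$ is a smoothness weight.

Setting the gradient $2P(f - y) + 2\sigma L f$ to zero gives the linear system $(P + \sigma L) f = P y$. The sketch below solves this system directly with NumPy; for large graphs an iterative solver such as conjugate gradient would take the place of `np.linalg.solve`. The function name `label_propagation` and the parameter `sigma` are hypothetical.

```python
import numpy as np


def label_propagation(W, y, labeled, sigma=1.0):
    """Minimize E(f) = sum_{i labeled} (f_i - y_i)^2 + sigma * f^T L f.

    W: symmetric (n, n) similarity matrix w_{i,j}.
    y: length-n label vector; entries of unlabeled nodes are ignored (may be NaN).
    labeled: boolean mask of nodes whose labels are known.
    sigma: smoothness weight (an assumed hyperparameter).
    """
    W = np.asarray(W, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    L = np.diag(W.sum(axis=1)) - W           # graph Laplacian L = D - W
    P = np.diag(labeled.astype(float))       # selects the labeled nodes
    b = np.where(labeled, y, 0.0)            # equals P y, with unlabeled entries zeroed
    # (P + sigma L) is nonsingular when every connected component has a labeled node.
    return np.linalg.solve(P + sigma * L, b)


# Path graph 0-1-2-3: node 0 is labeled 1, node 3 is labeled 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([1.0, np.nan, np.nan, 0.0])
labeled = np.array([True, False, False, True])
f = label_propagation(W, y, labeled)  # → [0.8, 0.6, 0.4, 0.2]
```

The scores decay smoothly from the node labeled 1 to the node labeled 0, which reflects the assumption that similar (here, adjacent) nodes receive similar labels.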