Difference between revisions of "Link propagation: A fast semi-supervised learning algorithm for link prediction"
(3 intermediate revisions by the same user not shown) | |||
Line 11: | Line 11: | ||
== Summary == | == Summary == | ||
− | This | + | This [[Category::Paper]] adopts a semi-supervised algorithm known as "label propagation" and successfully transfers the original one which focus on node labeling to solving [[AddressesProblem::Link Prediction]] task. It also solves the problem of combining topological data and node information in the prediction algorithm. |
== Key Ideas == | == Key Ideas == | ||
*Mapping between node labeling and link prediction. | *Mapping between node labeling and link prediction. | ||
[http://www.aladdin.cs.cmu.edu/papers/pdfs/y2003/zgl.pdf Label Propagation] is a semi-supervised algorithm which successfully combines topological features and node information in predicting nodes' label(assigning a label for each node on a graph). And the paper we are talking about find a mapping between node labeling and link prediction by treating each link as a triplet <math>(x_i, y_j, z_t)</math>, where <math>x_i</math> and <math>y_j</math> represent two nodes and <math>z_t</math> represents a type of link. As the method of this paper also supports the prediction of multiple type of link, if there is only one type of link, we can just set <math>z_t</math> to 1 constantly. And we treat this triplet as one node and consider a problem of assigning two possible label, 0 or 1, for each this triplet and 1 stands for this link exists and 0 otherwise. | [http://www.aladdin.cs.cmu.edu/papers/pdfs/y2003/zgl.pdf Label Propagation] is a semi-supervised algorithm which successfully combines topological features and node information in predicting nodes' label(assigning a label for each node on a graph). And the paper we are talking about find a mapping between node labeling and link prediction by treating each link as a triplet <math>(x_i, y_j, z_t)</math>, where <math>x_i</math> and <math>y_j</math> represent two nodes and <math>z_t</math> represents a type of link. As the method of this paper also supports the prediction of multiple type of link, if there is only one type of link, we can just set <math>z_t</math> to 1 constantly. And we treat this triplet as one node and consider a problem of assigning two possible label, 0 or 1, for each this triplet and 1 stands for this link exists and 0 otherwise. | ||
− | *Use Conjugate | + | *Use [[UsesMethod::Conjugate Gradient Method]] to solve the minimization problem |
== The Idea of The Original Node Labeling Algorithm== | == The Idea of The Original Node Labeling Algorithm== | ||
Line 42: | Line 42: | ||
== Data sets == | == Data sets == | ||
The paper use three data sets. | The paper use three data sets. | ||
− | *The metabolic pathways of the yeast S. Cerevisiae in the KEGG/PATHWAY database | + | *The metabolic pathways of the yeast S. Cerevisiae in the [[UsesDataset::KEGG/PATHWAY database]] |
[[paper::Y. Yamanishi, J.-P. Vert, and M. Kanehisa. Supervised enzyme network inference from the integration of genomic data and chemical information. Bioinformatics, 21:i468–i477, 2005]] | [[paper::Y. Yamanishi, J.-P. Vert, and M. Kanehisa. Supervised enzyme network inference from the integration of genomic data and chemical information. Bioinformatics, 21:i468–i477, 2005]] | ||
*A protein-protein interaction network data set | *A protein-protein interaction network data set | ||
[[paper::C. von Mering, R. Krause, B. Snel, M. Cornell, S. Olivier,S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399–403, 2002]] | [[paper::C. von Mering, R. Krause, B. Snel, M. Cornell, S. Olivier,S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399–403, 2002]] | ||
− | * | + | *[[UsesDataset::NIPS 1-17 data]] is a social network representing the co-authorships in the NIPS conferences, available [http://ai.stanford.edu/~gal/data.html here] |
== Related papers == | == Related papers == |
Latest revision as of 02:59, 4 October 2012
Link propagation: A fast semi-supervised learning algorithm for link prediction
Contents
Citation
Kashima H., Kato T. , Yamanishi Y. , Sugiyama M and Tsuda K (May-2009) Link Propagation: A Fast Semi-supervised Learning Algorithm for Link Prediction In: SDM 2009, 2009 SIAM International Conference on Data Mining, Philadelphia, PA, USA, Society for Industrial and Applied Mathematics, 1099-1110.
Online version
Summary
This Paper adopts a semi-supervised algorithm known as "label propagation" and successfully transfers the original one which focus on node labeling to solving Link Prediction task. It also solves the problem of combining topological data and node information in the prediction algorithm.
Key Ideas
- Mapping between node labeling and link prediction.
Label Propagation is a semi-supervised algorithm which successfully combines topological features and node information in predicting nodes' label(assigning a label for each node on a graph). And the paper we are talking about find a mapping between node labeling and link prediction by treating each link as a triplet , where and represent two nodes and represents a type of link. As the method of this paper also supports the prediction of multiple type of link, if there is only one type of link, we can just set to 1 constantly. And we treat this triplet as one node and consider a problem of assigning two possible label, 0 or 1, for each this triplet and 1 stands for this link exists and 0 otherwise.
- Use Conjugate Gradient Method to solve the minimization problem
The Idea of The Original Node Labeling Algorithm
- The goal is to derive a for every node which means the probability for this node to have a label "1". And function has to agree on the training data which we already know the labels.
- With an input of similarity matrix of nodes and the base assumption is that similar nodes should have similar labels
- Have a cost function on the assumption and the goal is to find a which minimize the loss function
- Derive that the minimum loss labels are just a weighted average of neighbor points
Some Modifications
- The loss function is changed to
- Differentiate the above function over and apply conjugate gradient method to solve the equation after differentiation.
Data sets
The paper use three data sets.
- The metabolic pathways of the yeast S. Cerevisiae in the KEGG/PATHWAY database
- A protein-protein interaction network data set
- NIPS 1-17 data is a social network representing the co-authorships in the NIPS conferences, available here
Related papers
There is a later paper which is based on this one and tries to improve the efficiency of running time: Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs.
The paper talks about the node label algorithm is Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
Study Plan
- To understand the original node labeling algorithm
- Wiki Page Conjugate gradient method