Compare Link Propagation Papers

Papers

Link propagation: A fast semi-supervised learning algorithm for link prediction, Kashima et al., SDM 2009

Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs, Raymond et al., ECML-PKDD 2010.

Method

The first paper essentially addresses a tensor completion problem: the "Link Propagation" algorithm, a means of predicting links between the nodes of a graph using the eigen-decomposition of its Laplacian, is extended to the case of a dynamic graph, which can be represented as a tensor.
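
To make the flavor of this concrete, below is a minimal NumPy sketch of a Laplacian-regularized link-prediction solve in the static (matrix) case. It is my own toy construction rather than the paper's exact formulation: the names sigma, W and Y are illustrative, and the point is only that the Kronecker-sum system can be solved element-wise in the eigenbasis of the node Laplacian.

 import numpy as np
 
 # Toy setup (illustrative names): a small undirected graph given by a
 # symmetric node-similarity matrix W, with a few observed link labels in Y.
 rng = np.random.default_rng(0)
 n = 6
 W = rng.random((n, n))
 W = (W + W.T) / 2
 np.fill_diagonal(W, 0.0)
 
 L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
 lam, U = np.linalg.eigh(L)              # L = U diag(lam) U^T
 
 Y = np.zeros((n, n))                    # partially observed link labels (0 = unknown)
 Y[0, 1] = Y[1, 0] = 1.0
 Y[2, 3] = Y[3, 2] = 1.0
 sigma = 0.5                             # smoothness vs. fit trade-off
 
 # Solve sigma * (L F + F L) + F = Y: in the eigenbasis of L the
 # Kronecker-sum system reduces to an element-wise division.
 Z = U.T @ Y @ U
 F = U @ (Z / (sigma * (lam[:, None] + lam[None, :]) + 1.0)) @ U.T
 print(np.round(F, 3))                   # smoothed scores for every node pair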

The second paper is an extension of the first, in the sense that it builds on the previous algorithm by introducing an approximate version that runs faster while performing on par with the original. The approximate algorithm relies on matrix decompositions, such as the Cholesky factorization, and its main idea is to approximate the graph Laplacian by a lower-rank matrix.
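
As a rough illustration of why a low-rank view of the Laplacian helps, the sketch below (again my own toy code, with a truncated eigendecomposition standing in for whatever incomplete factorization the paper actually uses; n, k, sigma are illustrative names) solves the regularized system (sigma * L + I) x = y once exactly and once through a rank-k Woodbury-style shortcut.

 import numpy as np
 
 # Toy comparison (illustrative names): exact vs. low-rank solve of
 # (sigma * L + I) x = y for a graph Laplacian L.
 rng = np.random.default_rng(1)
 n, k, sigma = 200, 10, 0.5
 
 W = rng.random((n, n))
 W = (W + W.T) / 2
 np.fill_diagonal(W, 0.0)
 L = np.diag(W.sum(axis=1)) - W
 
 lam, U = np.linalg.eigh(L)
 lam_k, U_k = lam[-k:], U[:, -k:]        # keep k largest eigenpairs: L ~ U_k diag(lam_k) U_k^T
 
 y = rng.random(n)
 
 x_exact = np.linalg.solve(sigma * L + np.eye(n), y)
 
 # Woodbury-style solve using only the rank-k approximation of L:
 # O(n k) work per right-hand side once the factor is available.
 shrink = sigma * lam_k / (1.0 + sigma * lam_k)
 x_approx = y - U_k @ (shrink * (U_k.T @ y))
 
 print(np.linalg.norm(x_exact - x_approx) / np.linalg.norm(x_exact))

In a real implementation one would of course obtain the rank-k factor directly (e.g. with an incomplete Cholesky or a Lanczos routine) rather than forming the full eigendecomposition first.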

Dataset Used

The first paper uses a relatively large variety of datasets:

  • A dataset derived from the following paper:
    • C. von Mering, R. Krause, B. Snel, M. Cornell, S. Olivier, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399–403, 2002
  • A social network dataset found here: http://ai.stanford.edu/~gal/data.html

The second paper uses multiple versions (of varying size) of the Web Spam Challenge dataset (http://webspam.lip6.fr/wiki/pmwiki.php?n=Main.PhaseIITrainningCorpora).

Problem

The problem in the first paper is simply that of predicting missing links between nodes in the setting of a time-evolving graph, which can be represented as a tensor.
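
For concreteness, a time-evolving graph on n nodes observed at T timestamps can be stored as an n x n x T tensor together with a mask of which entries were observed; link prediction then amounts to filling in the masked entries. The snippet below is only a toy illustration of that representation (all names are mine, not the paper's).

 import numpy as np
 
 # Toy representation of a dynamic graph as a 3-way tensor plus an
 # observation mask (illustrative only).
 rng = np.random.default_rng(2)
 n, T = 5, 3
 
 adjacency = rng.integers(0, 2, size=(n, n, T)).astype(float)
 adjacency = np.maximum(adjacency, adjacency.transpose(1, 0, 2))  # keep each time slice symmetric
 
 observed = rng.random((n, n, T)) < 0.7   # True where the link status is known
 missing = ~observed                      # entries a link-prediction method must fill in
 
 print(missing.sum(), "of", adjacency.size, "entries are unobserved")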

However, this is not the main focus of the second paper, even though it builds on the same algorithm. The second paper is primarily concerned with scaling up "Link Propagation" by introducing approximations into the computations, and is therefore more algorithmically oriented.

Big Idea

The big idea of the first paper is the use of auxiliary information to aid tensor completion (i.e., prediction of missing links in a time-evolving graph).

The big idea of the second paper is scaling up the Link Propagation algorithm in both the matrix (static graph) and tensor (dynamic graph) cases by introducing approximate steps into the computation.

Other

The two papers seem complementary to each other and of very high technical quality. The experimental evaluation of the first paper appears more thorough than that of the second, although this could be because the SDM page limit is higher than the ECML-PKDD one (and the page margins of ECML-PKDD do not help either).

Questions

  • How much time did you spend reading the (new, non-wikified) paper you summarized?
    • 1.5 hours
  • How much time did you spend reading the old wikified paper?
    • 45 minutes
  • How much time did you spend reading the summary of the old paper?
    • 15 minutes
  • How much time did you spend reading background material?
    • 30 minutes
  • Was there a study plan for the old paper?
    • Yes
  • If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them?
    • No.