Leskovec et al., WWW 2010

From Cohen Courses
Revision as of 20:43, 26 September 2012 by Bliu1 (talk | contribs)

Citation

Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 631-640. DOI=10.1145/1772690.1772755 http://doi.acm.org/10.1145/1772690.1772755

Abstract from the paper

Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.

In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.

Summary

Task Description

Detecting clusters (communities) in a network represented as a graph.

The authors compare and study the performance of:

  • Objective functions
  • Heuristics / Approximation Algorithms that optimize the objectives.

The comparison is motivated by the following observations:

  • Heuristics / Algorithms often find clusters that are systematically biased.
  • Certain methods tend to perform particularly well or particularly poorly on certain kinds of graphs.
  • In special cases, one might need to identify specific types of clusters.

Background

  • Conductance (A cluster quality score)

φ(S) = (# edges pointing outside S) / (# edges inside S), where S is a set of nodes. Small conductance means a good cluster: few edges leave S relative to the edges it contains.
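
The informal conductance score above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: it assumes an undirected graph given as a list of edge pairs, and the function names (edge_counts, conductance_ratio, conductance_volume) are hypothetical. The second function shows a commonly used normalized variant that divides the boundary edges by the volume of S (the sum of node degrees in S); that choice of normalization is an assumption here.

```python
def edge_counts(edges, S):
    """Return (internal, boundary) edge counts for node set S.

    internal: edges with both endpoints in S
    boundary: edges with exactly one endpoint in S
    """
    S = set(S)
    internal = 0
    boundary = 0
    for u, v in edges:
        in_u, in_v = u in S, v in S
        if in_u and in_v:
            internal += 1
        elif in_u != in_v:
            boundary += 1
    return internal, boundary


def conductance_ratio(edges, S):
    """Informal score from the text: boundary edges / internal edges."""
    internal, boundary = edge_counts(edges, S)
    if internal == 0:
        # No internal edges: treat the set as the worst possible cluster.
        return float("inf")
    return boundary / internal


def conductance_volume(edges, S):
    """Assumed normalized variant: boundary edges / volume of S.

    The volume of S is the sum of degrees of its nodes, which equals
    2 * internal + boundary for an undirected graph.
    """
    internal, boundary = edge_counts(edges, S)
    volume = 2 * internal + boundary
    return boundary / volume if volume else 0.0


# Toy graph: two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
left_triangle = {1, 2, 3}
print(conductance_ratio(edges, left_triangle))
print(conductance_volume(edges, left_triangle))
```

In the toy graph, the set {1, 2, 3} has 3 internal edges and 1 boundary edge, so the informal ratio is 1/3 and the volume-normalized score is 1/7. Both are small, matching the intuition that a triangle attached by a single bridge is a good cluster.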