Leskovec et al., WWW 2010

Citation

Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 631-640. DOI=10.1145/1772690.1772755 http://doi.acm.org/10.1145/1772690.1772755

Abstract from the paper

Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest.

In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.

Summary

Task Description

Detecting clusters, or communities, in large network graphs.

The authors compare and study the performance of:

Objective functions
Heuristics / Approximation Algorithms that optimize the objectives.

The comparison is motivated by the following observations:

Heuristics / Algorithms often find clusters that are systematically biased.
Certain methods tend to perform particularly well or particularly poorly on certain kinds of graphs.
In special cases, one might need to identify specific types of clusters.

Background

Conductance (quality score for a single cluster)

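$\phi(S)$ = (# edges leaving S) / (sum of the degrees of the nodes in S). This equals c/(2m+c) in the notation of the objective-function comparison below. Smaller conductance means a better cluster.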
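As a minimal sketch of the conductance score, the snippet below computes c/(2m+c) for a node set, the form listed under the multi-criterion scores. The adjacency-dictionary format, the function name `conductance`, and the toy graph are assumptions made for illustration; they do not come from the paper.

```python
def conductance(adj, S):
    """Conductance c / (2m + c) of node set S in an undirected graph.

    adj maps each node to the set of its neighbours (hypothetical format).
    Assumes S is non-empty and contains no isolated nodes.
    """
    S = set(S)
    cut = 0      # c: edges with exactly one endpoint in S
    volume = 0   # 2m + c: sum of the degrees of the nodes in S
    for u in S:
        volume += len(adj[u])
        cut += sum(1 for v in adj[u] if v not in S)
    return cut / volume


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

print(conductance(adj, {0, 1, 2}))  # one cut edge over volume 7
print(conductance(adj, {0, 2}))     # three cut edges over volume 5
```

On this toy graph the triangle {0,1,2} scores much lower than the pair {0,2}, which matches the intuition that a good cluster has few edges leaving it relative to its total degree.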

Network Community Profile (NCP, a size-resolved score of clusters)

The score of the best cluster of size k: $\Phi(k) = \min_{S \subset V, |S| = k} \phi(S)$
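To make the definition of $\Phi(k)$ concrete, here is a brute-force sketch that enumerates every node set of size k and keeps the minimum conductance. It is only feasible on toy graphs, since the number of subsets grows exponentially; the paper relies on approximation algorithms instead. The helper names, the adjacency format, and the use of c/(2m+c) as the conductance are assumptions for illustration.

```python
from itertools import combinations


def conductance(adj, S):
    """Conductance c / (2m + c) of node set S (adj: node -> set of neighbours)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    volume = sum(len(adj[u]) for u in S)
    return cut / volume


def ncp_brute_force(adj, max_k):
    """Phi(k) for k = 1..max_k by exhaustive search over all size-k subsets."""
    nodes = sorted(adj)
    profile = {}
    for k in range(1, max_k + 1):
        profile[k] = min(conductance(adj, S) for S in combinations(nodes, k))
    return profile


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

for k, score in ncp_brute_force(adj, 3).items():
    print(k, round(score, 3))
```

The profile drops sharply at k = 3, the size at which a whole triangle can be cut off through the single bridge edge.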

Comparison of algorithms

The authors first compare two graph partitioning algorithms: "Local Spectral Partitioning" and the flow-based "Metis+MQI".

Here are the findings:

Metis+MQI generates sets with better conductance.
Local Spectral gives tighter and more well-rounded ("compact") sets.
At small size scales, Metis+MQI achieves a better ratio of external to internal conductance, while Local Spectral does better for larger clusters.

Other clustering methods considered, with their observed properties:

Leighton-Rao algorithm (based on multi-commodity flow; works on mesh-like graphs)
Graclus (prefers larger clusters; compact; conductance)
Newman's modularity optimizing program (Dendrogram) (compact)

For detailed performance, see the plots in Figure 4 of the paper.

Comparison of objective functions

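(Notation: for a set S, n is the number of nodes in S, m is the number of edges inside S, and c is the number of edges pointing outside S. N and M appear to denote the number of nodes and edges in the whole graph.)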

Dataset: 40 networks, including DBLP, Enron, arXiv astrophysics papers, and Epinions.

Multi-criterion

(1) Conductance: c/(2m+c)
(2) Expansion: c/n
(3) Density: 1 - 2m/(n(n-1))
(4) Cut Ratio: c/(n(N-n))
(5) Normalized Cut: c/(2m+c) + c/(2(M-m)+c)
(6) Max ODF: maximum fraction of edges of a node in S pointing outside S
(7) Average ODF: average fraction of edges of a node in S pointing outside S
(8) Flake ODF: fraction of nodes in S with fewer than half of their edges inside S
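The eight scores can be sketched in a few lines of code. Two definitions are assumptions about the intended formulas rather than something this page states unambiguously: Normalized Cut is taken as c/(2m+c) + c/(2(M-m)+c), and Flake ODF as the fraction of nodes in S with fewer than half of their edges inside S. N and M are taken to be the node and edge counts of the whole graph. The function name and adjacency format are hypothetical.

```python
def cluster_scores(adj, S):
    """Multi-criterion scores for node set S (adj: node -> set of neighbours).

    Assumes 1 < |S| < N and no isolated nodes, so no denominator is zero.
    """
    S = set(S)
    N = len(adj)                                   # nodes in the graph
    M = sum(len(nbrs) for nbrs in adj.values()) // 2   # edges in the graph
    n = len(S)                                     # nodes in S
    out_deg = {u: sum(1 for v in adj[u] if v not in S) for u in S}
    c = sum(out_deg.values())                      # edges leaving S
    m = (sum(len(adj[u]) for u in S) - c) // 2     # edges inside S
    odf = [out_deg[u] / len(adj[u]) for u in S]    # out-degree fractions
    # Nodes whose inside degree is less than half of their total degree.
    weak = sum(1 for u in S if len(adj[u]) - out_deg[u] < len(adj[u]) / 2)
    return {
        "conductance": c / (2 * m + c),
        "expansion": c / n,
        "density": 1 - 2 * m / (n * (n - 1)),
        "cut_ratio": c / (n * (N - n)),
        "normalized_cut": c / (2 * m + c) + c / (2 * (M - m) + c),
        "max_odf": max(odf),
        "avg_odf": sum(odf) / n,
        "flake_odf": weak / n,
    }


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

for name, value in cluster_scores(adj, {0, 1, 2}).items():
    print(f"{name}: {value:.3f}")
```

With these definitions every score is "lower is better", so a triangle that is a complete subgraph gets density 0 and Flake ODF 0.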

Here are the findings:

(1), (2), (4), (5) and (7) behave similarly.
(6) prefers smaller clusters, and (8) prefers larger clusters.
(3) performs poorly, and (4) has high variance.

Single-criterion

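(9) Modularity: (m-E(m))/(4m), where E(m) is the expected number of edges inside S in a random graph with the same degree sequence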
(10) Modularity Ratio: m/E(m)
(11) Volume: 2m+c
(12) Edges cut: c
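The single-criterion scores can be sketched in the same way. The expected edge count E(m) depends on the null model; the sketch below assumes the usual configuration-model estimate E(m) ≈ (2m+c)² / (4M), where M is the number of edges in the whole graph. That estimate, the function name, and the adjacency format are assumptions for illustration. Only the numerator m − E(m) of score (9) is reported, since the normalisation is a constant factor.

```python
def single_criterion_scores(adj, S):
    """Single-criterion scores for node set S (adj: node -> set of neighbours)."""
    S = set(S)
    M = sum(len(nbrs) for nbrs in adj.values()) // 2   # edges in the graph
    volume = sum(len(adj[u]) for u in S)               # 2m + c
    c = sum(1 for u in S for v in adj[u] if v not in S)  # edges cut
    m = (volume - c) // 2                              # edges inside S
    # Assumption: configuration-model estimate of the expected inside edges.
    expected_m = volume ** 2 / (4 * M)
    return {
        "modularity_numerator": m - expected_m,
        "modularity_ratio": m / expected_m,
        "volume": volume,
        "edges_cut": c,
    }


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

print(single_criterion_scores(adj, {0, 1, 2}))
```

For the triangle {0,1,2} the volume is 7 and the graph has 7 edges, so the assumed estimate gives E(m) = 49/28 = 1.75 against 3 observed inside edges.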

Here are the findings:

All four measures are monotonic.
(9) prefers large clusters and ignores small ones.

Compute lower bound on conductance

Spectral embedding
SDP-based methods (for volume-balanced partitions)

The algorithms perform well: the clusters they find have conductance close to the theoretical lower bounds.
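One standard spectral lower bound comes from Cheeger's inequality: if $\lambda_2$ is the second-smallest eigenvalue of the normalized Laplacian, then every set S holding at most half of the total degree has conductance at least $\lambda_2 / 2$. The sketch below estimates $\lambda_2$ with a deflated power iteration in plain Python. It is an illustration under stated assumptions (connected graph, no isolated nodes, hypothetical function name and adjacency format), not necessarily the procedure used in the paper.

```python
import math


def spectral_lower_bound(adj, iters=500):
    """Return (lambda_2 / 2, lambda_2) for the normalized Laplacian of adj.

    adj maps each node to the set of its neighbours. Assumes the graph is
    connected and has no isolated nodes.
    """
    nodes = sorted(adj)
    deg = {u: len(adj[u]) for u in nodes}
    total = sum(deg.values())
    # Top eigenvector of D^-1/2 A D^-1/2 (eigenvalue 1), scaled to unit norm.
    top = {u: math.sqrt(deg[u] / total) for u in nodes}
    # Deterministic start vector; any vector not parallel to `top` works.
    x = {u: float(i + 1) for i, u in enumerate(nodes)}
    nu = 0.0
    for _ in range(iters):
        # Deflate: remove the component along the known top eigenvector.
        dot = sum(x[u] * top[u] for u in nodes)
        x = {u: x[u] - dot * top[u] for u in nodes}
        norm = math.sqrt(sum(v * v for v in x.values()))
        x = {u: v / norm for u, v in x.items()}
        # Multiply by (I + D^-1/2 A D^-1/2) / 2, whose spectrum lies in [0, 1].
        y = {u: 0.5 * (x[u] + sum(x[v] / math.sqrt(deg[u] * deg[v])
                                  for v in adj[u]))
             for u in nodes}
        nu = sum(x[u] * y[u] for u in nodes)   # Rayleigh quotient
        x = y
    lam2 = 2.0 - 2.0 * nu   # nu = (1 + mu_2) / 2 and lambda_2 = 1 - mu_2
    return lam2 / 2.0, lam2


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

bound, lam2 = spectral_lower_bound(adj)
print(f"lambda_2 = {lam2:.4f}, conductance lower bound = {bound:.4f}")
```

On this toy graph the best cut, separating the two triangles, has conductance 1/7, so comparing that value with the printed bound gives a small-scale version of the check described above.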

Related Papers

J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Statistical properties of community structure in large social and information networks. In WWW ’08: Proceedings of the 17th International Conference on World Wide Web, pages 695–704, 2008.
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS ’06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 475–486, 2006.
S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. In STOC ’04: Proceedings of the 36th annual ACM Symposium on Theory of Computing, pages 222–231, 2004.

Study Plan

This paper compares existing methods and models, so the study plan is simply to follow the references and learn the definitions and applications of the methods and models that appear in it.

Cluster evaluation

conductance [[1]]
Paper [[2]]

NCP

Paper [[3]]

Local Spectral Partitioning algorithm

Spectral Graph Theory [[4]]
Paper [[5]]

Metis

Paper [[6]]

MQI

Maximum Flow [[7]]
Paper [[8]]

Modularity

Modularity [[9]]
