Girvan et al PNAS 2002

From Cohen Courses
Revision as of 03:34, 27 September 2012 by Dzheng (talk | contribs) (→‎Summary)
Jump to navigationJump to search

Citation

Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).

Online Version

Link to PNAS

Summary

This paper shows a method for detecting community structure in a network which addresses the topic of Community Detection. The method is designed with the setting that network nodes are joined together in tightly-knit groups between which there are only looser connections.

The key idea is to construct the communities by removing edges from the original graph, contrasting to the traditional way which constructs the communities by add edges to the vertex set. The edges to be removed is chosen based on the edge betweenness - the number of shortest paths between pairs of vertices that run along the edge.

If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges. Thus, the edges connecting communities will have high edge betweenness. By removing these edges, we separate groups from one another and so reveal the underlying community structure of the graph.

Brief description of the method

  1. Calculate the betweenness for all edges in the network.
  2. Remove the edge with the highest betweenness.
  3. Recalculate betweennesses for all edges affected by the removal.
  4. Repeat from step 2 until no edges remain.

Step one uses the fast algorithm to calculate the betweenness from Newman's paper. The result is not well if we omit the third step because if two communities are connected by more than two edges, we can only guarantee one with high betweenness. By recalculating every time, we can ensure that at least one of the remaining edges between two communities will have a high value.

The time complexity of this algorithm is where is the number of edges and is the number of nodes. This is because the each betweenness calculation time complexity is .

Background

Some common properties of many networks

  • Small-word property - average distance between vertices in a network is short
  • power-law degree distributions - many vertices in a network with low degree and a small number with high degree
  • network transitivity - two vertices having a same neighbor would have higher probability of being neighbors of each other.


A traditional method of constructing the communities

  1. Calculate the weight for each pair of vertices.
  2. Beginning from an vertex only set, add edges between pairs one by one in the desc order of the weights.
  3. The resulting graph shows a nested set of increasingly large components, which are taken to be the communities.