Difference between revisions of "Girvan et al PNAS 2002"

From Cohen Courses
Jump to navigationJump to search
Line 20: Line 20:
 
#Repeat from step 2 until no edges remain.
 
#Repeat from step 2 until no edges remain.
  
Step one uses the fast algorithm from [http://www-personal.umich.edu/~mejn/papers/016132.pdf Newman's paper].
+
Step one uses the fast algorithm to calculate the betweenness from [http://www-personal.umich.edu/~mejn/papers/016132.pdf Newman's paper]. The result is not well if we omit the third step because if two communities are connected by more than two edges, we can only guarantee one with high betweenness. By recalculating every time, we can ensure that at least one of the remaining edges between two communities will have a high value.
 +
 
 +
The time complexity of this algorithm is <math> O(m-2n)</math>
  
 
== Background ==
 
== Background ==

Revision as of 02:27, 27 September 2012

Citation

Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).

Online Version

Link to PNAS

Summary

This paper shows a method for detecting community structure in a network which addresses the topic of Community Detection. The method is designed with the setting that network nodes are joined together in tightly-nit groups between which there are only looser connections.

The key idea is to construct the communities by removing edges from the original graph, contrasting to the traditional way which constructs the communities by add edges to the vertex set. The edges to be removed is chosen based on the edge betweenness - the number of shortest paths between pairs of vertices that run along the edge.

Brief description of the method

  1. Calculate the betweenness for all edges in the network.
  2. Remove the edge with the highest betweenness.
  3. Recalculate betweennesses for all edges affected by the removal.
  4. Repeat from step 2 until no edges remain.

Step one uses the fast algorithm to calculate the betweenness from Newman's paper. The result is not well if we omit the third step because if two communities are connected by more than two edges, we can only guarantee one with high betweenness. By recalculating every time, we can ensure that at least one of the remaining edges between two communities will have a high value.

The time complexity of this algorithm is

Background

Some common properties of many networks

  • Small-word property - average distance between vertices in a network is short
  • power-law degree distributions - many vertices in a network with low degree and a small number with high degree
  • network transitivity - two vertices having a same neighbor would have higher probability of being neighbors of each other.


A traditional method of constructing the communities

  1. Calculate the weight for each pair of vertices.
  2. Beginning from an vertex only set, add edges between pairs one by one in the desc order of the weights.
  3. The resulting graph shows a nested set of increasingly large components, which are taken to be the communities.