Difference between revisions of "Girvan et al PNAS 2002"

From Cohen Courses
Jump to navigationJump to search
Line 25: Line 25:
  
 
The time complexity of this algorithm is <math> O(m^2n)</math> where <math>m</math> is the number of edges and <math>n</math> is the number of nodes. This is because the each betweenness calculation time complexity is <math> O(mn) </math>.
 
The time complexity of this algorithm is <math> O(m^2n)</math> where <math>m</math> is the number of edges and <math>n</math> is the number of nodes. This is because the each betweenness calculation time complexity is <math> O(mn) </math>.
 +
 +
== Experiment result ==
 +
The test result is quite promising. It was verified on three known-community graphs and explored two unknown-community graphs. Both return high accuracy results.
 +
 +
* [[UsesDataset::Girvan et al PNAS 2002 computer-generated graph|computer-generated graph]]. If out of 16 edges of each vertex, six or less edges are inter-community edges, then the accuracy is pretty high: 100% accuracy of classifying the vertex to the community.
 +
* [[UsesDataset::Karate network|Zachary's karate network]] Out of 34 nodes, only one node is classified incorrectly.
 +
* [[UsesDataset::Football_networks|football networks]] Almost all teams are correctly grouped with the other teams in their conference. Few cases in which the algorithm seems to fail actually correspond to nuances in the scheduling of games.
 +
  
 
== Background ==
 
== Background ==

Revision as of 02:57, 27 September 2012

Citation

Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).

Online Version

Link to PNAS

Summary

This paper shows a method for detecting community structure in a network which addresses the topic of Community Detection. The method is designed with the setting that network nodes are joined together in tightly-knit groups between which there are only looser connections.

The key idea is to construct the communities by removing edges from the original graph, contrasting to the traditional way which constructs the communities by add edges to the vertex set. The edges to be removed is chosen based on the edge betweenness - the number of shortest paths between pairs of vertices that run along the edge.

If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges. Thus, the edges connecting communities will have high edge betweenness. By removing these edges, we separate groups from one another and so reveal the underlying community structure of the graph.

Brief description of the method

  1. Calculate the betweenness for all edges in the network.
  2. Remove the edge with the highest betweenness.
  3. Recalculate betweennesses for all edges affected by the removal.
  4. Repeat from step 2 until no edges remain.

Step one uses the fast algorithm to calculate the betweenness from Newman's paper. The result is not well if we omit the third step because if two communities are connected by more than two edges, we can only guarantee one with high betweenness. By recalculating every time, we can ensure that at least one of the remaining edges between two communities will have a high value.

The time complexity of this algorithm is where is the number of edges and is the number of nodes. This is because the each betweenness calculation time complexity is .

Experiment result

The test result is quite promising. It was verified on three known-community graphs and explored two unknown-community graphs. Both return high accuracy results.

  • computer-generated graph. If out of 16 edges of each vertex, six or less edges are inter-community edges, then the accuracy is pretty high: 100% accuracy of classifying the vertex to the community.
  • Zachary's karate network Out of 34 nodes, only one node is classified incorrectly.
  • football networks Almost all teams are correctly grouped with the other teams in their conference. Few cases in which the algorithm seems to fail actually correspond to nuances in the scheduling of games.


Background

Some common properties of many networks

  • Small-word property - average distance between vertices in a network is short
  • power-law degree distributions - many vertices in a network with low degree and a small number with high degree
  • network transitivity - two vertices having a same neighbor would have higher probability of being neighbors of each other.


A traditional method of constructing the communities

  1. Calculate the weight for each pair of vertices.
  2. Beginning from an vertex only set, add edges between pairs one by one in the desc order of the weights.
  3. The resulting graph shows a nested set of increasingly large components, which are taken to be the communities.