Maximization of the benefit function known as "modularity"

From Cohen Courses
Jump to navigationJump to search

Briefly the method works as follows. Given a network and a particular division of the vertices of that network into nonoverlapping groups or communities, the modularity is defined as the number of edges that lie within those groups minus the expected number of such edges if edges are placed at random between the vertices (but respecting vertex degree). In essence, the modularity measures whether a larger than expected number of edges fall within the groups defined. In principle, the task of finding the best division of the network into groups is then one of maximizing the modularity over all possible divisions. In practice, this maximization problem is known to be NP-complete, so approximate solution methods must be used for all but the smallest networks.


Newman’s method works by rewriting the modularity in the language of linear algebra as a quadratic form involving an index vector and a characteristic matrix dubbed the “modularity matrix.” It can then be shown that the signs of the elements of the leading eigenvector of this modularity matrix give an approximation to the division of the network into two parts that maximizes the modularity. This approximate maximum can optionally be further refined by, for instance, applying a greedy algorithm that moves vertices between groups as described in . By repeatedly dividing the network in two in this way, a network can be divided into any number of communities, although typically one stops dividing when no divisions exist that will increase the modularity any further.


This repeated subdivision of the network into smaller and smaller groups is particularly attractive for the purposes of our present analysis, because it allows us to observe the major divisions in the network first, followed by more minor ones, and to stop the process at any point to compare with our other analyses. A limitation of the method is that it is designed for use with undirected rather than directed networks. This however is not a great hindrance. It seems reasonable to consider edges in a citation network to be a sign of connection between documents, and that connection exists regardless of the direction the edge runs in. So we simply ignore the directions in our analysis and apply the eigenvector calculation to the undirected network. This approach has been taken before by other authors and appears to work well.


Reference

Modularity and community structure in networks

Modularity knowledge on Wikipedia