Maximization of the benefit function known as "modularity"
Briefly the method works as follows. Given a network and a particular division of the vertices of that network into nonoverlapping groups or communities, the modularity is defined as the number of edges that lie within those groups minus the expected number of such edges if edges are placed at random between the vertices (but respecting vertex degree). In essence, the modularity measures whether a larger than expected number of edges fall within the groups defined. In principle, the task of finding the best division of the network into groups is then one of maximizing the modularity over all possible divisions. In practice, this maximization problem is known to be NP-complete, so approximate solution methods must be used for all but the smallest networks.
Newman’s method works by rewriting the modularity in
the language of linear algebra as a quadratic form involving
an index vector and a characteristic matrix dubbed the “modularity matrix.” It can then be shown that the
signs of the elements of the leading eigenvector of this
modularity matrix give an approximation to the division
of the network into two parts that maximizes the modularity.
This approximate maximum can optionally be further
refined by, for instance, applying a greedy algorithm
that moves vertices between groups as described in .
By repeatedly dividing the network in two in this way, a
network can be divided into any number of communities,
although typically one stops dividing when no divisions
exist that will increase the modularity any further.
This repeated subdivision of the network into smaller
and smaller groups is particularly attractive for the purposes
of our present analysis, because it allows us to observe
the major divisions in the network first, followed by
more minor ones, and to stop the process at any point
to compare with our other analyses. A limitation of the
method is that it is designed for use with undirected rather
than directed networks. This however is not a great hindrance.
It seems reasonable to consider edges in a citation
network to be a sign of connection between documents,
and that connection exists regardless of the direction the
edge runs in. So we simply ignore the directions in our
analysis and apply the eigenvector calculation to the undirected
network. This approach has been taken before by
other authors and appears to work well.