Difference between revisions of "Maximization of the benefit function known as "modularity""
(Created page with '[http://www.pnas.org/content/103/23/8577.full.pdf+html Modularity and community structure in networks] [http://en.wikipedia.org/wiki/Modularity_(networks) Modularity knowledge o…') |
|||
Line 1: | Line 1: | ||
+ | Briefly the method works as follows. | ||
+ | Given a network and a particular division of the vertices of that network | ||
+ | into nonoverlapping groups or communities, the modularity | ||
+ | is defined as the number of edges that lie within | ||
+ | those groups minus the expected number of such edges | ||
+ | if edges are placed at random between the vertices (but | ||
+ | respecting vertex degree). In essence, the modularity | ||
+ | measures whether a larger than expected number of | ||
+ | edges fall within the groups defined. In principle, the task | ||
+ | of finding the best division of the network into groups is | ||
+ | then one of maximizing the modularity over all possible | ||
+ | divisions. In practice, this maximization problem is | ||
+ | known to be NP-complete, so approximate solution | ||
+ | methods must be used for all but the smallest networks. | ||
+ | |||
+ | |||
+ | Newman’s method works by rewriting the modularity in | ||
+ | the language of linear algebra as a quadratic form involving | ||
+ | an index vector and a characteristic matrix dubbed the “modularity matrix.” It can then be shown that the | ||
+ | signs of the elements of the leading eigenvector of this | ||
+ | modularity matrix give an approximation to the division | ||
+ | of the network into two parts that maximizes the modularity. | ||
+ | This approximate maximum can optionally be further | ||
+ | refined by, for instance, applying a greedy algorithm | ||
+ | that moves vertices between groups as described in . | ||
+ | By repeatedly dividing the network in two in this way, a | ||
+ | network can be divided into any number of communities, | ||
+ | although typically one stops dividing when no divisions | ||
+ | exist that will increase the modularity any further. | ||
+ | |||
+ | |||
+ | This repeated subdivision of the network into smaller | ||
+ | and smaller groups is particularly attractive for the purposes | ||
+ | of our present analysis, because it allows us to observe | ||
+ | the major divisions in the network first, followed by | ||
+ | more minor ones, and to stop the process at any point | ||
+ | to compare with our other analyses. A limitation of the | ||
+ | method is that it is designed for use with undirected rather | ||
+ | than directed networks. This however is not a great hindrance. | ||
+ | It seems reasonable to consider edges in a citation | ||
+ | network to be a sign of connection between documents, | ||
+ | and that connection exists regardless of the direction the | ||
+ | edge runs in. So we simply ignore the directions in our | ||
+ | analysis and apply the eigenvector calculation to the undirected | ||
+ | network. This approach has been taken before by | ||
+ | other authors and appears to work well. | ||
+ | |||
+ | |||
+ | == '''Reference''' == | ||
+ | |||
[http://www.pnas.org/content/103/23/8577.full.pdf+html Modularity and community structure in networks] | [http://www.pnas.org/content/103/23/8577.full.pdf+html Modularity and community structure in networks] | ||
[http://en.wikipedia.org/wiki/Modularity_(networks) Modularity knowledge on Wikipedia] | [http://en.wikipedia.org/wiki/Modularity_(networks) Modularity knowledge on Wikipedia] |
Latest revision as of 02:04, 7 February 2011
Briefly the method works as follows. Given a network and a particular division of the vertices of that network into nonoverlapping groups or communities, the modularity is defined as the number of edges that lie within those groups minus the expected number of such edges if edges are placed at random between the vertices (but respecting vertex degree). In essence, the modularity measures whether a larger than expected number of edges fall within the groups defined. In principle, the task of finding the best division of the network into groups is then one of maximizing the modularity over all possible divisions. In practice, this maximization problem is known to be NP-complete, so approximate solution methods must be used for all but the smallest networks.
Newman’s method works by rewriting the modularity in
the language of linear algebra as a quadratic form involving
an index vector and a characteristic matrix dubbed the “modularity matrix.” It can then be shown that the
signs of the elements of the leading eigenvector of this
modularity matrix give an approximation to the division
of the network into two parts that maximizes the modularity.
This approximate maximum can optionally be further
refined by, for instance, applying a greedy algorithm
that moves vertices between groups as described in .
By repeatedly dividing the network in two in this way, a
network can be divided into any number of communities,
although typically one stops dividing when no divisions
exist that will increase the modularity any further.
This repeated subdivision of the network into smaller
and smaller groups is particularly attractive for the purposes
of our present analysis, because it allows us to observe
the major divisions in the network first, followed by
more minor ones, and to stop the process at any point
to compare with our other analyses. A limitation of the
method is that it is designed for use with undirected rather
than directed networks. This however is not a great hindrance.
It seems reasonable to consider edges in a citation
network to be a sign of connection between documents,
and that connection exists regardless of the direction the
edge runs in. So we simply ignore the directions in our
analysis and apply the eigenvector calculation to the undirected
network. This approach has been taken before by
other authors and appears to work well.