Difference between revisions of "M. E. J. Newman PNAS 2006"

Revision as of 11:43, 1 October 2012

Citation

@article{Newman:2006:Proc-Natl-Acad-Sci-U-S-A:16723398,

 author = {Newman, M E},
 journal = {Proc Natl Acad Sci U S A},
 pages = {8577-8582},
 title = {Modularity and community structure in networks},
 volume = 103,
 year = 2006
}

Online version

Modularity and community structure in networks

Summary

Background

Modularity

Some links in this wiki:
- modularity
- Maximization of the benefit function known as "modularity"

Problem definition

Suppose that we are given the structure of some network and that we want to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size.

Intuition

Definition

Given a division of a network into groups, modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number of such edges in an equivalent network with edges placed at random.

Preliminary
- For a particular division of the network into two groups, let $s_{i}=1$ if vertex $i$ belongs to group 1 and $s_{i}=-1$ if vertex $i$ belongs to group 2.
- Let the number of edges between vertices $i$ and $j$ be $A_{ij}$.
- Then the expected number of edges between vertices $i$ and $j$ when edges are placed at random is ${\frac {k_{i}k_{j}}{2m}}$, where $k_{i}$ and $k_{j}$ are the degrees of the vertices and $m={\frac {1}{2}}\sum _{i}k_{i}$ is the total number of edges in the network.
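As a small illustration of the quantities above, the following sketch builds $A_{ij}$, $k_{i}$, and $m$ for a toy graph and evaluates the expected number of edges ${\frac {k_{i}k_{j}}{2m}}$. The graph (two triangles joined by one edge) and the helper name `expected_edges` are assumptions made for this example; they do not come from the paper.

```python
# Toy undirected graph (hypothetical, not from the paper):
# triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

# Adjacency matrix: A[i][j] is the number of edges between i and j.
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1

# Degrees k_i and total number of edges m = (1/2) * sum_i k_i.
k = [sum(row) for row in A]
m = sum(k) / 2


def expected_edges(i, j):
    """Expected number of edges between i and j: k_i * k_j / (2m)."""
    return k[i] * k[j] / (2 * m)


print(k, m, expected_edges(2, 3))
```

Summing `expected_edges(i, j)` over all ordered pairs gives $2m$, the same total as summing $A_{ij}$, so the random placement preserves the total number of edge ends.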

Now, the modularity $Q$ is given by the sum of $A_{ij}-{\frac {k_{i}k_{j}}{2m}}$ over all pairs of vertices $i,j$ that fall in the same group.
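Written out in the two-group notation above, $s_{i}s_{j}=1$ exactly when $i$ and $j$ are in the same group, so ${\frac {1}{2}}(s_{i}s_{j}+1)$ selects the same-group pairs. With the normalization ${\frac {1}{4m}}$ used in the paper (this is the multiplicative constant mentioned in the definition), the sum becomes:

$$Q={\frac {1}{4m}}\sum _{ij}\left(A_{ij}-{\frac {k_{i}k_{j}}{2m}}\right)(s_{i}s_{j}+1)={\frac {1}{4m}}\sum _{ij}\left(A_{ij}-{\frac {k_{i}k_{j}}{2m}}\right)s_{i}s_{j}$$

The second equality holds because $\sum _{ij}A_{ij}=\sum _{ij}{\frac {k_{i}k_{j}}{2m}}=2m$, so the constant term contributes nothing.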

We can also interpret $Q$ as follows. Suppose we are given some groups. For each group, we calculate the difference between the actual number of edges and the expected number of edges over all pairs of vertices in the group. $Q$ is the sum of these per-group values.
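The per-group reading of $Q$ can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's code: the function name `modularity`, the edge-list input format, and the toy graph are made up for illustration, and the final division by $2m$ is one common choice for the multiplicative constant (it agrees with ${\frac {1}{4m}}$ in the $s_{i}$ notation for two groups).

```python
def modularity(edges, groups):
    """Return (Q, per_group) for an undirected graph.

    edges  : list of (i, j) pairs, one entry per undirected edge
    groups : list of lists of vertices forming a nonoverlapping division
    """
    m = len(edges)

    # Degree k_i of every vertex.
    k = {}
    for i, j in edges:
        k[i] = k.get(i, 0) + 1
        k[j] = k.get(j, 0) + 1

    group_of = {v: g for g, members in enumerate(groups) for v in members}

    per_group = []
    for g, members in enumerate(groups):
        # Sum of A_ij over ordered pairs in the group:
        # every within-group edge is counted twice (as ij and as ji).
        actual = 2 * sum(
            1 for i, j in edges if group_of[i] == g and group_of[j] == g
        )
        # Sum of k_i * k_j / (2m) over ordered pairs in the group.
        k_total = sum(k.get(v, 0) for v in members)
        expected = k_total * k_total / (2 * m)
        per_group.append(actual - expected)

    # Assumed normalization: divide the summed differences by 2m.
    return sum(per_group) / (2 * m), per_group


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
Q, per_group = modularity(edges, [[0, 1, 2], [3, 4, 5]])
print(Q, per_group)
```

For this toy graph each triangle contributes $6-3.5=2.5$, so $Q=5/14\approx 0.357$, while putting all six vertices into a single group gives $Q=0$.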

Method

Dataset

- Zachary's karate club network: 34 nodes.
- Gleiser and Danon's jazz musicians network: 198 nodes.
- Jeong's metabolic network: 453 nodes.
- Guimerà's email network: 1,133 nodes.
- Guardiola's key signing network: 10,680 nodes.
- Newman's physicists network: 27,519 nodes.

Experiment and Result

The author uses the modularity value as the performance measure for a community detection method, and compares the proposed method against the following three existing methods.

Betweenness-based algorithm of Girvan and Newman
- Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. (2002) Proc. Natl. Acad. Sci. USA 99, 7821–7826.
Fast algorithm of Clauset et al.
- Clauset, A., Newman, M. E. J. & Moore, C. Finding community structure in very large networks. (2004) Phys. Rev. E 70, 066111.
- It optimizes modularity by using a greedy algorithm.
Extremal optimization algorithm of Duch and Arenas
- Duch, J. & Arenas, A. Community detection in complex networks using extremal optimization. (2005) Phys. Rev. E 72, 027104.
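To make the idea of greedy modularity optimization concrete, here is a naive sketch. It is not the algorithm of Clauset et al. as published, which relies on efficient data structures to scale to very large networks. It only illustrates the principle: start with every vertex in its own community and repeatedly merge the pair of communities that increases $Q$ the most. The function name `greedy_modularity`, the rule of stopping as soon as no merge increases $Q$, and the toy graph are assumptions of this example. The gain of merging communities $a$ and $b$ is $e_{ab}/m-d_{a}d_{b}/(2m^{2})$, where $e_{ab}$ is the number of edges between them and $d_{a}$, $d_{b}$ are their total degrees, taking $Q$ with the ${\frac {1}{2m}}$ normalization.

```python
def greedy_modularity(edges):
    """Naive greedy agglomeration: merge communities while Q increases."""
    m = len(edges)
    vertices = sorted({v for e in edges for v in e})

    comm = {v: {v} for v in vertices}   # community id -> member vertices
    label = {v: v for v in vertices}    # vertex -> community id
    deg = {v: 0 for v in vertices}      # community id -> total degree
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1

    while len(comm) > 1:
        # Count the edges running between each pair of communities.
        between = {}
        for i, j in edges:
            a, b = label[i], label[j]
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1

        # Find the merge with the largest positive gain in Q.
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in between.items():
            gain = e_ab / m - deg[a] * deg[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases Q

        # Merge community b into community a.
        a, b = best_pair
        for v in comm[b]:
            label[v] = a
        comm[a] |= comm.pop(b)
        deg[a] += deg.pop(b)

    return sorted(sorted(c) for c in comm.values())


# Hypothetical toy graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities = greedy_modularity(edges)
print(communities)
```

On this toy graph the sketch recovers the two triangles as communities. Recounting the between-community edges in every round keeps the code short but makes it far slower than the published method.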

Strengths and weaknesses

Strength

The problems the authors point out regarding existing social CF are specific to social media and have not yet been fully considered. The authors propose a method based on MF that can solve these problems in a unified way. In addition, they built a Facebook application to collect user behavior data on Facebook. By doing so, they obtained rich features and conducted detailed analyses of users' behavior.

Weakness

They set the weight of each objective function manually, which might be time-consuming in practical situations. In addition, it is not clear which of the extensions actually contributed to the increase in accuracy, because there are no systematic analyses. Thus, we gain little insight into users' behavior or into how to improve the proposed method.

Possible impact

If they were able to publish their data, the work would have much more impact. (Actually, they cannot publish their data because of a requirement from the funding project.) Also, if they had analyzed how much each part of the extension contributed to the performance, we could have gained insight into users' behavior or into ways to improve existing social CF.

Recommendation for whether or not to assign the paper as required/optional reading in later classes.

No. There is not much insight into phenomena in social media.

Related Papers

There are many papers on how to combine a social network with users' other actions.
- S. H. Yang et al. WWW 2011: S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike: Joint friendship and interest propagation in social networks. In WWW-11, 2011.

Study Plan

To understand some of the matrix calculations, I read parts of: K. B. Petersen and M. S. Pedersen. The Matrix Cookbook, 2008.

@@ Line 19: / Line 19: @@
 == Background ==
+=== Modularity ===
+*Some links in this wiki:
+** [http://malt.ml.cmu.edu/mw/index.php/%E2%80%9Cmodularity%E2%80%9D modularity]
+** [http://malt.ml.cmu.edu/mw/index.php/Maximization_of_the_benefit_function_known_as_%22modularity%22 Maximization of the benefit function known as "modularity"]
+==== Problem definition ====
+Suppose that we are given the structure of some network and that we want to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size.
+==== Intuition ====
+==== Definition ====
+Given some divisions on networks, modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random.
+* Preliminary
+** For a particular division of the network into two groups let <math>s_i = 1</math>, if vertex <math>i</math> belongs to group 1 and <math>s_i = -1</math>, if vertex <math>i</math> belongs to group 2.
+** Let the number of edges between vertices <math>i</math> and <math>j</math> be <math>A_{ij}</math>.
+** Then, the expected number of edges between vertices <math>i</math> and <math>j</math> when edges are placed at random is <math>\frac{k_i k_j}{2m}</math>, where <math>k_i</math> and <math>k_j</math> are the degrees of the vertices and <math>m=\frac{1}{2}\sum_{i} k_i</math> is the total number of edges in the network.
+Now, the modularity <math>Q</math> is given by the sum of <math>A_{ij} -  \frac{k_{i}k_{j}}{2m}</math> over all pairs of vertices <math>i, j</math> that fall in the same group.
+* We can interpret <math>Q</math> as follows, too: We are given some groups now. For each group, we can calculate the difference between actual number of edges and the expected number of edges for all pairs in the group. <math>Q</math> is the sum of these values.
 == Method ==