Difference between revisions of "M. E. J. Newman PNAS 2006"

From Cohen Courses
Jump to navigationJump to search
Line 53: Line 53:
 
Now, the modularity <math>Q</math> is given by the sum of <math>A_{ij} -  \frac{k_{i}k_{j}}{2m}</math>over all pairs of vertices <math>i, j</math> that fall in the same group.
 
Now, the modularity <math>Q</math> is given by the sum of <math>A_{ij} -  \frac{k_{i}k_{j}}{2m}</math>over all pairs of vertices <math>i, j</math> that fall in the same group.
 
* We can interpret <math>Q</math> as follows, too: We are given some groups now. For each group, we can calculate the difference between the actual number of edges and the expected number of edges over all pairs in the group. <math>Q</math> is the sum of these values.
 
* We can interpret <math>Q</math> as follows, too: We are given some groups now. For each group, we can calculate the difference between the actual number of edges and the expected number of edges over all pairs in the group. <math>Q</math> is the sum of these values.
 +
 +
=== Existing work on community detection based on modularity maximization ===
 +
* [http://deim.urv.cat/~aarenas/publicacions/pdf/jstat05.pdf Danon, L., Duch, J., Diaz-Guilera, A. & Arenas, A. Comparing community structure identification. (2005) J. Stat. Mech.,P09008. ] and [http://69.164.193.67/site_media/publication_pdfs/Guimera-2005-Nature-433-895.pdf Guimer`a, R. & Amaral, L. A. N. Functional cartography of complex metabolic networks. (2005) Nature 433, 895–900. ] used [http://en.wikipedia.org/wiki/Simulated_annealing simulated annealing]. But these methods are not expected to scale to large networks due to the high computation cost.
 +
* There are other heuristic methods such as greedy algorithms ([ ]) and [].
  
 
== Method ==
 
== Method ==

Revision as of 12:16, 1 October 2012

Citation

@article{Newman:2006:Proc-Natl-Acad-Sci-U-S-A:16723398,

 author = {Newman, M E},
 journal = {Proc Natl Acad Sci U S A},
 pages = {8577-8582},
 title = {Modularity and community structure in networks},
 volume = 103,
 year = 2006

Online version

Modularity and community structure in networks


Summary

One effective approach for community detection in networks is the optimization of the quality function known as modularity.

This paper shows that this optimization problem can be rewritten in terms of eigenvalues and eigenvectors of a matrix called modularity matrix, which leads to a spectral algorithm for community detection. Using several real-world data, the author demonstrate the proposed algorithm outperforms existing methods in terms of both quality of results and computation speed.

Background

Modularity

Problem definition

Suppose that we are given the structure of some network and that we want to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size. Given some divisons on the network, modularity is a quality function that measures how "good" the division is. The question here is, of course, what is the "good" divisions. I will illustrate it in the following.

Intuition

  • For the sake of ease, let us focus initially on the problem of whether any good division of the network exists into just two communities.
  • The most obvious way is to minimize the number of edges running between two groups, which is most often adopted in the graph-partitioning literature.
  • But it allows trivial divisions such as (1) all vertexes in one group and none in the other group, or (2) only 1 vertex in one group and the rest in the other group.
  • So, we need a different way to define a "good" devision of a network.
  • Modularity is introduced in this context. The basic idea here is, a good division of a network into communities is one in which there are fewer edges than expected between communities.
    • "If the number of edges between two groups is only what one would expect on the basis of random chance, then few thoughtful observers would claim this constitutes evidence of meaningful community structure. On the other hand, if the number of edges between groups is significantly less than we expect by chance, or equivalent if the number within groups is significantly more, then it is reasonable to conclude that something interesting is going on." - From the paper

Definition

Given some divisions on networks, modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random.

  • Preliminary
    • For a particular division of the network into two groups let , if vertex belongs to group 1 and , if vertex belongs to group 2.
    • Let the number of edges between vertices and be .
    • Then, the expected number of edges between vertices and when edges are placed at random is , where and are the degrees of the vertices and is the total number of edges in the network.

Now, the modularity is given by the sum of over all pairs of vertices that fall in the same group.

  • We can interpret as follows, too: We are given some groups now. For each group, we can calculate the difference between the actual number of edges and the expected number of edges over all pairs in the group. is the sum of these values.

Existing work on community detection based on modularity maximization

Method

Dataset

Experiment and Result

The author used modularity value as a performance measure of a community detection method. He compared the proposed method against the following three existing methods.

Strengths and weaknesses

Strength

The problems the authors pointed out regarding existing social CF are genuine to social media, but have not yet been fully considered. The authors propose a method that can solve the problem in a unified way based on MF. In addition, they actually created a Facebook application to collect user behavior data in Facebook. By doing so, they got rich features and conducted detailed analyses on users behaviors, too.

weakness

They manually set the weight for each objective function, and this might be time-consuming in practical situations. In addition, we are not clear which part of extensions actually contributed the increase of accuracy, because there are no systematic analyses. Thus, we cannot get much insight about users behaviors. We also cannot get much insight to improve the proposed method.

Possible impact

If they were able to publish data, it would have much more impact. (Actually, they cannot publish their data because of the requirement from the funding project.) Also, if they conducted analyses about how much each part of the extension contributed to the performance. If they did so, we could have insight about users' behaviors, or ways to improve existing social CF.


Recommendation for whether or not to assign the paper as required/optional reading in later classes.

No. There are not much insight about phenomena in social media.


Related Papers

  • There are many papers on how to combine social network and users' other actions.
    • S. H. Yang et al. WWW 2011 : S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike: Joint friendship and interest propagation in social networks.In WWW-11, 2011.


Study Plan

To understand some matrix calculation, I read some of the paper; K. B. Petersen and M. S. Pedersen. The matrix cookbook, 2008.