Statistical properties of community structure in large social and information networks. In WWW ’08

From Cohen Courses
Jump to navigationJump to search

Statistical properties of community structure in large social and information networks. In WWW ’08

Citation

Leskovec, Jure, et al. "Statistical properties of community structure in large social and information networks." Proceeding of the 17th international conference on World Wide Web. ACM, 2008.

Online version

PDF

Summary

This Paper discusses some statistical properties and observed facts of community structure when using network community profile (NCP) plot, which measures the quality of "best" community (they use conductance to measure the quality) as a function of community size. The main empirical findings are as following:

  • There exists a specific size, which empirically is roughly 100 nodes, for the global "best" community. And before this size, there exist well-separated communities (called Whiskers by the author) and they often can be combined well into meaningful larger communities. However, after this size, the best possible communities become more and more "blended into" the remainder of the network and less and less community-like.
  • This specific "best" size for meaningful communities seems irrelevant with network size.

And their observed properties are not reproduced, at even a qualitative level, by any of the commonly-used network generation models they have examined, including preferential attachment, copying, and hierarchical network models.

Network Community Profile plot

Network Community Profile (NCP) plot is the key idea of this paper. It tries to measure the quality of best community as a function of community size. If we use Conductance to measure the quality of communities, NCP is defined as . As smaller conductance means better community, NCP plot over real world communities often have a 'V' shape which shows the existence of a specific "best" community size and it is roughly 100 empirically.

Conductance

The Conductance of a cut in a graph is defined as:

where the are the entries of the adjacency matrix for G, so that

is the total number (or weight) of the edges incident with S.

Generative Models for network community structure

The observation from NCP can be tested on network generative models by checking if the model can reproduce the phenomena from NCP. The author found that many commonly-used network models cannot reproduce the phenomena. Either they cannot generate small communities which can be combined well or they generate a network in which larger communities can have better qualities.

And they suggest a so-called Forest Fire Model can captures some of the phenomena and they give some empirical result of generating networks.

Coherence with ground truth communities

The paper modified NCP by considering only ground truth communities from the datasets. And they found that the V shape still hold in ground truth communities.

Datasets

Study plan

Related work