Jurgens and Lu ICWSM 2012

From Cohen Courses
Revision as of 21:57, 1 October 2012 by Nnori (talk | contribs) (→‎Summary)
Jump to navigationJump to search

Citation

@inproceedings{DBLP:conf/icwsm/JurgensL12,

 author = {David Jurgens and Tsai-Ching Lu},
 title = {Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia},
 booktitle = {ICWSM},
 year = {2012}

Online version

Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia


Summary

To investigate editor interactions in Wikipedia, this paper proposes to represent Wikipedia's revision history as a temporal, bipartite network with multiple node and edge types for users and revisions. From this representation, they identify author interactions as network motifs and show how the motif types capture editing behaviors. They also demonstrate the usefulness of motifs by two tasks; (1) classification of pages as combative or cooperative page and (2) analysis of the dynamics of editor behavior to explain Wikipedia’s content growth.

Background

Wikipedia

Problem definition

Suppose that we are given the structure of some network and that we want to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities, where these communities may be of any size. Given some divisons on the network, modularity is a quality function that measures how "good" the division is. The question here is, of course, what is the "good" divisions. I will illustrate it in the following.

Intuition

  • For the sake of ease, let us focus initially on the problem of whether any good division of the network exists into just two communities.
  • The most obvious way is to minimize the number of edges running between two groups, which is most often adopted in the graph-partitioning literature.
  • But it allows trivial divisions such as (1) all vertexes in one group and none in the other group, or (2) only 1 vertex in one group and the rest in the other group.
  • So, we need a different way to define a "good" devision of a network.
  • Modularity is introduced in this context. The basic idea here is, a good division of a network into communities is one in which there are fewer edges than expected between communities.
    • "If the number of edges between two groups is only what one would expect on the basis of random chance, then few thoughtful observers would claim this constitutes evidence of meaningful community structure. On the other hand, if the number of edges between groups is significantly less than we expect by chance, or equivalent if the number within groups is significantly more, then it is reasonable to conclude that something interesting is going on." - From the paper

Definition

Given some divisions on networks, modularity is, up to a multiplicative constant, the number of edges falling within groups minus the expected number in an equivalent network with edges placed at random.

  • Preliminary
    • For a particular division of the network into two groups let , if vertex belongs to group 1 and , if vertex belongs to group 2.
    • Let the number of edges between vertices and be .
    • Then, the expected number of edges between vertices and when edges are placed at random is , where and are the degrees of the vertices and is the total number of edges in the network.

Now, the modularity is given by the sum of over all pairs of vertices that fall in the same group. (We can interpret as follows, too: We are given some groups now. For each group, we can calculate the difference between the actual number of edges and the expected number of edges over all pairs in the group. is the sum of these values.)

Since is 1 if and are in the same group and 0 otherwise, we can express modularity as follows:

where the second equality follows from

Note that is merely conventional.

Existing work on community detection based on modularity maximization

Method

Dividing networks into two communities

The author rewrites the equation [1] as follows. where is the column vector whose elements are the , and , which is called modularity matrix.

By writing as a linear combination of the normalized eigenvectors of , it is shown that we can express as follow:

where is the eigenvalue of corresponding to eigenvector .

The author shows that the maximum of is achieved by setting if the corresponding element of the leading eigen vector (whose eigenvalue is largest) is positive and -1 otherwise. Thus, the algorithm is as follows: we compute the leading eigenvector of the modularity matrix and divide the vertices into two groups according to the signs of the elements in this vector.

Dividing networks into more than two communities

The author divides networks into multiple communities by repeating the previous method recursively. That is, he uses the algorithm described above first to divide the network into two parts, then divides those parts, and so on.

More specifically, he considers how much the modularity increases when we divide a group into two parts. He shows this additional contribution of modularity can be expressed in a similar form as the previous section. He also shows that the modularity matrix in the previous section is now rewritten as a generalized modularity matrix. Then he shows that we can apply same spectral algorithm to maximize .

This algorithm tells us clearly at what point we need to halt the subdivision process; If there are no division of a subgraph that will increase the modularity of the network, or equivalently that gives a positive value for , we should stop the process then.

Nice features of this method

  • We do not need to specify the size of communities.
  • It has the ability to refuse to divide the network when no good division exists.
    • If the generalized modularity matrix has no positive eigenvalues, it means there is no division of the network that results in positive modularity, which we can see from the equation [2].

Dataset

Experiment

Review

Recommendation for whether or not to assign the paper as required/optional reading in later classes.

Yes.

  • Modularity-based methods are common in community detection task. This papper might be a good introduction for the concept of modularity.
  • This paper also illustrates how the optimization problem can be rewritten in terms of eigenvalues and eigenvectors of a matrix called modularity matrix, which results into eigenvalue problems. This derivation shows that we can solve a problem by seeing the problem from different view points. This might be a good lesson for us when we face problems.

Related Papers

Study Plan