Jurgens and Lu ICWSM 2012

1 Citation
2 Online version
3 Summary
4 Proposed network representation
- 4.1 Definition
- 4.2 Network derivation from Wikipedia dataset
5 Method
6 Dataset
7 Experiment
8 Review
- 8.1 Recommendation for whether or not to assign the paper as required/optional reading in later classes.
9 Related Papers
10 Study Plan

Citation

@inproceedings{DBLP:conf/icwsm/JurgensL12,
 author = {David Jurgens and Tsai-Ching Lu},
 title = {Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia},
 booktitle = {ICWSM},
 year = {2012}
}

Online version

Temporal Motifs Reveal the Dynamics of Editor Interactions in Wikipedia

Summary

Underlying the growth of Wikipedia are the cooperative, and sometimes combative, interactions between editors working on the same content. Most research on Wikipedia editor interactions focuses on cooperative behaviors, which calls for a full analysis of all types of editing behavior, both cooperative and combative. To investigate editor interactions in this context, the paper proposes to represent Wikipedia's revision history as a temporal, bipartite network with multiple node and edge types for users and revisions. From this representation, the authors identify author interactions as network motifs and show how the motif types capture editing behaviors. They demonstrate the usefulness of motifs through two tasks: (1) classifying pages as combative or cooperative and (2) analyzing the dynamics of editor behavior to explain Wikipedia’s content growth.

Proposed network representation

Definition

They view editor interactions in Wikipedia as a bipartite graph from authors to the pages. They expand this representation to encode three additional features: (1) the type of author who made the change, (2) the time at which the change was made, and (3) the magnitude and effect of the change to the page. To do so, they define the bipartite graph of Wikipedia revisions as follows.

The figure below illustrates a subset of a page’s history as a sequence of classified revisions.

Network derivation from Wikipedia dataset

Method

Dividing networks into two communities

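The author rewrites equation [1] as follows: $Q={\frac {1}{4m}}\mathbf {s^{T}} \mathbf {B} \mathbf {s}$, where $\mathbf {s}$ is the column vector whose elements are the $s_{i}$, and $B_{ij}=A_{ij}-{\frac {k_{i}k_{j}}{2m}}$ defines what is called the modularity matrix. Here $s_{i}=+1$ if vertex $i$ belongs to the first group and $s_{i}=-1$ if it belongs to the second, $A_{ij}$ is the adjacency matrix, $k_{i}$ is the degree of vertex $i$, and $m$ is the total number of edges.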
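The matrix form of $Q$ can be checked numerically. The following is a minimal sketch, assuming an undirected, unweighted graph stored as a symmetric NumPy adjacency matrix; the function names and the toy graph (two triangles joined by one edge) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def modularity_matrix(A):
    """Return B with B_ij = A_ij - k_i * k_j / (2m) for a symmetric adjacency matrix A."""
    k = A.sum(axis=1)            # vertex degrees k_i
    two_m = k.sum()              # 2m, since each edge is counted twice
    return A - np.outer(k, k) / two_m

def two_way_modularity(A, s):
    """Return Q = s^T B s / (4m) for a membership vector s with entries +1 or -1."""
    B = modularity_matrix(A)
    two_m = A.sum()
    return float(s @ B @ s) / (2.0 * two_m)

# Toy graph (assumption for illustration): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s = np.array([1, 1, 1, -1, -1, -1])   # one triangle per group
Q = two_way_modularity(A, s)
print(Q)
```

For this toy graph the split along the bridging edge gives $Q = 5/14$, and every row of the modularity matrix sums to zero.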

By writing $\mathbf {s}$ as a linear combination of the normalized eigenvectors $\mathbf {u_{i}}$ of $\mathbf {B}$, it is shown that we can express $Q$ as follows:

$Q={\frac {1}{4m}}\sum _{i}(\mathbf {u_{i}} ^{T}\cdot \mathbf {s} )^{2}\beta _{i},$ [2]

where $\beta _{i}$ is the eigenvalue of $\mathbf {B}$ corresponding to eigenvector $\mathbf {u_{i}}$.
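
The step from the matrix form of $Q$ to equation [2] can be filled in as follows. This is a sketch that assumes the eigenvectors $\mathbf {u_{i}}$ are chosen orthonormal (possible because $\mathbf {B}$ is real and symmetric); the coefficients $a_{i}$ are notation introduced here for illustration.

$\mathbf {s} =\sum _{i}a_{i}\mathbf {u_{i}} ,\qquad a_{i}=\mathbf {u_{i}} ^{T}\cdot \mathbf {s}$

$\mathbf {s^{T}} \mathbf {B} \mathbf {s} =\sum _{i,j}a_{i}a_{j}\,\mathbf {u_{i}} ^{T}\mathbf {B} \mathbf {u_{j}} =\sum _{i,j}a_{i}a_{j}\beta _{j}\,\mathbf {u_{i}} ^{T}\mathbf {u_{j}} =\sum _{i}a_{i}^{2}\beta _{i}$

Dividing by $4m$ and substituting $a_{i}=\mathbf {u_{i}} ^{T}\cdot \mathbf {s}$ gives equation [2].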

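The author shows that $Q$ is maximized, to a good approximation, by setting $s_{i}=+1$ if the corresponding element of the leading eigenvector (the one whose eigenvalue is largest) is positive and $s_{i}=-1$ otherwise. Because the elements of $\mathbf {s}$ are restricted to $\pm 1$, $\mathbf {s}$ cannot in general be made exactly parallel to the leading eigenvector, so this choice is an approximation rather than a guaranteed maximum. The algorithm is therefore as follows: compute the leading eigenvector of the modularity matrix and divide the vertices into two groups according to the signs of the elements of this vector.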
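
The bisection step described above can be sketched in a few lines. This is a minimal illustration, assuming a dense symmetric adjacency matrix and using `numpy.linalg.eigh` for the eigendecomposition; the function name and the toy graph are assumptions for illustration, and zero entries of the eigenvector are arbitrarily assigned to the $-1$ group.

```python
import numpy as np

def leading_eigenvector_split(A):
    """Split a graph in two by the signs of the leading eigenvector of B."""
    k = A.sum(axis=1)                       # vertex degrees
    two_m = k.sum()                         # 2m
    B = A - np.outer(k, k) / two_m          # modularity matrix
    eigvals, eigvecs = np.linalg.eigh(B)    # eigenvalues in ascending order
    u = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue
    s = np.where(u > 0, 1, -1)              # +1 for positive entries, -1 otherwise
    Q = float(s @ B @ s) / (2.0 * two_m)    # Q = s^T B s / (4m)
    return s, Q

# Toy graph (assumption for illustration): two triangles joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

s, Q = leading_eigenvector_split(A)
print(s, Q)
```

On this toy graph the two triangles end up in different groups. The overall sign of an eigenvector is arbitrary, so which triangle is labeled $+1$ may vary between runs or platforms.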

Dividing networks into more than two communities

The author divides networks into multiple communities by repeating the previous method recursively. That is, he uses the algorithm described above first to divide the network into two parts, then divides those parts, and so on.

More specifically, he considers how much the modularity increases when a group $g$ is divided into two parts. He shows that this additional contribution to the modularity, $\Delta {Q}$, can be expressed in a form similar to that of the previous section, with the modularity matrix replaced by a generalized modularity matrix. He then shows that the same spectral algorithm can be applied to maximize $\Delta {Q}$.

This algorithm tells us clearly at what point to halt the subdivision process: if there is no division of a subgraph that increases the modularity of the network, or equivalently that gives a positive value of $\Delta {Q}$, the subgraph should be left undivided.
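
The repeated subdivision with its stopping rule can be sketched as below. This is a hedged illustration, not the author's implementation: it assumes the generalized modularity matrix for a group $g$ takes the form $B_{ij}^{(g)}=B_{ij}-\delta _{ij}\sum _{k\in g}B_{ik}$ (a form not spelled out in the summary above), and the function name, tolerance, and toy graph are assumptions.

```python
import numpy as np

def detect_communities(A, tol=1e-10):
    """Repeatedly bisect groups while a split still increases modularity."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m              # modularity matrix of the whole graph
    communities = []
    stack = [np.arange(A.shape[0])]             # groups still to be examined
    while stack:
        g = stack.pop()
        Bg = B[np.ix_(g, g)]                    # restrict B to the group g
        Bg = Bg - np.diag(Bg.sum(axis=1))       # assumed generalized modularity matrix
        eigvals, eigvecs = np.linalg.eigh(Bg)   # ascending eigenvalues
        s = np.where(eigvecs[:, -1] > 0, 1, -1)
        delta_q = float(s @ Bg @ s) / (2.0 * two_m)
        one_sided = abs(int(s.sum())) == len(g) # every vertex landed on the same side
        if eigvals[-1] <= tol or delta_q <= tol or one_sided:
            communities.append(sorted(g.tolist()))   # no improving split: keep g whole
        else:
            stack.append(g[s > 0])
            stack.append(g[s < 0])
    return sorted(communities)

# Toy graph (assumption for illustration): two triangles joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

result = detect_communities(A)
print(result)
```

Here the first split separates the two triangles, and each triangle is then left undivided because no further split yields a positive $\Delta {Q}$.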

Nice features of this method

We do not need to specify the size of communities.
It has the ability to refuse to divide the network when no good division exists.
- If the generalized modularity matrix has no positive eigenvalues, there is no division of the network that results in positive modularity, as can be seen from equation [2].
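
The refusal to divide can be illustrated on a graph with no community structure. The sketch below uses a complete graph on five vertices, a toy example chosen here and not taken from the paper, and checks that its modularity matrix has no positive eigenvalue; the tolerance is an assumption to absorb floating-point error.

```python
import numpy as np

n = 5
A = np.ones((n, n)) - np.eye(n)        # complete graph: every pair of vertices is linked
k = A.sum(axis=1)                      # every degree equals n - 1
B = A - np.outer(k, k) / k.sum()       # modularity matrix, here equal to J/n - I
eigvals = np.linalg.eigvalsh(B)        # ascending eigenvalues of the symmetric matrix
has_positive = bool(eigvals[-1] > 1e-10)
print(eigvals, has_positive)
```

For a complete graph the eigenvalues are $0$ (once) and $-1$ (repeated), so no split with positive modularity exists and the graph is left as a single community.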

Dataset

Zachary's karate club network: 34 nodes.
Gleiser's jazz musicians network: 198 nodes.
Jeong's metabolic network: 453 nodes.
Guimerà's email network: 1,133 nodes.
Guardiola's key signing network: 10,680 nodes.
Newman's physicists network: 27,519 nodes.

Experiment

Measure
- The author used the modularity value as the performance measure for each community detection method.
Competing methods
- Betweenness-based algorithm of Girvan and Newman
  - Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. (2002) PNAS.
- Fast algorithm of Clauset et al.
  - Clauset, A., Newman, M. E. J. & Moore, C. Finding community structure in very large networks. (2004) Phys. Rev. E 70, 066111.
  - It optimizes modularity by using a greedy algorithm.
- Extremal optimization algorithm of Duch and Arenas
  - Duch, J. & Arenas, A. Community detection in complex networks using extremal optimization. (2005) Phys. Rev. E 72, 027104.
Result
- The proposed method outperformed the first two methods (the betweenness-based algorithm of Girvan and Newman and the fast algorithm of Clauset et al.) on all of the networks.
- The third method (the extremal optimization algorithm of Duch and Arenas) was more competitive. There is no clear difference between the two for the smaller networks, up to about 1,000 vertices. For larger networks, however, the proposed method outperformed the third method, and the performance gap increased with network size, suggesting that the proposed method is most promising for large networks.
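
Using modularity as the performance measure amounts to scoring each method's output partition and preferring the higher value. A minimal sketch of such a score for an arbitrary number of communities follows, assuming the standard form $Q={\frac {1}{2m}}\sum _{ij}B_{ij}\delta (c_{i},c_{j})$; the function name, the toy graph, and the two candidate partitions are assumptions for illustration and are not results from the paper.

```python
import numpy as np

def modularity(A, labels):
    """Return Q = (1/2m) * sum of B_ij over vertex pairs in the same community."""
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]   # True where i and j share a community
    return float(B[same].sum() / two_m)

# Toy graph (assumption for illustration): two triangles joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

q_triangles = modularity(A, [0, 0, 0, 1, 1, 1])   # one triangle per community
q_mixed = modularity(A, [0, 1, 0, 1, 0, 1])       # vertices interleaved across triangles
q_single = modularity(A, [0, 0, 0, 0, 0, 0])      # everything in one community
print(q_triangles, q_mixed, q_single)
```

The partition that follows the triangles scores $5/14$, the interleaved one scores $-3/14$, and the single-community partition scores $0$, so the measure ranks the intuitive partition highest.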

Review

Recommendation for whether or not to assign the paper as required/optional reading in later classes.

Yes.

Modularity-based methods are common in community detection tasks. This paper might be a good introduction to the concept of modularity.
This paper also illustrates how the optimization problem can be rewritten in terms of the eigenvalues and eigenvectors of a matrix called the modularity matrix, which turns it into an eigenvalue problem. The derivation shows that a problem can become tractable when viewed from a different perspective, which might be a good lesson for us when we face other problems.

Related Papers

Study Plan

Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. (2004) Phys. Rev. E 69, 026113.
- This paper, published in 2004 by the same author, investigates the problem of community detection using several approaches. It also introduces the term "modularity" for evaluating community structure, so it might be good material for understanding the author's motivation.