Ulrik et al Nw Analysis of Collaboration Structure in Wikipedia

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

Citation

Brandes, U.; Kenis, P.; Lerner, J.; and van Raaij, D. 2009. Network analysis of collaboration structure in Wikipedia. In Proc. of WWW 2009.

Online version

Network Analysis of Collaboration Structure in Wikipedia

Summary

This paper describes a graph model that captures properties of individual editors of Wikipedia articles and the relationships among them. The model, called the edit network, assigns attributes to nodes and edges based on the edit history of an article.

Using this model, the paper asks whether the authors of a particular article or topic (a set of articles) split into opposing poles of opinion. The authors experiment with Wikipedia's List of Controversial Articles and Featured Articles, comparing the two sets to assess the effectiveness of the bipolarity measures proposed in the paper.

Evaluation

The paper tries to answer the following main questions:

How to represent author-author relationships in a graph using the edit histories.
How to compute measures that show whether the authors' opinions on a topic are bipolar.
How to partition authors into groups if there is bipolarity, and how to compute the balance of the edit network.
How to visualize an edit network.
How to process the Wikipedia data to measure the edits accurately.

Main Ideas

The main contributions of the paper are described below.

The Edit Network

The edit network captures the relationships between editors and the characteristics of each editor, based on edit histories.

Basic Structure

The edit network associated with a Wikipedia page p is a tuple G = (V,E,A), where:

The nodes V of the graph (V,E) correspond to the authors who have made at least one revision of p.
The directed edges E encode the edit interactions among authors. There is an edge from u to v if u performed at least one of the following three actions with respect to v:
1. u deletes text that has been written by v;
2. u undeletes text that has been deleted by v (and written by a potentially different author w);
3. u restores text that has been written by v (and deleted by a potentially different author w).
A is a set of weighted attributes on nodes and edges.

Edge Attributes

Depending on the kind of interaction between two authors, an action can be classified as either positive or negative.

Of the three actions listed under Basic Structure, actions 1 and 2 are negative interactions. For action 1, delete(u, v) is the total number of words written by v in earlier revisions that u deleted. For action 2, undelete(u, v) is the total number of words that u restored after v had deleted them in an earlier revision. Both measure how much the two authors disagree.

The last action (3) is a positive interaction and is used to compute the positive weight of the edge, that is, the agreement between the two authors.
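The three edge attributes can be sketched as simple per-pair word counters. This is a minimal illustration, not the paper's implementation: it assumes the edit history has already been reduced to (u, v, action, words) records, and both that record format and the name `build_edge_attributes` are hypothetical.

```python
from collections import defaultdict

# Actions 1 and 2 signal disagreement; action 3 signals agreement.
NEGATIVE_ACTIONS = {"delete", "undelete"}
POSITIVE_ACTIONS = {"restore"}


def build_edge_attributes(records):
    """Sum word counts per directed author pair and action.

    records: iterable of (u, v, action, words) tuples (hypothetical format),
    meaning author u performed `action` on `words` words with respect to v.
    """
    edges = defaultdict(lambda: {"delete": 0, "undelete": 0, "restore": 0})
    for u, v, action, words in records:
        if action not in NEGATIVE_ACTIONS | POSITIVE_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        edges[(u, v)][action] += words
    return dict(edges)


# Made-up records, for illustration only.
records = [
    ("alice", "bob", "delete", 40),     # alice deletes 40 words written by bob
    ("bob", "alice", "undelete", 40),   # bob brings back what alice deleted
    ("carol", "bob", "restore", 25),    # carol restores 25 words written by bob
    ("alice", "bob", "delete", 10),
]
edges = build_edge_attributes(records)
```

Keeping the three counters separate per directed pair leaves the choice of how to combine them into positive and negative edge weights to a later step.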

Node Attributes

Each node (author) is characterized by three activity measures:

add(u) - how much new content the author creates (measured by the number of new words added)
delete(u) - how much content the author deletes (measured by the number of words deleted from others' revisions)
restore(u) - how much content the author defends from removal (measured by action 3 defined in Basic Structure)

The sum of these three measures is called the activity(u) of the author.

Another important node attribute is authorship(u), the total number of words written by author u that have survived all revisions and are present in the latest revision.
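As a small illustration of the node attributes, the sketch below stores the four counts per author and derives activity(u) as the sum of the first three. The class name `AuthorNode` and the example numbers are made up for this review; the summary does not describe how the paper stores these values.

```python
from dataclasses import dataclass


@dataclass
class AuthorNode:
    """Per-author word counts; field names follow the measures above."""
    add: int = 0         # new words added
    delete: int = 0      # words deleted from others' revisions
    restore: int = 0     # words restored after someone else deleted them
    authorship: int = 0  # words by this author still present in the latest revision

    @property
    def activity(self) -> int:
        # activity(u) = add(u) + delete(u) + restore(u)
        return self.add + self.delete + self.restore


node = AuthorNode(add=120, delete=30, restore=15, authorship=80)
print(node.activity)
```

Note that authorship is kept apart from activity: an author can be very active and still have few words surviving in the latest revision.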

Measure of Bipolarity

Once the edit network is defined, with separate edge weights for the different actions, the paper defines the following measure for finding polarity of opinion:

revise(u, v) = delete(u, v) + undelete(u, v)

Weight of each edge: w(u,v) = Normalize( revise(u,v) )
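The two formulas above can be sketched as follows. This summary does not say how Normalize is defined, so the sketch divides by the largest revise value purely as a placeholder assumption; the function names and the sample counts are likewise hypothetical.

```python
def revise(delete, undelete, u, v):
    """revise(u, v) = delete(u, v) + undelete(u, v); missing pairs count as 0."""
    return delete.get((u, v), 0) + undelete.get((u, v), 0)


def edge_weights(delete, undelete):
    """Return w(u, v) for every pair that appears in either dictionary.

    ASSUMPTION: Normalize is taken to be division by the largest revise
    value, which scales all weights into [0, 1]. The paper may normalize
    differently.
    """
    pairs = set(delete) | set(undelete)
    raw = {pair: revise(delete, undelete, *pair) for pair in pairs}
    largest = max(raw.values(), default=0)
    if largest == 0:
        return {pair: 0.0 for pair in raw}
    return {pair: value / largest for pair, value in raw.items()}


# Made-up word counts keyed by (u, v).
delete = {("alice", "bob"): 50, ("bob", "carol"): 5}
undelete = {("bob", "alice"): 40, ("alice", "bob"): 30}
weights = edge_weights(delete, undelete)
```

With these counts, revise(alice, bob) = 50 + 30 = 80 is the largest value, so that edge gets weight 1.0 and the others are scaled relative to it.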

Max Cut for Bipolarity

The weights represent the disagreement between pairs of authors, so one way to find bipolarity is to compute the maximum cut of the graph: the split of the nodes into two groups that maximizes the total weight of the edges crossing between them. Max cut is not a very good measure of bipolarity, however, because it puts each author in exactly one group and does not take degrees of membership into account. It is also computationally expensive: max cut is NP-hard, so no efficient exact algorithm is known.
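To make the cost concrete, here is a brute-force max cut for a tiny graph. It is only an illustration of the problem, with made-up weights and a hypothetical function name; it tries all 2^(n-1) assignments, which is exactly why the approach does not scale to real edit networks.

```python
from itertools import product


def max_cut_brute_force(nodes, weights):
    """Return (best cut value, node -> side) by trying every 2-way split.

    nodes: non-empty list of node names.
    weights: dict mapping an undirected pair (u, v) to a non-negative weight.
    """
    best_value, best_side = -1.0, None
    # Pin the first node to side 0 so mirrored assignments are not repeated.
    for rest in product((0, 1), repeat=len(nodes) - 1):
        side = dict(zip(nodes, (0,) + rest))
        # Only edges whose endpoints are on different sides count.
        value = sum(w for (u, v), w in weights.items() if side[u] != side[v])
        if value > best_value:
            best_value, best_side = value, side
    return best_value, best_side


nodes = ["a", "b", "c", "d"]
weights = {("a", "c"): 3, ("a", "d"): 2, ("b", "c"): 2, ("b", "d"): 3, ("a", "b"): 1}
value, side = max_cut_brute_force(nodes, weights)
```

Here the best split is {a, b} against {c, d} with cut value 10; every node is assigned to exactly one side, which is the hard-membership limitation noted above.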

Continuous Projection into Controversy Space for Bipolarity

To overcome these problems with max cut, the authors build on their previous work. The essential idea of that work is to project the nodes into a k-dimensional space in which each dimension represents a class. Here the graph is projected into a 2-D space, and the max cut solution is computed using the results from that earlier paper.
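One standard way to get a continuous stand-in for max cut is a spectral relaxation: take the eigenvector belonging to the smallest eigenvalue of the symmetric weight matrix, read the sign of each entry as a side and its magnitude as a degree of membership. The sketch below shows that general idea with plain power iteration. It is an assumption made for illustration, not a reproduction of the authors' projection, and it yields only one coordinate rather than the 2-D layout described above.

```python
import math


def smallest_eigenvector(matrix, iterations=500):
    """Approximate the eigenvector of the smallest eigenvalue of a
    symmetric matrix via power iteration on (shift * I - matrix)."""
    n = len(matrix)
    # Gershgorin bound: all eigenvalues lie in [-shift, shift], so
    # (shift * I - matrix) has no negative eigenvalues and its dominant
    # eigenvector belongs to the smallest eigenvalue of `matrix`.
    shift = max(sum(abs(x) for x in row) for row in matrix)
    # Deterministic start; it must not be orthogonal to the target vector.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        nxt = [shift * vec[i] - sum(matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


# Symmetric disagreement weights for authors a, b, c, d (made-up numbers).
A = [
    [0, 1, 3, 2],
    [1, 0, 2, 3],
    [3, 2, 0, 0],
    [2, 3, 0, 0],
]
position = smallest_eigenvector(A)
```

For this matrix the entries for a and b share one sign and those for c and d share the other, so the continuous coordinates separate the same two groups a hard partition would, while also carrying a magnitude per author.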

Visualization

The paper describes a way to visualize edit networks. The placement of nodes is based on the projections used for computing bipolarity (see the section above), so that nodes from different groups are separated in space. The authorship of a node and its other attributes determine the size, shape, and color of the node. For example, nodes with higher authorship scores are drawn larger, and different amounts of creation, deletion, and restoration (the node attributes of the edit network) are encoded with different shades of color.

Related Dataset

The paper uses Wikipedia's edit history. This dataset has information about edit history (it is not certain that it is the same one used in the paper).

Related papers

Study plan

Article: MaxCut
Paper: Summarizing Dynamic Bipolar Conflict Structures