Ulrik et al Nw Analysis of Collaboration Structure in Wikipedia

This is a paper reviewed for Social Media Analysis 10-802 in Fall 2012.

NOTE - THIS IS A WORK IN PROGRESS!

Citation

Brandes, U.; Kenis, P.; Lerner, J.; and van Raaij, D. 2009. Network analysis of collaboration structure in Wikipedia. In Proc. of WWW 2009.

Online version

Network Analysis of Collaboration Structure in Wikipedia

Summary

This paper describes a graph model that captures properties of individual editors of Wikipedia articles and the relationships among them. The model, called the edit network, assigns attributes to nodes and edges based on an article's edit history.

Using this model, the paper asks whether the authors of a particular article or topic (a set of articles) split into opposing poles of opinion.

The authors experiment with articles from Wikipedia's List of Controversial Articles and from its Featured Articles, comparing the two sets to demonstrate the effectiveness of the bipolarity measures proposed in the paper.

Main Ideas

The main contributions of the paper are described below.

The Edit Network

The edit network captures the characteristics of each editor and the relationships between editors, based on the edit history of a page.

Basic Structure

The edit network associated with a Wikipedia page p is a tuple G = (V,E,A), where:

  1. The nodes V of the graph (V,E) correspond to the authors that have made at least one revision of p.
  2. The directed edges E encode the edit interactions among authors. There is an edge (u,v) from author u to author v if u performed one of the following three actions with respect to v:
    1. u deletes text that has been written by v;
    2. u undeletes text that has been deleted by v (and written by a potentially different author w);
    3. u restores text that has been written by v (and deleted by a potentially different author w).
  3. A is a set of weighted attributes on nodes and edges.
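As a rough illustration of the tuple G = (V,E,A), the following Python sketch stores the authors as a set and the per-edge word counts in a nested dictionary. The class name `EditNetwork`, the method `record`, and the author names in the usage note are hypothetical and do not come from the paper; extracting the interactions from actual revision histories is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The three interaction types listed above.
ACTIONS = ("delete", "undelete", "restore")


@dataclass
class EditNetwork:
    """Illustrative container for the edit network of one page."""

    # V: authors with at least one revision of the page.
    nodes: set = field(default_factory=set)
    # A (edge part): edge_attrs[(u, v)][action] = number of words involved.
    edge_attrs: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(int))
    )

    def record(self, u, v, action, words):
        """Record that author u performed `action` on `words` words of author v."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        self.nodes.update((u, v))
        self.edge_attrs[(u, v)][action] += words

    def edges(self):
        """E: the directed pairs (u, v) with at least one recorded interaction."""
        return set(self.edge_attrs)
```

For example, calling `record("alice", "bob", "delete", 12)` and then `record("alice", "bob", "delete", 3)` on an empty network leaves a single directed edge from alice to bob whose `delete` count is 15.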

Edge Attributes

Of the three actions described above, the first two (deletion and undeletion) are considered negative interactions. delete(u, v) measures the total number of words written by v in previous revisions that u deleted, whereas undelete(u, v) measures the total number of words that u brought back after v had deleted them in a previous revision. In both cases u reverses something v did, so these attributes contribute to the negative weight of the edge, which measures how much disagreement there is between the two authors. The third action, restoration, is considered a positive interaction: restore(u, v) is used to compute the positive weight of the edge, i.e. the agreement between the two authors.
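The split into negative and positive edge weights can be sketched as follows. This summary does not give the exact formula, so the sketch assumes the negative weight is the plain sum of the delete and undelete word counts and the positive weight is the restore word count; the function name `edge_weights` is hypothetical.

```python
def edge_weights(delete=0, undelete=0, restore=0):
    """Return (negative, positive) weights for a directed edge (u, v).

    Arguments are word counts for the three interaction types.
    Assumption: the negative weight is a plain sum; the paper may
    combine or normalize the counts differently.
    """
    negative = delete + undelete  # disagreement of u with v
    positive = restore            # agreement of u with v
    return negative, positive
```

With `delete=10`, `undelete=4` and `restore=3`, the function returns `(14, 3)`: the edge is dominated by disagreement.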

Node Attributes

Each node (author) u is characterized by three kinds of activity:

  • add(u): creation of new content, measured by the number of new words added;
  • delete(u): deletion of content, measured by the number of words deleted from others' revisions;
  • restore(u): defense of content against removal, measured by the number of words involved in the restore action defined under Basic Structure.

The sum of these three values is called the activity(u) of the author.

Another important node attribute is authorship(u), the total number of words written by author u that have survived all revisions and are present in the current version of the page.
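A minimal sketch of the node attributes, assuming the per-author word counts are already available. The input format for `authorship` (one author name per word of the current revision) is a hypothetical simplification; tracking which author wrote each surviving word is the hard part and is not shown here.

```python
from collections import Counter


def activity(add, delete, restore):
    """activity(u): sum of the added, deleted and restored word counts of u."""
    return add + delete + restore


def authorship(current_word_authors):
    """Count, per author, the words that survive in the current revision.

    current_word_authors: iterable with one author name per word of the
    current text (hypothetical input format).
    """
    return Counter(current_word_authors)
```

For instance, an author who added 100 words, deleted 20 and restored 5 has an activity of 125, and `authorship(["alice", "alice", "bob"])` credits alice with 2 surviving words and bob with 1.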

Measure of Bipolarity

Visualization

Related papers

There has been a lot of work on anomaly detection in graphs.

  • The paper by Moonesinghe and Tan (ICTAI 2006) finds clusters of outlier objects by performing a random walk on a weighted graph.
  • The paper by Aggarwal (SIGMOD 2001) proposes techniques for projecting high-dimensional data onto lower dimensions to detect outliers.

Study plan

  • Article:Bipartite graph:[1]
  • Article:Anomaly detection:[2]
  • Paper:Topic sensitive pagerank:[3]
    • Paper:The PageRank Citation Ranking: Bringing Order to the Web:[4]
  • Paper:Multilevel k-way Partitioning Scheme for Irregular Graphs:[5]