Gini coefficient

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

Introduction

This page is intended to give some practical background on the Gini coefficient. For more detailed information, historical provenance, and so forth from an economic perspective, check the Wikipedia article.

What is the Gini coefficient and how does it work?

The Gini coefficient is a tool from economics for measuring how unequally a good is distributed across a population. You can equally use it to measure how unequally a set of contributors contribute to a particular whole. Technically, Gini is the area between the line of perfect equality and the Lorenz curve (A), taken as a proportion of the whole triangle under the line of equality (A + B): G = A/(A+B). (See figure 1, copied from Wikipedia)
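The area definition can be written out a little further. This is a sketch that assumes both axes are scaled to run from 0 to 1, with A the area between the line of perfect equality and the Lorenz curve and B the area under the Lorenz curve, so that the triangle A + B has area 1/2:

```latex
G = \frac{A}{A + B}, \qquad A + B = \tfrac{1}{2}
\quad\Longrightarrow\quad G = 2A = 1 - 2B
```

Under that scaling, knowing the area under the Lorenz curve is enough to get G.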

Figure 1

An explanation of how it works

Consider representing all the work on a four-person project. Every tick on the X axis represents a single user. The Y axis represents the cumulative contribution of the users, ranging from 0% of the effort to 100% of the effort. (See figure 2)

If everyone contributes 25% of the work to the project, the graph of combined contribution will be a 45-degree line up the center of the graph. In this case, G = 0/(0+1) = 0. Everything is equal. (See figure 3)

If, however, A, B, and C all contribute nothing and D does 100% of the work, then we have a situation of perfect inequality: G = 1/(1+0) = 1. (See figure 4)

Most likely, we see a mixed situation, as in figure 5. A contributes 10% of the work, B contributes 20%, C contributes 30%, and D contributes 40%. The calculation of G for this case is more involved. (G = 1/3 with the small-sample correction that the R implementation below applies by default; the uncorrected area under the Lorenz curve gives G = 1/4.)
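The figure of 1/3 can be checked by hand. The sketch below makes some assumptions about how the figures were drawn: the Lorenz curve L consists of straight segments between the cumulative shares, B is the area under it, and the result is scaled by n/(n-1), the small-sample correction that the R implementation further down applies when unbiased = TRUE.

```latex
\begin{aligned}
L &: (0, 0),\ (0.25, 0.1),\ (0.5, 0.3),\ (0.75, 0.6),\ (1, 1) \\
B &= \tfrac{0.25}{2}\,\bigl[(0 + 0.1) + (0.1 + 0.3) + (0.3 + 0.6) + (0.6 + 1)\bigr]
   = 0.125 \times 3 = 0.375 \\
G &= 1 - 2B = 0.25 \\
G_{\text{corrected}} &= \tfrac{n}{n-1}\,G = \tfrac{4}{3} \times 0.25 = \tfrac{1}{3}
\end{aligned}
```

The same correction is what lets a four-person project reach G = 1 when one person does all the work; without it the area-based value for that case is 0.75.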

Two formulas for it

The Gini coefficient is calculated exactly with the integral

$$G = 1 - 2\int_0^1 L(X)\,dX$$

where L(X) is the formula for the Lorenz curve.

However, Gini can also be calculated as

$$G = \frac{\sum_{i=1}^{n} (2i - n - 1)\,x_i}{n \sum_{i=1}^{n} x_i}$$

where all the values x_1 through x_n have been placed in increasing order.
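To see that the two routes agree, here is a minimal R sketch. It assumes the integral form is G = 1 - 2 * (area under the Lorenz curve), with the Lorenz curve drawn as straight segments between cumulative shares, and that the sum form is sum((2i - n - 1) * x_i) / (n * sum(x)) over the sorted values. The function names gini_lorenz and gini_sum are made up for this illustration.

```r
# Gini from the area under a piecewise-linear Lorenz curve: G = 1 - 2 * B.
gini_lorenz <- function(x) {
  x <- sort(x)
  n <- length(x)
  L <- c(0, cumsum(x) / sum(x))             # Lorenz curve at X = 0, 1/n, ..., 1
  B <- sum((L[-1] + L[-(n + 1)]) / 2) / n   # trapezoid rule, strips of width 1/n
  1 - 2 * B
}

# Gini from the sum over values sorted in increasing order.
gini_sum <- function(x) {
  x <- sort(x)
  n <- length(x)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

shares <- c(10, 20, 30, 40)   # the four-person example from above
gini_lorenz(shares)
gini_sum(shares)
```

Both calls give 0.25 for this input. Neither function applies the n/(n-1) small-sample correction, so they correspond to unbiased = FALSE in the implementation further down.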

When should/shouldn't you use it?

The Gini coefficient measures disparity in contribution. This makes it ideal for situations where you want to compare collective effort across several different activities that are part of one composite whole. Contribution by a group of editors to a particular Wikipedia article is an excellent example, but one should bear in mind that the coefficient is telling you something about equality of contribution to the article, not anything about the broader nature of the participating editors in relation to Wikipedia. (In this case, the editors may contribute to many articles besides the one for which you're calculating the coefficient.)
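As an illustration of the per-article point, the sketch below computes one coefficient per article from a small table of edit counts. The data frame edits, its column names, the editor names, and the helper gini_simple are all hypothetical, and the helper uses the uncorrected sorted-sum formula rather than any particular published implementation.

```r
# Hypothetical edit counts: one row per (article, editor) pair.
edits <- data.frame(
  article = c("A", "A", "A", "A", "B", "B", "B", "B"),
  editor  = c("ann", "bob", "cat", "dan", "ann", "bob", "eve", "fay"),
  n_edits = c(25, 25, 25, 25, 97, 1, 1, 1)
)

# Uncorrected Gini over values sorted in increasing order.
gini_simple <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# One coefficient per article, computed only from that article's edits.
per_article <- tapply(edits$n_edits, edits$article, gini_simple)
per_article
```

Article A comes out at 0 and article B at 0.72, even though the editor "ann" appears in both. Each value describes how evenly work was shared on that one article and says nothing about what "ann" does elsewhere.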

An implementation in R

Shamelessly stolen from this email thread:

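# Gini coefficient of a numeric vector x.
# unbiased = TRUE divides by n * (n - 1) instead of n^2 (small-sample correction).
gini <- function(x, unbiased = TRUE, na.rm = FALSE){
    if (!is.numeric(x)){
        warning("'x' is not numeric; returning NA")
        return(NA)
    }
    # Compute the NA mask up front so it exists when na.rm = TRUE
    na.ind <- is.na(x)
    if (!na.rm && any(na.ind))
        stop("'x' contains NAs")
    if (na.rm)
        x <- x[!na.ind]
    n <- length(x)
    mu <- mean(x)
    N <- if (unbiased) n * (n - 1) else n * n
    ox <- sort(x)    # values in increasing order
    # Sum of (2i - n - 1) * x_i over the sorted values
    dsum <- drop(crossprod(2 * seq_len(n) - n - 1, ox))
    dsum / (mu * N)
}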

########################

gini(c(100, 0, 0, 0))  # 1
gini(c(1, 1))          # 0

Research examples

  • Kittur & Kraut have made good use of the Gini coefficient to look at Wikipedia editor collaboration; a good discussion of this occurs in their 2008 CHI paper.
  • Pissard & Prieur used the Gini coefficient as a general measure of heterogeneity in collections of photos on Flickr, in order to look at the relative importance of collections for a thematic whole or for social function, in their 2007 Algotel case study.
  • Von Krogh, Spaeth, and Lakhani used Gini coefficients to look at the concentration of email sending among a small number of users affiliated with the FreeNet project in their 2003 Research Policy case study.
