Co-clustering documents and words using bipartite spectral graph partitioning

From Cohen Courses
Revision as of 01:50, 28 March 2011 by Nqi

Citation

Inderjit S. Dhillon. 2001. Co-clustering documents and words using bipartite spectral graph partitioning. KDD.

Online Version

http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf

Summary

This paper investigates the structure of scientific collaboration networks. The author used data from a number of databases covering different fields: biomedical research, physics, and computer science. The resulting networks have the following properties:

  • In all cases, scientific communities seem to constitute a “small world”[1], in which the average distance between scientists via a chain of intermediate collaborators scales logarithmically with the size of the relevant community.
  • These networks are highly clustered: two scientists are much more likely to have collaborated if they share a third common collaborator than two scientists chosen at random from the community.
  • The distributions of both the number of collaborators per scientist and the number of papers per scientist are well fit by power-law forms with an exponential cutoff. This cutoff may be caused by the finite time window (1995-1999) used in the study.
  • There are a number of significant statistical differences between the scientific communities studied, some of which are unsurprising.
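
The first two properties above can be illustrated on a toy coauthorship network. The following is a minimal sketch, not the paper's code: the `papers` list is made-up data, the function names are hypothetical, and the clustering measure is assumed to be the fraction of connected triples that are closed (a transitivity-style definition), which may differ in detail from the definition used in the paper.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: each paper is the list of its coauthors.
papers = [
    ["A", "B", "C"],
    ["C", "D"],
    ["D", "E"],
    ["B", "C"],
]

def build_graph(papers):
    """Link two authors if they have written at least one paper together."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(authors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def mean_distance(graph):
    """Average shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0

def clustering(graph):
    """Fraction of connected triples (a - node - b) in which a and b are also linked."""
    closed = triples = 0
    for node, neighbors in graph.items():
        for a, b in combinations(neighbors, 2):
            triples += 1
            if b in graph[a]:
                closed += 1
    return closed / triples if triples else 0.0

graph = build_graph(papers)
print("mean distance:", mean_distance(graph))
print("clustering:", clustering(graph))
```

On real collaboration data, the "small world" claim corresponds to the mean distance growing roughly like the logarithm of the number of authors, and the clustering claim corresponds to a clustering value far above what a random graph of the same size and density would give.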