Newman, PNAS, 2001.

From Cohen Courses

Citation

M. E. J. Newman. 2001. The Structure of Scientific Collaboration Networks. Proceedings of the National Academy of Sciences 98(2): 404-409.

Online Version

http://www.pnas.org/content/98/2/404.full.pdf+html

Databases

MEDLINE (biomedical research)[1]

Los Alamos e-Print Archive (physics)[2]

SPIRES (high-energy physics)

NCSTRL (computer science)[3]

Summary

This paper investigates the structure of scientific collaboration networks. The author utilized data from databases in several fields: biomedical research, physics, and computer science. The main findings about these networks are:

  • In all cases, the scientific communities appear to form a "small world"[4], in which the average distance between scientists, measured along a chain of intermediate collaborators, grows logarithmically with the size of the community.
  • These networks are highly clustered: two scientists are much more likely to have collaborated if they share a third collaborator than are two scientists chosen at random from the community.
  • The distributions of both the number of collaborators per scientist and the number of papers per scientist are well fit by power-law forms with an exponential cutoff. The cutoff may be caused by the finite time window (1995-1999) used in the study.
  • There are a number of significant statistical differences between scientific communities. Some are obvious; for example, papers in the SPIRES high-energy physics database have far more authors per paper than papers in the other databases.
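The scaling behaviour in the first and third findings can be written compactly. This is a sketch in notation introduced here rather than quoted from the summary: z is the number of collaborators of a scientist, τ an exponent and z_c a cutoff scale fitted to the data, N the number of scientists, and \bar z the mean number of collaborators. The second expression is the standard random-graph estimate of the mean distance, which grows logarithmically with N and serves as a baseline for the "small world" comparison.

```latex
P(z) \propto z^{-\tau} \, e^{-z / z_c}
\qquad\qquad
\ell_{\text{random}} \approx \frac{\ln N}{\ln \bar{z}}
```

For z well below z_c the distribution behaves like a pure power law; the exponential factor suppresses degrees much larger than z_c.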

Background

Social networks have been the subject of both empirical and theoretical study in the social sciences for at least 50 years. Although many of these studies directly probe the structure of the relevant social networks, they suffer from two substantial shortcomings that limit their usefulness. First, the studies are labor-intensive, so the size of the network that can be mapped is limited, typically to a few tens or hundreds of people. Second, they are highly sensitive to subjective bias on the part of interviewees. In this paper, the author presents a study of a genuine network of human acquaintances that is large, containing over a million people, and for which a precise definition of acquaintance is possible: the network of scientific collaboration, as documented in the papers scientists write, where two scientists are considered connected if they have coauthored a paper.

Brief Description of Experimental Method

  • Number of Authors: Because author names are ambiguous, the author estimates the true number of distinct authors by carrying out the analysis twice. The first analysis uses all initials of each author, which reduces the risk of merging two different people who share a surname and first initial. The second uses only the first initial, which merges the records of authors who identify themselves in different ways on different papers (for example, with and without a middle initial). The first method therefore tends to overcount authors and the second to undercount them, so the two analyses give upper and lower bounds on the number of authors and also indicate the expected precision of many of the other measurements.
  • Mean Papers per Author and Authors per Paper: The average number of authors per paper in the SPIRES high-energy physics database is much higher than in the other databases, because SPIRES covers experimental as well as theoretical work, and experimental high-energy physics papers often have very large author lists.
  • Number of Collaborations:
  • Average Degrees of Separation:
  • Clustering: Clustering is measured by the clustering coefficient C, the fraction of "transitive triples" in the network[5]: the probability that two collaborators of the same scientist have also collaborated with each other.
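The two-pass author count and the per-paper averages can be illustrated with a minimal sketch. It assumes author names arrive as "Surname, Initials" strings; the sample records, the helper names (author_key, count_authors, mean_stats), and the normalization rule are hypothetical and are not taken from the paper or from any of the databases.

```python
from collections import Counter

# Hypothetical sample records: each paper is a list of "Surname, Initials" strings.
PAPERS = [
    ["Newman, M. E. J.", "Watts, D. J."],
    ["Newman, M.", "Strogatz, S. H."],
    ["Watts, D. J.", "Strogatz, S."],
    ["Smith, J. A."],
    ["Smith, J. B.", "Newman, M. E. J."],
]


def author_key(name: str, all_initials: bool) -> str:
    """Normalize a name to a matching key using all initials or only the first."""
    surname, _, initials = name.partition(",")
    letters = [c for c in initials if c.isalpha()]
    if not all_initials:
        letters = letters[:1]  # keep only the first initial
    return surname.strip().lower() + " " + "".join(letters).lower()


def count_authors(papers, all_initials: bool) -> int:
    """Number of distinct author keys across all papers."""
    return len({author_key(a, all_initials) for paper in papers for a in paper})


def mean_stats(papers, all_initials: bool):
    """Return (mean papers per author, mean authors per paper)."""
    papers_per_author = Counter()
    for paper in papers:
        # A set so that one author is counted once per paper.
        for key in {author_key(a, all_initials) for a in paper}:
            papers_per_author[key] += 1
    total_authorships = sum(papers_per_author.values())
    return (total_authorships / len(papers_per_author),
            total_authorships / len(papers))


if __name__ == "__main__":
    upper = count_authors(PAPERS, all_initials=True)   # tends to overcount
    lower = count_authors(PAPERS, all_initials=False)  # tends to undercount
    print(f"authors between {lower} and {upper}")
```

On this toy data, all initials treat "Newman, M." and "Newman, M. E. J." as two people, while first initials merge "Smith, J. A." and "Smith, J. B." into one, so the script reports between 4 and 7 authors.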
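The collaboration graph, average separation, and clustering coefficient can be sketched in plain Python. This assumes each paper links every pair of its coauthors, averages shortest-path lengths by breadth-first search over connected pairs only, and computes C as 3 × (triangles) / (connected triples). The function names and sample data are hypothetical; the paper's own implementation is not described in this summary.

```python
from collections import deque
from itertools import combinations


def build_graph(papers):
    """Undirected collaboration graph: authors are nodes, coauthors are linked."""
    adj = {}
    for paper in papers:
        authors = set(paper)
        for a in authors:
            adj.setdefault(a, set())
        for a, b in combinations(sorted(authors), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def mean_distance(adj):
    """Average shortest-path length over all connected ordered pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """C = 3 * triangles / connected triples; each triangle closes 3 centered triples."""
    closed = 0   # centered triples whose two ends are also linked
    triples = 0  # all connected triples, counted once per center node
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            triples += 1
            if w in adj[v]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    adj = build_graph([["A", "B", "C"], ["C", "D"], ["D", "E"]])
    print(mean_distance(adj), clustering_coefficient(adj))
```

For the three toy papers in the usage lines, the mean distance is 1.7 and C is 0.5; the number of collaborators of a scientist is simply len(adj[author]). On a network with over a million nodes, an all-pairs BFS is expensive, so sampling source nodes is a common practical shortcut.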

Related Works

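The model used to analyze the number of collaborators in this paper is strongly influenced by Barabási and Albert's "Emergence of scaling in random networks", which proposes a mechanism (preferential attachment) that produces power-law degree distributions and may apply to a wide range of networks.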

An interesting further study of one of the databases (SPIRES) is "Physicists thrive with paperless publishing".