Choudhury et al ICWWW 2010
Citation
M. De Choudhury, W. Mason, J. Hofman, and D. Watts. Inferring relevant social networks from interpersonal communication. In Proceedings of the 19th International Conference on World Wide Web, pp. 301-310, 2010.
Online Version http://research.yahoo.com/files/fp1010-dechoudhury.pdf
Summary
De Choudhury et al. examine methods of extracting networks from email exchanges and of using those networks to predict user characteristics. The authors build networks from email exchange rates, applying different thresholds to decide which exchanges count as edges. They show that small differences in the threshold can produce substantially different networks, and that a threshold giving the best predictive performance can be found.
Background:
The Internet gives researchers a great deal of information about how individuals use platforms together (emails, message boards, games, etc.), but this shared activity may or may not reflect underlying social ties. This raises two problems for researchers. The first is how to use shared activity data to infer an actual social network. The second is that, even if a social network can be inferred, it is unclear which network it is: individuals belong to multiple networks of different types, and a network that is relevant to one research question may not be relevant to another.
Data used:
A registry of exchanged emails within the system of an unidentified large university over a two-year period. The registry contained information about what emails were sent and which individuals sent the emails, but not the identities of the users or the content of the emails.
The Enron email dataset.
Methodology
The authors define the weight of an edge between two nodes i and j as sqrt(w_ij * w_ji), the geometric mean of w_ij and w_ji, where w_ij is the number of emails sent from user i to user j and w_ji the number sent from j to i, each annualized over the two- and four-year periods covered by the data. They then define the network as the set of edges whose weight surpasses a threshold tau. Multiple values of tau were tested, giving the authors multiple networks to examine.
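The weighting and thresholding step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the dictionary layout of the email counts, and the use of >= for "surpassing" the threshold are all choices made here for the example.

```python
from math import sqrt


def edge_weights(counts, years):
    """Return undirected edge weights sqrt(w_ij * w_ji).

    counts maps (sender, recipient) to the total number of emails sent
    over the observation window (hypothetical input layout);
    years is the length of that window, used to annualize the counts.
    """
    weights = {}
    for (i, j), n_ij in counts.items():
        key = tuple(sorted((i, j)))
        if i == j or key in weights:
            continue  # skip self-emails and pairs already handled
        n_ji = counts.get((j, i), 0)
        # Geometric mean of the two annualized directed rates.
        # A pair with no reply in one direction gets weight 0.
        weights[key] = sqrt((n_ij / years) * (n_ji / years))
    return weights


def threshold_network(weights, tau):
    """Keep the edges whose weight is at least tau (assumed convention)."""
    return {edge for edge, w in weights.items() if w >= tau}


# Toy counts over a two-year window (made-up numbers).
counts = {("a", "b"): 40, ("b", "a"): 10,
          ("a", "c"): 6,
          ("c", "b"): 8, ("b", "c"): 2}
weights = edge_weights(counts, years=2)
strong_ties = threshold_network(weights, tau=5)
```

Because the weight is a geometric mean, a one-directional exchange such as a to c in the toy data receives weight 0 and is dropped by any positive threshold, so only reciprocated communication can form an edge.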
The resulting networks were then used to predict individuals' characteristics: status (faculty, staff, or student for the university dataset; position for the Enron dataset) and gender. These predictions are compared with those made from an unweighted network (where the weights of all edges are set to 1).
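To make the comparison across thresholds concrete, the sketch below sweeps several values of tau and scores each resulting network on a toy prediction task. The summary does not say which prediction method the authors used, so the neighbor majority vote here is only an illustrative stand-in, and all names, weights, and labels are hypothetical.

```python
from collections import Counter


def predict_by_neighbors(edges, labels, node):
    """Guess a node's label as the most common label among its neighbors.

    Majority vote is an assumption made for this example, not the
    paper's stated method. Returns None for isolated nodes.
    """
    neighbor_labels = [labels[v] for u, v in edges if u == node]
    neighbor_labels += [labels[u] for u, v in edges if v == node]
    if not neighbor_labels:
        return None
    return Counter(neighbor_labels).most_common(1)[0][0]


def accuracy(edges, labels):
    """Fraction of nodes whose label is guessed correctly.

    Isolated nodes get no prediction and count as errors.
    """
    correct = sum(predict_by_neighbors(edges, labels, n) == labels[n]
                  for n in labels)
    return correct / len(labels)


def sweep_thresholds(weights, labels, taus):
    """Map each tau to the accuracy of the network thresholded at tau."""
    results = {}
    for tau in taus:
        edges = [edge for edge, w in weights.items() if w >= tau]
        results[tau] = accuracy(edges, labels)
    return results


# Toy data: strong ties within groups, weak ties across groups.
labels = {"a": "faculty", "b": "faculty",
          "c": "student", "d": "student", "e": "student"}
weights = {("a", "b"): 10, ("c", "d"): 8, ("d", "e"): 8, ("c", "e"): 8,
           ("a", "c"): 1, ("a", "d"): 1}
results = sweep_thresholds(weights, labels, taus=[0.5, 5, 20])
```

In this toy data the lowest threshold keeps every edge, which plays the role of the baseline in which all edges count equally. A moderate threshold removes the weak cross-group ties and improves accuracy, while a very high threshold removes all edges and leaves nothing to predict from.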
Discussion
Networks formed using optimal values of tau predict around 30% better than the naïve unweighted network.
Optimal values of tau are similar across both datasets; the authors give no theoretical explanation for this.
Related Papers
N. Eagle, A. Pentland, and D. Lazer. Inferring social network structure using mobile phone data. PNAS, 106(36):15274–15278, 2009.
G. Kossinets and D. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, January 2006.
J.-P. Onnela, J. Saramäki, J. Hyvönen, M. Argollo de Menezes, K. Kaski, A.-L. Barabási, and J. Kertész. Analysis of a large-scale weighted network of one-to-one human communication. New Journal of Physics, 9(6):179–204, February 2007.