Choudhury et al ICWWW 2010

From Cohen Courses
Revision as of 01:41, 27 September 2012

Citation

M. De Choudhury, W. Mason, J. Hofman, & D. Watts. Inferring relevant social networks from interpersonal communication. In Proceedings of the 19th International Conference on World Wide Web, pp. 301-31, 2010.

Online Version

link to the paper

Summary

Choudhury et al. examine methods for extracting social networks from email exchanges and for using those networks to predict user characteristics. The authors build networks from email exchange rates, applying different exchange-rate thresholds to decide which pairs of users are joined by an edge. They show that small differences in the threshold can produce substantially different networks, and that a threshold yielding the best-predicting network can be identified.

Background

Although the Internet gives researchers substantial data on how individuals interact with one another through shared platforms (email, message boards, online games, etc.), such shared activity may or may not reflect underlying social ties. This raises an obvious problem: how can shared-activity data be used to infer an actual social network? A second problem is that, even if a social network can be inferred, which network is it? Individuals belong to multiple networks of different types, and a network that is relevant to one research question may be irrelevant to another.


Data Used

A log of emails exchanged within the email system of an unidentified large university over a two-year period. The log recorded which emails were sent and which individuals sent them, but not the users' identities or the content of the emails.

The Enron email dataset.


Methodology

The authors define the weight of the edge between nodes i and j as <math>\sqrt{w_{ij} w_{ji}}</math>, the geometric mean of the two directed volumes, where <math>w_{ij}</math> and <math>w_{ji}</math> are the numbers of emails sent from user i to user j and from user j to user i, respectively, annualized over two- and four-year periods. They then define the network as the set of edges whose weight exceeds a threshold tau. Testing multiple threshold values gave the authors multiple networks to examine.
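A minimal Python sketch of the edge-weight and threshold definitions above. It assumes a hypothetical input format of one (sender, recipient) pair per email and an observation window measured in years; the function names `edge_weights` and `threshold_network` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def edge_weights(messages, years):
    """Return {frozenset({i, j}): sqrt(w_ij * w_ji)} for every pair with mail
    in at least one direction.

    messages: iterable of (sender, recipient) pairs, one per email
              (hypothetical input format).
    years:    length of the observation window, used to annualize counts.
    """
    # Directed email counts; self-addressed mail is ignored.
    counts = Counter((s, r) for s, r in messages if s != r)
    weights = {}
    for (i, j) in counts:
        pair = frozenset((i, j))
        if pair in weights:
            continue  # already handled from the other direction
        w_ij = counts[(i, j)] / years  # annualized i -> j volume
        w_ji = counts[(j, i)] / years  # annualized j -> i volume (Counter gives 0 if absent)
        weights[pair] = math.sqrt(w_ij * w_ji)  # geometric mean of the two directions
    return weights


def threshold_network(weights, tau):
    """Keep only the edges whose weight exceeds tau."""
    return {pair for pair, w in weights.items() if w > tau}


if __name__ == "__main__":
    msgs = [("a", "b")] * 4 + [("b", "a")] * 9 + [("a", "c")] * 5
    w = edge_weights(msgs, years=1)
    print(threshold_network(w, tau=1.0))  # only the reciprocated a-b tie survives
```

Because the weight is a geometric mean, a pair with mail flowing in only one direction (a to c above) gets weight 0 and can never pass a positive threshold, so reciprocity is built into the tie definition.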

The resulting networks were then used to predict individuals' characteristics: status (faculty, staff, or student for the university dataset; position for the Enron dataset) and gender. These predictions are compared with those made using an unweighted network, in which the weights of all edges are set to 1.
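The summary does not describe the classifier the authors used, so the following is only a hypothetical stand-in for comparing networks: a leave-one-out neighbor majority vote in Python. Edges are assumed to be two-element frozensets, labels a dict from user to attribute (for example status); `majority_vote_accuracy` is an illustrative name, not the paper's method.

```python
from collections import Counter, defaultdict


def neighbors(edges):
    """Build an adjacency map from undirected edges given as 2-element frozensets."""
    adj = defaultdict(set)
    for pair in edges:
        i, j = tuple(pair)  # assumes no self-loops (a 1-element set would fail here)
        adj[i].add(j)
        adj[j].add(i)
    return adj


def majority_vote_accuracy(edges, labels):
    """Predict each labeled node as the most common label among its labeled
    neighbors (the node's own label is never used), and return the fraction
    predicted correctly. Nodes with no labeled neighbors count as misses;
    ties are broken arbitrarily by Counter.most_common."""
    adj = neighbors(edges)
    correct = 0
    for node, true_label in labels.items():
        votes = Counter(labels[n] for n in adj.get(node, ()) if n in labels)
        if votes and votes.most_common(1)[0][0] == true_label:
            correct += 1
    return correct / len(labels)
```

Calling it once on the thresholded edge set and once on the unweighted edge set (every communicating pair) gives the two accuracies to compare. In a toy case where two weak ties connect a faculty member to students, those extra edges can outvote the faculty member's one strong faculty tie and flip the prediction.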


Discussion

Networks formed using the optimal value of tau yield predictions roughly 30% better than those of the naïve (unweighted) network.

The optimal values of tau are similar across both datasets; the authors offer no theoretical explanation for this.
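Finding an optimal tau amounts to a sweep over candidate thresholds. The Python sketch below assumes a hypothetical `evaluate` callback that scores an edge set (for example, a prediction-accuracy function) and a weights dict keyed by node pairs. It measures improvement relative to the unweighted baseline as (score - baseline) / baseline; this summary does not say exactly how the paper's "30% better" figure was computed.

```python
def sweep_tau(weights, taus, evaluate):
    """Score the thresholded network for each candidate tau.

    weights:  {pair: weight} for every pair that exchanged any email.
    taus:     candidate thresholds to try.
    evaluate: hypothetical callback mapping an edge set to a score.
    Returns (best_tau, best_score, results), where results lists
    (tau, number_of_edges, score) for every candidate.
    """
    results = []
    for tau in taus:
        edges = {pair for pair, w in weights.items() if w > tau}
        results.append((tau, len(edges), evaluate(edges)))
    # max() keeps the first candidate among equal scores, i.e. the smallest such tau
    # when taus are given in increasing order.
    best_tau, _, best_score = max(results, key=lambda r: r[2])
    return best_tau, best_score, results


def relative_improvement(score, baseline):
    """Relative gain of score over baseline, e.g. 0.3 for '30% better'."""
    return (score - baseline) / baseline
```

Here the unweighted baseline is `evaluate(set(weights))`: every pair that exchanged any email becomes an edge, which is equivalent to setting all edge weights to 1. Recording the edge count for each tau also shows how quickly the network changes as the threshold moves.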


Related Papers

N. Eagle, A. Pentland, and D. Lazer. Inferring social network structure using mobile phone data. PNAS, 106(36):15274–15278, 2009.

G. Kossinets and D. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, January 2006.

J. Onnela, J. Saramäki, J. Hyvönen, M. Argollo de Menezes, K. Kaski, A. Barabási, and J. Kertész. Analysis of a large-scale weighted network of one-to-one human communication. New Journal of Physics, 9(6):179–204, February 2007.