Modeling Spread of Disease from Social Interaction
Citation
Adam Sadilek, Henry Kautz, Vincent Silenzio. "Modeling Spread of Disease from Social Interaction." Sixth AAAI International Conference on Weblogs and Social Media (ICWSM).
Online Version
Summary
A nice paper that tries to model the spread of communicable diseases by analyzing social media.
An analogy (from the paper!): Given that five of your friends have flu-like symptoms, and that you have recently met eight people, possibly strangers, who complained about having runny noses and headaches, what is the probability that you will soon become ill as well?
Traditionally, public health is monitored via surveys and statistics obtained from health care centres. This process is expensive and slow, and it is also biased, since many of us don't even bother to take medication for the flu. This work is more fine-grained: it considers the interactions between individuals as revealed by their tweets. One of the main challenges was to correctly identify the very small number of tweets that are related to sickness. The authors develop an SVM classifier for this task that performs really well (0.98 precision and 0.97 recall).
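For readers less familiar with these two metrics, here is a minimal sketch of how precision and recall are computed from classification counts. The function name and the counts are hypothetical, chosen only so that the arithmetic lands near the reported figures; they are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper), for illustration only:
# 97 sick tweets found, 2 normal tweets wrongly flagged, 3 sick tweets missed.
precision, recall = precision_recall(tp=97, fp=2, fn=3)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the tweets flagged as sick, how many really were?", while recall answers "of the truly sick tweets, how many were found?".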
Data
The work is based on tweets collected over one month from the NYC metropolitan area. The specifics of the data are shown in the following figure:
Methodologies and models
The major challenge of this work was to detect which of a person's tweets were related to illness, since for every health-related tweet there were more than 1000 unrelated ones. Given this class imbalance, the work formulates a semi-supervised, cascade-based approach to learning a robust Support Vector Machine (SVM).
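To see why such an imbalance is a problem, consider a trivial baseline that labels every tweet as normal. The sketch below uses an illustrative ratio of exactly 1 sick tweet per 1000 normal ones (the text only says "more than 1000"), and the variable names are assumptions made for this example.

```python
# Illustrative counts only: 1 sick tweet for every 1000 normal ones.
n_sick, n_normal = 1, 1000

# Baseline that predicts "normal" for every tweet.
true_positives = 0           # it never flags a sick tweet
true_negatives = n_normal    # every normal tweet is labelled correctly
false_negatives = n_sick     # every sick tweet is missed

accuracy = (true_positives + true_negatives) / (n_sick + n_normal)
recall = true_positives / (true_positives + false_negatives)
print(f"accuracy={accuracy:.4f} recall={recall:.1f}")
```

The baseline is almost always "right" yet finds no sick tweets at all, which is why accuracy alone is a poor guide here and why the training procedure has to account for the imbalance.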
To extract the illness-related tweets, they first try to obtain high-quality training data for their final classifier. The following figure shows their methodology:
To explain it briefly, they first train two different binary SVM classifiers. One classifier is penalized severely for false positives (normal tweets that are labelled as sick), and the other is penalized severely for false negatives (sick tweets that are labelled as normal).
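The effect of such asymmetric penalties can be illustrated without an SVM library. The sketch below is not the paper's classifier: it swaps the SVM for a simple one-dimensional threshold rule, and the scores, labels, cost values and function names are all made up for this example. It only shows how changing the relative cost of the two error types moves the decision boundary.

```python
def count_errors(scores, labels, threshold):
    """Count (false positives, false negatives) when a tweet is
    predicted sick whenever its score is >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def fit_threshold(scores, labels, cost_fp, cost_fn):
    """Pick the threshold with the lowest weighted error cost."""
    candidates = sorted(set(scores)) + [float("inf")]

    def cost(threshold):
        fp, fn = count_errors(scores, labels, threshold)
        return cost_fp * fp + cost_fn * fn

    return min(candidates, key=cost)


# Made-up "sickness scores"; label 1 = sick tweet, 0 = normal tweet.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]

# Classifier A: false positives cost ten times more than false negatives.
t_a = fit_threshold(scores, labels, cost_fp=10, cost_fn=1)
# Classifier B: false negatives cost ten times more than false positives.
t_b = fit_threshold(scores, labels, cost_fp=1, cost_fn=10)

print(t_a, count_errors(scores, labels, t_a))
print(t_b, count_errors(scores, labels, t_b))
```

On this toy data, classifier A settles on a higher threshold and flags no normal tweet at the price of missing one sick tweet, while classifier B settles on a lower threshold and misses no sick tweet at the price of one false alarm. In an actual SVM, a comparable effect is typically obtained by weighting the misclassification penalty per class.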