Modeling Spread of Disease from Social Interaction

Revision as of 23:54, 5 November 2012

Citation

Adam Sadilek, Henry Kautz, Vincent Silenzio. "Modeling Spread of Disease from Social Interaction." Sixth AAAI International Conference on Weblogs and Social Media (ICWSM).

Online Version

Online PDF

Summary

A nice paper that attempts to model the spread of communicable diseases through analysis of social media.

A motivating question (from the paper!): Given that five of your friends have flu-like symptoms, and that you have recently met eight people, possibly strangers, who complained about having runny noses and headaches, what is the probability that you will soon become ill as well?

Traditionally, public health is monitored via surveys and statistics obtained from health care centres. This process is expensive and slow, and it is also biased, since many of us don't even bother to take medication for the flu. This work is more fine-grained: it considers the interactions between individuals as captured by their tweets. One of the main challenges was to correctly identify the very small number of tweets that are related to sickness. The authors develop an SVM classifier for this task that performs very well (0.98 precision and 0.97 recall).
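For reference, precision and recall have the standard definitions sketched below. The F1 score (the harmonic mean of the two) is our own arithmetic on the reported figures, not a number quoted from the paper, and the function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of tweets flagged "sick" that really are about illness."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of illness-related tweets that the classifier flags."""
    return tp / (tp + fn)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# F1 implied by the reported precision (0.98) and recall (0.97);
# derived here, not stated in the paper.
print(round(f1(0.98, 0.97), 3))  # 0.975
```

Because precision and recall are nearly equal, their harmonic mean sits between them.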

Data

The work is based on tweets collected over one month from the New York City metropolitan area. The specifics of the data are shown in the following figure:

Methodologies and models

Detecting illness-related tweets

The major challenge of this work was detecting a person's illness-related tweets, since for every health-related tweet there were more than 1,000 unrelated ones. Given this class imbalance, the authors formulate a semi-supervised, cascade-based approach to learning a robust support vector machine (SVM) classifier.
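The effect of such an imbalance can be made concrete with a small calculation. This sketch assumes a hypothetical test set of 10 illness-related and 10,000 unrelated tweets (exactly the 1:1000 ratio); the counts and the helper `accuracy_and_recall` are illustrative, not taken from the paper. A classifier that labels every tweet "other" looks almost perfect by accuracy, yet it finds none of the sick tweets.

```python
def accuracy_and_recall(tp, fp, fn, tn):
    """Accuracy, and recall for the positive ("sick") class, from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


# Hypothetical test set: 10 sick tweets, 10,000 unrelated ones (1:1000).
n_sick, n_other = 10, 10_000

# Trivial baseline that labels every tweet "other":
# every sick tweet is a false negative, every other tweet a true negative.
acc, rec = accuracy_and_recall(tp=0, fp=0, fn=n_sick, tn=n_other)
print(f"accuracy={acc:.4f}, recall={rec:.2f}")  # accuracy=0.9990, recall=0.00
```

Accuracy alone therefore says little at this ratio; a metric centred on the rare class, such as recall, exposes the failure.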

To extract these tweets, they first obtain high-quality training data for their final classifier. The following figure shows their methodology.

To explain it briefly, they train two different binary SVM classifiers: $C_{S}$, which is penalized severely for false positives (normal tweets labelled as sick), and $C_{O}$, which is penalized severely for false negatives. The classifiers are first trained on a corpus of 5,128 hand-labeled tweets. After this, they are trained on 1.6 million tweets (health related, though noisy) obtained from the work by Paul and Dredze (http://www.cs.jhu.edu/~mdredze/publications/2011.tech.twitter_health.pdf). $C_{O}$ was further trained with a training set of 200 million tweets. Thresholding was applied to reduce the noise in the cascade. The result was a final corpus of over 700 thousand “sick” messages and 3 million “other” tweets, which served as the training set for the final classifier. The features for the classifier are unigrams, bigrams and trigrams.
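The cascade described above can be sketched in a few dozen lines. This is a minimal sketch under our own assumptions, not the authors' implementation: the class `CostSensitiveSVM`, the function `cascade`, the cost values, the zero thresholds and the toy tweets are all hypothetical. It shows unigram/bigram/trigram feature extraction, a linear SVM whose hinge loss weights errors on each class differently (a precision-oriented $C_{S}$ and a recall-oriented $C_{O}$), and a thresholding step that keeps only the tweets one of the two classifiers is confident about.

```python
from collections import defaultdict

# Hypothetical sketch of the C_S / C_O cascade; not the authors' code.


def ngrams(text, n_max=3):
    """Unigram, bigram and trigram features of a tweet (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


class CostSensitiveSVM:
    """Linear SVM trained by stochastic subgradient descent on
    lam/2 * ||w||^2 + mean_i cost[y_i] * max(0, 1 - y_i * (w.x_i + b)).

    Labels: +1 = "sick", -1 = "other". cost_fp weights margin errors on -1
    examples (false positives); cost_fn weights errors on +1 examples.
    """

    def __init__(self, cost_fp=1.0, cost_fn=1.0, lam=0.01, lr=0.1, epochs=50):
        self.cost = {1: cost_fn, -1: cost_fp}
        self.lam, self.lr, self.epochs = lam, lr, epochs
        self.w = defaultdict(float)  # sparse weights keyed by n-gram string
        self.b = 0.0

    def score(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats) + self.b

    def fit(self, data):
        for _ in range(self.epochs):
            for feats, y in data:
                violated = y * self.score(feats) < 1  # margin check at current w
                for f in self.w:  # L2 shrinkage step (bias is not regularized)
                    self.w[f] *= 1 - self.lr * self.lam
                if violated:  # hinge subgradient, scaled by this class's cost
                    step = self.lr * self.cost[y] * y
                    for f in feats:
                        self.w[f] += step
                    self.b += step
        return self


def cascade(tweets, c_s, c_o, t_sick=0.0, t_other=0.0):
    """Hypothetical thresholding rule: C_S vouches for "sick", C_O vouches
    for "other"; tweets neither classifier is confident about are dropped."""
    sick, other = [], []
    for text in tweets:
        feats = ngrams(text)
        if c_s.score(feats) > t_sick:
            sick.append(text)
        elif c_o.score(feats) < -t_other:
            other.append(text)
    return sick, other


# Toy stand-in for the hand-labeled corpus (illustrative tweets only).
labeled = [
    ("i have the flu and a fever", 1),
    ("stuck in bed sick with a headache", 1),
    ("great game at the stadium tonight", -1),
    ("coffee with friends in brooklyn", -1),
]
data = [(ngrams(t), y) for t, y in labeled]
c_s = CostSensitiveSVM(cost_fp=5.0).fit(data)  # false positives cost 5x
c_o = CostSensitiveSVM(cost_fn=5.0).fit(data)  # false negatives cost 5x
sick, other = cascade(["home sick with the flu", "traffic in brooklyn today"], c_s, c_o)
print(sick, other)
```

Raising `cost_fp` makes $C_{S}$ reluctant to call a tweet sick, so what it does flag is more likely correct; raising `cost_fn` makes $C_{O}$ reluctant to miss a sick tweet, so what it rejects is more likely healthy. The thresholds `t_sick` and `t_other` are placeholders for the thresholding step mentioned above.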

Modeling the spread of disease
