Guralnik 99
Citation
@inproceedings{:conf/kdd/GuralnikS99,
  author    = {Valery Guralnik and Jaideep Srivastava},
  title     = {Event Detection from Time Series Data},
  booktitle = {KDD},
  year      = {1999},
  pages     = {33-42},
  ee        = {http://doi.acm.org/10.1145/312129.312190},
  bibsource = {http://dblp.uni-trier.de}
}
Abstract from the paper
Online version
Summary
Data Collection
Automatic Credibility Analysis
The paper uses four types of features, grouped by scope: message-based, user-based, topic-based, and propagation-based.
- Message-based features consider characteristics of messages. They can be Twitter-independent or Twitter-dependent. Twitter-independent features include the length of a message, whether the text contains exclamation or question marks, and the number of positive/negative sentiment words in a message. Twitter-dependent features include whether the tweet contains a hashtag and whether the message is a re-tweet.
- User-based features consider characteristics of the users who post messages, such as registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.
- Topic-based features are aggregates computed from the previous two feature sets, for example the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fraction of tweets with positive and negative sentiment in a set.
- Propagation-based features consider characteristics of the propagation tree that can be built from the re-tweets of a message. These include the depth of the re-tweet tree and the number of initial tweets of a topic.
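To make the feature groups more concrete, here is a minimal Python sketch of a few message-based, topic-based, and propagation-based features. It is an illustration only, not the paper's implementation: the tiny sentiment lexicons, the "RT @" re-tweet convention, the rule for labelling a tweet positive or negative, and all function names are assumptions made for this example.

```python
import re

# Toy sentiment lexicons: placeholders for illustration, not the paper's lexicon.
POSITIVE_WORDS = {"good", "great", "safe"}
NEGATIVE_WORDS = {"bad", "terrible", "dead"}

URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")


def message_features(text):
    """Message-based features for a single tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Twitter-independent features
        "length": len(text),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "positive_words": sum(w in POSITIVE_WORDS for w in words),
        "negative_words": sum(w in NEGATIVE_WORDS for w in words),
        "has_url": bool(URL_PATTERN.search(text)),
        # Twitter-dependent features
        "has_hashtag": bool(HASHTAG_PATTERN.search(text)),
        "is_retweet": text.startswith("RT @"),  # assumed re-tweet convention
    }


def topic_features(texts):
    """Topic-based features: aggregates over all tweets of one topic."""
    per_message = [message_features(t) for t in texts]
    n = len(per_message)

    def fraction(predicate):
        return sum(1 for f in per_message if predicate(f)) / n

    return {
        "fraction_with_url": fraction(lambda f: f["has_url"]),
        "fraction_with_hashtag": fraction(lambda f: f["has_hashtag"]),
        # Assumption: a tweet counts as positive (negative) when it has more
        # positive (negative) lexicon words than the opposite kind.
        "fraction_positive": fraction(
            lambda f: f["positive_words"] > f["negative_words"]),
        "fraction_negative": fraction(
            lambda f: f["negative_words"] > f["positive_words"]),
    }


def retweet_tree_depth(children, root):
    """Depth (in edges) of a re-tweet tree given as a parent -> children dict.

    Assumes the mapping describes a tree, i.e. it contains no cycles.
    """
    depth, frontier = 0, [root]
    while True:
        # Move one level down the tree.
        frontier = [c for node in frontier for c in children.get(node, [])]
        if not frontier:
            return depth
        depth += 1
```

User-based features are left out because they depend on account metadata rather than on the tweet text.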
Automatically Assessing Credibility
The authors apply standard machine learning techniques; the best results they report use a J48 decision tree.
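J48 is Weka's implementation of the C4.5 decision tree algorithm, which chooses split attributes by gain ratio. The sketch below shows only that split criterion for categorical features. It omits C4.5's handling of continuous attributes and pruning, and the six-row data set is invented purely for illustration; it is not data from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical feature `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after the split, weighted by group size.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes splits into many small groups
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy data: class labels and two boolean features per topic.
labels = ["A", "A", "A", "B", "B", "B"]
has_url = [True, True, True, False, False, False]
has_hashtag = [True, False, True, False, True, False]

scores = {
    "has_url": gain_ratio(has_url, labels),
    "has_hashtag": gain_ratio(has_hashtag, labels),
}
root_feature = max(scores, key=scores.get)  # feature chosen for the root
```

In this toy set `has_url` separates the classes perfectly, so it gets the highest gain ratio and would be placed at the root of the tree.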
Results for the credibility classification:
Class          TP Rate  FP Rate  Prec.  Recall  F1
A (“true”)     0.825    0.108    0.874  0.825   0.849
B (“false”)    0.892    0.175    0.849  0.892   0.87
W. Avg.        0.860    0.143    0.861  0.860   0.86
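As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, and in a two-class problem the FP rate of one class equals one minus the recall of the other. The short Python sketch below recomputes both from the reported precision and recall values; the function and variable names are chosen for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table.
reported = {
    "A": (0.874, 0.825),
    "B": (0.849, 0.892),
    "W. Avg.": (0.861, 0.860),
}

# Recomputed F1 per row, to compare against the table's F1 column.
f1_by_row = {name: f1_score(p, r) for name, (p, r) in reported.items()}

# Two-class identity: FP rate of one class = 1 - recall of the other class.
fp_rate_a = 1 - reported["B"][1]
fp_rate_b = 1 - reported["A"][1]

for name, value in f1_by_row.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"FP rate A = {fp_rate_a:.3f}, FP rate B = {fp_rate_b:.3f}")
```

The recomputed values agree with the F1 and FP rate columns of the table up to rounding.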
Feature Level Analysis
Top features that contribute most to deciding credibility:
- Whether tweets contain a URL is the root of the tree.
- Sentiment-based features, such as the fraction of negative sentiment.
- Low-credibility news is mostly propagated by users who have not written many messages in the past.
Interesting Aspect
I like the coding scheme of this paper. It is reasonable and comprehensive. Some of the conclusions drawn in the paper are interesting to look at. For example:
- Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees.
- Among several other features, credible news is propagated through authors who have previously written a large number of messages, originates at a single user or a few users in the network, and has many re-posts.
Related Papers
- T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM.
- J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.