Castillo 2011
Castillo http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf
Contents
Citation
@inproceedings{conf/www/CastilloMP11,
author = {Carlos Castillo and Marcelo Mendoza and Barbara Poblete}, title = {Information credibility on twitter}, booktitle = {WWW}, year = {2011}, pages = {675-684}, ee = {http://doi.acm.org/10.1145/1963405.1963500},
}
Abstract from the paper
Online version
Summary
Data Collection
Automatic Credibility Analysis
Four types of features depending on their scope: message-based features, user-based features, topic-based features, and propagation- based features.
- Message-based features consider characteristics of messages,
these features can be Twitter-independent or Twitterdependent. Twitter-independent features include: the length of a message, whether or not the text contains exclamation or question marks and the number of positive/negative sentiment words in a message. Twitter-dependent features include features such as: if the tweet contains a hashtag, and if the message is a re-tweet.
- User-based features consider characteristics of the users
which post messages, such as: registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.
- Topic-based features are aggregates computed from the
previous two feature sets; for example, the fraction of tweets that contain URLs, the fraction of tweets with hashtags and the fraction of sentiment positive and negative in a set.
- Propagation-based features consider characteristics related
to the propagation tree that can be built from the retweets of a message. These includes features such as the depth of the re-tweet tree, or the number of initial tweets of a topic.
Automatic Assessing Credibility
Standard machine learning techniques, the best they report is using J48 decision tree.
Results:
Results for the credibility classification.
Class TP_Rate FP_Rate Prec. Recall F1
A (“true”) 0.825 0.108 0.874 0.825 0.849
B (“false”) 0.892 0.175 0.849 0.892 0.87
W. Avg. 0.860 0.143 0.861 0.860 0.86
Feature Level Analysis
Top feature that contribute more on deciding credibility:
- Tweets having an URL is the root of the tree.
- Sentiment-based feature like fraction of negative sentiment
- Low credibility news are mostly propagated by users who have not written many message in the past
Interesting Aspect
I like the coding scheme of this paper. It is reasonable and comprehensive. Some of the conclusion that drew from the paper is interesting to look at. For example
- Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees
- Among several other features, credible news are propagated through authors that have previously written a large number of messages, originate
at a single or a few users in the network, and have many re-posts.
Related Papers
- T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors.
In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM
- J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D.Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.
- A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011 This paper address a method to observe and track the popular events or topics that evolve over time in the communities.
- A study on retrospective and online event detection. Yang et al, SIGIR 98 This paper addresses the problems of detecting events in news stories.
- Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07 This paper addresses the problems of detecting events in news stories.
- Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10 This paper aims at detecting and classifying social events using Tree kernels.
- Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10 This paper addresses the task of identifying controversial events using Twitter as a starting point.
- Information credibility on twitter. Castillo et al, WWW 11 The authors develop a general approach to change-point detection that generalize across wide range of application.