Guralnik 99

Castillo http://delivery.acm.org/10.1145/320000/312190/p33-guralnik.pdf?ip=128.237.122.250&acc=ACTIVE%20SERVICE&CFID=119212228&CFTOKEN=52277574&__acm__=1348531826_377333b00daa1db4fd36cb60f6bb28fb

Citation

@inproceedings{:conf/kdd/GuralnikS99,
 author    = {Valery Guralnik and
              Jaideep Srivastava},
 title     = {Event Detection from Time Series Data},
 booktitle = {KDD},
 year      = {1999},
 pages     = {33-42},
 ee        = {http://doi.acm.org/10.1145/312129.312190},
 bibsource = {http://dblp.uni-trier.de}
}

Abstract from the paper

Online version

pdf link to the paper

Summary

Data Collection

Automatic Credibility Analysis

The features fall into four types depending on their scope: message-based, user-based, topic-based, and propagation-based features.

Message-based features consider characteristics of messages. These features can be Twitter-independent or Twitter-dependent. Twitter-independent features include the length of a message, whether the text contains exclamation or question marks, and the number of positive/negative sentiment words in the message. Twitter-dependent features include whether the tweet contains a hashtag and whether the message is a re-tweet.
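As a rough illustration of the message-based features listed above, here is a minimal sketch. The function name `message_features`, the two tiny word lists, and the "RT @user" retweet convention are assumptions made for this example; the paper's actual sentiment lexicon and retweet detection are not described on this page.

```python
import re

# Placeholder lexicons (assumption): stand-ins for a real sentiment word list.
POSITIVE_WORDS = {"good", "great", "happy"}
NEGATIVE_WORDS = {"bad", "sad", "terrible"}


def message_features(text: str) -> dict:
    """Compute a few message-based features for a single tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Twitter-independent features
        "length": len(text),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "positive_words": sum(w in POSITIVE_WORDS for w in words),
        "negative_words": sum(w in NEGATIVE_WORDS for w in words),
        # Twitter-dependent features
        "has_hashtag": re.search(r"#\w+", text) is not None,
        # Assumption: a retweet is marked by a leading "RT @user" prefix.
        "is_retweet": re.match(r"\s*RT\s+@\w+", text) is not None,
    }
```

For example, `message_features("RT @bob: terrible news! #quake")` flags the tweet as a retweet with a hashtag and an exclamation mark, and counts one negative sentiment word.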

User-based features consider characteristics of the users who post messages, such as registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.

Topic-based features are aggregates computed from the previous two feature sets; for example, the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fraction of positive and negative sentiment in a set.
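The topic-level aggregates can be sketched as simple fractions over the tweets belonging to one topic. The function name `topic_features` and the regular expressions used to spot URLs and hashtags are assumptions for illustration, not the paper's implementation.

```python
import re


def topic_features(tweets: list[str]) -> dict:
    """Aggregate per-tweet indicators into fractions for one topic."""
    n = len(tweets)
    if n == 0:
        # No tweets: define both fractions as zero to avoid dividing by zero.
        return {"frac_url": 0.0, "frac_hashtag": 0.0}
    with_url = sum(1 for t in tweets if re.search(r"https?://\S+", t))
    with_hashtag = sum(1 for t in tweets if re.search(r"#\w+", t))
    return {"frac_url": with_url / n, "frac_hashtag": with_hashtag / n}
```

The sentiment fractions mentioned above could be computed the same way once each tweet has been labeled positive or negative.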

Propagation-based features consider characteristics related to the propagation tree that can be built from the retweets of a message. These include features such as the depth of the re-tweet tree or the number of initial tweets of a topic.
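A small sketch of the two propagation features named above, assuming the retweets of a topic are available as a mapping from each tweet id to the id of the tweet it re-tweets (`None` for an initial tweet). That input format and the name `propagation_features` are assumptions for this example.

```python
from collections import defaultdict


def propagation_features(parent_of: dict) -> dict:
    """Return the depth of the deepest retweet chain and the number of initial tweets.

    parent_of maps tweet id -> id of the retweeted tweet, or None for an
    initial (root) tweet. Depth is counted in edges, so a lone root has depth 0.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in parent_of.items():
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)

    depth = 0
    stack = [(root, 0) for root in roots]  # iterative depth-first traversal
    while stack:
        node, d = stack.pop()
        depth = max(depth, d)
        stack.extend((child, d + 1) for child in children[node])
    return {"depth": depth, "initial_tweets": len(roots)}
```

With `{"a": None, "b": "a", "c": "b", "d": "a", "e": None}` the result is a depth of 2 (the chain a, b, c) and 2 initial tweets.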

Automatic Assessing Credibility

The authors apply standard machine learning techniques; the best results they report use a J48 decision tree.

Results:

Results for the credibility classification.

Class         TP Rate   FP Rate   Prec.   Recall   F1
A (“true”)    0.825     0.108     0.874   0.825    0.849
B (“false”)   0.892     0.175     0.849   0.892    0.87
W. Avg.       0.860     0.143     0.861   0.860    0.86
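The F1 column is the harmonic mean of precision and recall, so the per-class values can be checked from the other two columns. The snippet below is only a sanity check on the reported numbers; the function name `f1_score` is chosen for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for classes A and B.
reported = {"A": (0.874, 0.825), "B": (0.849, 0.892)}
for label, (p, r) in reported.items():
    print(label, round(f1_score(p, r), 3))
```

Rounded to three decimals this gives 0.849 for class A and 0.87 for class B, which agrees with the reported F1 values.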

Feature Level Analysis

Top features that contribute most to deciding credibility:

Whether a tweet contains a URL is the root of the tree.
Sentiment-based features, such as the fraction of negative sentiment.
Low-credibility news is mostly propagated by users who have not written many messages in the past.
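To give some intuition for why one feature ends up at the root of a decision tree: J48 is Weka's implementation of C4.5, which picks splits with an entropy-based criterion (gain ratio). The sketch below computes plain information gain, a simpler relative of that criterion, on a tiny made-up dataset. The data and feature names are invented purely for illustration and are not from the paper.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels) -> float:
    """Entropy reduction obtained by splitting the labels on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up toy data (assumption): six topics, three credible and three not.
labels = ["credible", "credible", "credible", "not", "not", "not"]
has_url = [True, True, True, False, False, False]        # separates the classes
has_question = [True, False, True, False, True, False]   # barely informative

print(information_gain(has_url, labels))
print(information_gain(has_question, labels))
```

In this toy setting the URL feature has the larger gain, so an entropy-based tree learner would test it first.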

Interesting Aspect

I like the coding scheme of this paper; it is reasonable and comprehensive. Some of the conclusions drawn in the paper are interesting to look at. For example:

Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees.
Among several other features, credible news is propagated through authors who have previously written a large number of messages, originates at a single or a few users in the network, and has many re-posts.

Related Papers

T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM.

J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.

