Difference between revisions of "Guralnik 99"

* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories.
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] This paper addresses the problem of detecting events in news stories.
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This paper aims at detecting and classifying social events using tree kernels.
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.

Revision as of 22:36, 30 September 2012

Castillo http://delivery.acm.org/10.1145/320000/312190/p33-guralnik.pdf?ip=128.237.122.250&acc=ACTIVE%20SERVICE&CFID=119212228&CFTOKEN=52277574&__acm__=1348531826_377333b00daa1db4fd36cb60f6bb28fb


Citation

@inproceedings{:conf/kdd/GuralnikS99,
 author    = {Valery Guralnik and
              Jaideep Srivastava},
 title     = {Event Detection from Time Series Data},
 booktitle = {KDD},
 year      = {1999},
 pages     = {33-42},
 ee        = {http://doi.acm.org/10.1145/312129.312190},
 bibsource = {http://dblp.uni-trier.de}
}


Abstract from the paper

Online version

pdf link to the paper

Summary

Data Collection

Automatic Credibility Analysis

The paper groups features into four types by scope: message-based, user-based, topic-based, and propagation-based features.

  • Message-based features consider characteristics of messages; they can be Twitter-independent or Twitter-dependent. Twitter-independent features include the length of a message, whether the text contains exclamation or question marks, and the number of positive/negative sentiment words in a message. Twitter-dependent features include whether the tweet contains a hashtag and whether the message is a re-tweet.

  • User-based features consider characteristics of the users who post messages, such as registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.

  • Topic-based features are aggregates computed from the previous two feature sets; for example, the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fraction of positive and negative sentiment in a set.

  • Propagation-based features consider characteristics of the propagation tree that can be built from the re-tweets of a message. These include features such as the depth of the re-tweet tree and the number of initial tweets of a topic.
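The feature groups above can be made concrete with a small sketch. This is only an illustration, not the authors' implementation: the `Tweet` class, the tiny sentiment word lists, the feature names, and the choice to count a tweet as "negative" when it contains at least one negative word are all assumptions made for the example. User-based features are omitted because they would need account metadata.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, tiny sentiment lexicons (a real system would use a full lexicon).
POSITIVE = {"good", "great", "safe"}
NEGATIVE = {"bad", "fake", "terrible"}


@dataclass
class Tweet:
    tweet_id: str
    text: str
    retweet_of: Optional[str] = None  # parent tweet id; None for an initial tweet


def message_features(t: Tweet) -> Dict[str, float]:
    """Message-based features for a single tweet."""
    words = [w.strip(".,!?#").lower() for w in t.text.split()]
    return {
        # Twitter-independent features
        "length": len(t.text),
        "has_exclamation": float("!" in t.text),
        "has_question": float("?" in t.text),
        "n_positive": sum(w in POSITIVE for w in words),
        "n_negative": sum(w in NEGATIVE for w in words),
        "has_url": float("http://" in t.text or "https://" in t.text),
        # Twitter-dependent features
        "has_hashtag": float("#" in t.text),
        "is_retweet": float(t.retweet_of is not None),
    }


def topic_features(tweets: List[Tweet]) -> Dict[str, float]:
    """Topic-based features: aggregates over all tweets of one topic."""
    feats = [message_features(t) for t in tweets]
    n = len(feats)
    return {
        "frac_url": sum(f["has_url"] for f in feats) / n,
        "frac_hashtag": sum(f["has_hashtag"] for f in feats) / n,
        "frac_positive": sum(f["n_positive"] > 0 for f in feats) / n,
        "frac_negative": sum(f["n_negative"] > 0 for f in feats) / n,
    }


def propagation_features(tweets: List[Tweet]) -> Dict[str, int]:
    """Propagation-based features from the re-tweet tree (assumed acyclic)."""
    parent = {t.tweet_id: t.retweet_of for t in tweets}

    def depth(tweet_id: str) -> int:
        d = 0
        while parent.get(tweet_id) is not None:
            tweet_id = parent[tweet_id]
            d += 1
        return d

    return {
        "max_depth": max(depth(t.tweet_id) for t in tweets),
        "n_initial": sum(t.retweet_of is None for t in tweets),
    }


# Made-up example topic: one initial tweet and a chain of two re-tweets.
sample = [
    Tweet("1", "Earthquake reported! http://example.com #quake"),
    Tweet("2", "RT is this fake?", retweet_of="1"),
    Tweet("3", "RT terrible news", retweet_of="2"),
]
print(topic_features(sample))
print(propagation_features(sample))
```

Computing the topic-level values as plain averages over per-message features mirrors the description of topic-based features as aggregates of the other feature sets.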

Automatically Assessing Credibility

The authors apply standard machine learning techniques; the best result they report uses a J48 decision tree.

Results:

Results for the credibility classification.

Class         TP Rate   FP Rate   Prec.   Recall   F1
A (“true”)    0.825     0.108     0.874   0.825    0.849
B (“false”)   0.892     0.175     0.849   0.892    0.87
W. Avg.       0.860     0.143     0.861   0.860    0.86
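As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the precision and recall columns; the function and variable names are chosen here for illustration.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table.
rows = {
    "A (true)": (0.874, 0.825),
    "B (false)": (0.849, 0.892),
    "W. Avg.": (0.861, 0.860),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

The recomputed values agree with the reported F1 column up to rounding. Note also that, for a two-class problem, the FP rate of one class equals one minus the TP rate of the other, which the table satisfies (0.108 = 1 - 0.892 and 0.175 = 1 - 0.825).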


Feature Level Analysis

Top features that contribute most to deciding credibility:

  • Whether tweets contain a URL is the root of the tree.
  • Sentiment-based features, such as the fraction of negative sentiment.
  • Low-credibility news is mostly propagated by users who have not written many messages in the past.
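To give some intuition for why one feature ends up at the root of the tree: J48 is Weka's implementation of C4.5, which chooses splits by gain ratio (information gain normalized by the entropy of the split itself). The sketch below computes gain ratio for two binary features on a made-up set of eight topics. The data and the feature names `has_url` and `high_negative` are hypothetical and are not taken from the paper, and the sketch leaves out C4.5's handling of numeric thresholds and pruning.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `feature`, divided by the split entropy."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = credible topic, 0 = not credible.
credible      = [1, 1, 1, 1, 0, 0, 0, 0]
has_url       = [1, 1, 1, 0, 0, 0, 0, 1]  # mostly agrees with the label
high_negative = [1, 0, 1, 0, 1, 0, 1, 0]  # unrelated to the label

scores = {
    "has_url": gain_ratio(has_url, credible),
    "high_negative": gain_ratio(high_negative, credible),
}
root = max(scores, key=scores.get)
print(scores, "-> root:", root)
```

On this toy data the URL feature has the higher gain ratio and would be selected as the root; on real data the ranking depends entirely on the labeled topics.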

Interesting Aspect

I like the coding scheme of this paper. It is reasonable and comprehensive. Some of the conclusions drawn in the paper are interesting to look at. For example:

  • Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees.
  • Among several other features, credible news is propagated through authors that have previously written a large number of messages, originates at a single or a few users in the network, and has many re-posts.

Related Papers

  • T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM.

  • J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.