Revision as of 22:05, 1 October 2012
Castillo http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf
Citation
@inproceedings{conf/www/CastilloMP11,
  author = {Carlos Castillo and Marcelo Mendoza and Barbara Poblete},
  title = {Information credibility on twitter},
  booktitle = {WWW},
  year = {2011},
  pages = {675-684},
  ee = {http://doi.acm.org/10.1145/1963405.1963500},
}
Abstract from the paper
Online version
Summary
Data Collection
Automatic Credibility Analysis
The features fall into four types depending on their scope: message-based features, user-based features, topic-based features, and propagation-based features.
- Message-based features consider characteristics of messages; these features can be Twitter-independent or Twitter-dependent. Twitter-independent features include the length of a message, whether or not the text contains exclamation or question marks, and the number of positive/negative sentiment words in a message. Twitter-dependent features include whether the tweet contains a hashtag and whether the message is a re-tweet.
- User-based features consider characteristics of the users who post messages, such as registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.
- Topic-based features are aggregates computed from the previous two feature sets; for example, the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fraction of positive and negative sentiment in a set.
- Propagation-based features consider characteristics related to the propagation tree that can be built from the re-tweets of a message. These include features such as the depth of the re-tweet tree or the number of initial tweets of a topic.
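The feature groups above can be sketched in code. The following is only a minimal illustration, not the authors' implementation: the Tweet class, the function names, the tiny sentiment word lists, and the rule that a tweet counts as "positive" when it has more positive than negative words are all assumptions made for this example. User-based features are omitted because they come directly from profile fields.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical sentiment lexicons; the paper's actual word lists are not given here.
POSITIVE_WORDS = {"good", "great", "safe"}
NEGATIVE_WORDS = {"bad", "fake", "dead"}


@dataclass
class Tweet:
    tweet_id: str
    text: str
    retweet_of: Optional[str] = None  # id of the parent tweet; None for an initial tweet


def message_features(tweet: Tweet) -> Dict[str, float]:
    """Message-based features of a single tweet."""
    words = [w.strip(".,!?#").lower() for w in tweet.text.split()]
    return {
        # Twitter-independent features
        "length": float(len(tweet.text)),
        "has_exclamation": float("!" in tweet.text),
        "has_question": float("?" in tweet.text),
        "positive_words": float(sum(w in POSITIVE_WORDS for w in words)),
        "negative_words": float(sum(w in NEGATIVE_WORDS for w in words)),
        "has_url": float("http://" in tweet.text or "https://" in tweet.text),
        # Twitter-dependent features
        "has_hashtag": float("#" in tweet.text),
        "is_retweet": float(tweet.retweet_of is not None),
    }


def topic_features(tweets: List[Tweet]) -> Dict[str, float]:
    """Topic-based features: aggregates over all tweets of one topic."""
    feats = [message_features(t) for t in tweets]
    n = len(feats) or 1  # avoid division by zero for an empty topic
    return {
        "fraction_with_url": sum(f["has_url"] for f in feats) / n,
        "fraction_with_hashtag": sum(f["has_hashtag"] for f in feats) / n,
        # Assumption: a tweet is "positive" if positive words outnumber negative ones.
        "fraction_positive": sum(f["positive_words"] > f["negative_words"] for f in feats) / n,
        "fraction_negative": sum(f["negative_words"] > f["positive_words"] for f in feats) / n,
    }


def propagation_features(tweets: List[Tweet]) -> Dict[str, int]:
    """Propagation-based features from the re-tweet tree of one topic."""
    parent = {t.tweet_id: t.retweet_of for t in tweets}

    def depth(tweet_id: str) -> int:
        # Walk up the parent links; the seen set guards against cyclic data.
        steps, seen = 0, set()
        while parent.get(tweet_id) is not None and tweet_id not in seen:
            seen.add(tweet_id)
            tweet_id = parent[tweet_id]
            steps += 1
        return steps

    return {
        "max_depth": max((depth(t.tweet_id) for t in tweets), default=0),
        "initial_tweets": sum(t.retweet_of is None for t in tweets),
    }
```

For a topic made of one initial tweet, a re-tweet of it, and a re-tweet of that re-tweet, propagation_features reports a maximum depth of 2 and one initial tweet.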
Automatically Assessing Credibility
The authors apply standard machine learning techniques; the best result they report uses a J48 decision tree.
Results for the credibility classification:
Class        TP Rate  FP Rate  Prec.  Recall  F1
A (“true”)   0.825    0.108    0.874  0.825   0.849
B (“false”)  0.892    0.175    0.849  0.892   0.87
W. Avg.      0.860    0.143    0.861  0.860   0.86
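The reported numbers are internally consistent, which can be checked with the standard definition of F1 as the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values in the table; the function name f1_score and the dictionary layout are chosen for this example only. It also checks that, in a two-class problem, the false positive rate of one class equals one minus the true positive rate of the other.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "A": (0.874, 0.825, 0.849),
    "B": (0.849, 0.892, 0.87),
}

for label, (precision, recall, reported_f1) in reported.items():
    recomputed = round(f1_score(precision, recall), 3)
    print(label, "recomputed F1:", recomputed, "reported F1:", reported_f1)

# Two-class consistency: FP rate of A = 1 - TP rate of B, and vice versa.
tp_rate = {"A": 0.825, "B": 0.892}
fp_rate = {"A": 0.108, "B": 0.175}
print(abs((1 - tp_rate["B"]) - fp_rate["A"]) < 1e-9)
print(abs((1 - tp_rate["A"]) - fp_rate["B"]) < 1e-9)
```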
Feature Level Analysis
The top features that contribute most to deciding credibility:
- Whether a tweet contains a URL is the root of the tree.
- Sentiment-based features, such as the fraction of negative sentiment.
- Low-credibility news is mostly propagated by users who have not written many messages in the past.
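To see why a feature ends up at the root of the tree: J48 is Weka's implementation of C4.5, which picks the split with the highest gain ratio (information gain divided by the entropy of the split itself). The sketch below computes gain ratio for two binary features on a toy data set. The data is invented purely for illustration and does not come from the paper, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the value distribution."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion: information gain normalised by split information."""
    total = len(labels)
    groups: dict = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Invented toy data: 8 topics, half labelled credible, half not.
labels: List[str] = ["credible"] * 4 + ["not_credible"] * 4
has_url = [1, 1, 1, 1, 0, 0, 0, 0]      # separates the classes perfectly
has_hashtag = [1, 0, 1, 0, 1, 0, 1, 0]  # carries no information about the label

scores = {
    "has_url": gain_ratio(has_url, labels),
    "has_hashtag": gain_ratio(has_hashtag, labels),
}
root = max(scores, key=scores.get)
print(scores, "root:", root)
```

On this toy data the URL feature would be chosen as the root, because it has the higher gain ratio; on the real data the paper reports the same feature at the root, but the actual scores are not given in this summary.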
Interesting Aspect
I like the coding scheme of this paper; it is reasonable and comprehensive. Some of the conclusions drawn in the paper are interesting. For example:
- Among several other features, newsworthy topics tend to include URLs and to have deep propagation trees.
- Among several other features, credible news is propagated through authors who have previously written a large number of messages, originates at a single or a few users in the network, and has many re-posts.
Related Papers
- T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 851–860, New York, NY, USA, April 2010. ACM.
- J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling. TwitterStand: news in tweets. In GIS ’09: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 42–51, New York, NY, USA, November 2009. ACM Press.
- A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011. This paper presents a method to observe and track popular events or topics that evolve over time in communities.
- A study on retrospective and online event detection. Yang et al, SIGIR 98. This paper addresses the problem of detecting events in news stories.
- Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07. This paper addresses the problem of detecting events in social text streams.
- Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10. This paper aims at detecting and classifying social events using tree kernels.
- Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10. This paper addresses the task of identifying controversial events using Twitter as a starting point.
- Information credibility on twitter. Castillo et al, WWW 11. The authors assess the credibility of news propagated on Twitter using message-based, user-based, topic-based, and propagation-based features.