Castillo 2011
Revision as of 21:40, 25 September 2012
Castillo http://www.ra.ethz.ch/cdstore/www2011/proceedings/p675.pdf
Citation
@inproceedings{conf/www/CastilloMP11,
  author    = {Carlos Castillo and Marcelo Mendoza and Barbara Poblete},
  title     = {Information credibility on twitter},
  booktitle = {WWW},
  year      = {2011},
  pages     = {675-684},
  ee        = {http://doi.acm.org/10.1145/1963405.1963500},
}
Abstract from the paper
We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. On this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to “trending” topics, and classify them as credible or not credible, based on features extracted from them. We use features from users’ posting and re-posting (“re-tweeting”) behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results shows that there are measurable differences in the way messages propagate, that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Online version
Summary
Data Collection
- Automatic event detection (Twitter Monitor: http://www.twittermonitor.net/): for every detected burst, the tweets matching its query during a 2-day window centered on the peak of the burst are collected. Each such subset of tweets corresponds to a topic. Over 2,500 such topics are collected.
- Newsworthy topic assessment (Mechanical Turk): two types: NEWS and CHAT.
- Credibility assessment (Mechanical Turk): four types: (i) almost certainly true, (ii) likely to be false, (iii) almost certainly false, and (iv) “I can’t decide”.
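The 2-day window around a burst's peak can be illustrated with a minimal sketch. It is not Twitter Monitor's actual procedure: the function name `tweets_around_peak`, the `(timestamp, text)` tuple layout, and the choice of the busiest hour as the "peak" are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_around_peak(tweets, window=timedelta(days=2)):
    """Keep the tweets that fall inside a window centered on the peak.

    tweets: list of (timestamp, text) pairs that already match the
    burst's query (hypothetical layout, not from the paper).
    """
    if not tweets:
        return []
    # Assumption: the peak is the hour containing the most tweets.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts, _ in tweets
    )
    peak = max(per_hour, key=per_hour.get)
    half = window / 2
    return [(ts, text) for ts, text in tweets if peak - half <= ts <= peak + half]
```

With the default `window`, a tweet posted more than one day before or after the busiest hour is dropped from the topic.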
Automatic Credibility Analysis
Features are grouped into four types depending on their scope: message-based features, user-based features, topic-based features, and propagation-based features.
- Message-based features consider characteristics of messages. These features can be Twitter-independent or Twitter-dependent. Twitter-independent features include the length of a message, whether or not the text contains exclamation or question marks, and the number of positive/negative sentiment words in a message. Twitter-dependent features include whether the tweet contains a hashtag and whether the message is a re-tweet.
- User-based features consider characteristics of the users who post messages, such as registration age, number of followers, number of followees (“friends” in Twitter), and the number of tweets the user has authored in the past.
- Topic-based features are aggregates computed from the previous two feature sets; for example, the fraction of tweets that contain URLs, the fraction of tweets with hashtags, and the fractions of tweets with positive and negative sentiment in a set.
- Propagation-based features consider characteristics related to the propagation tree that can be built from the re-tweets of a message. These include features such as the depth of the re-tweet tree and the number of initial tweets of a topic.
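The feature types above can be sketched in code. This is a minimal illustration, not the authors' implementation: the tweet dictionary layout (`id`, `text`, `retweet_of`), the tiny `POSITIVE`/`NEGATIVE` word sets, and all function names are hypothetical, and the paper's actual sentiment lexicon and full feature list are not reproduced here. User-based features are omitted because they are direct profile lookups.

```python
import re

# Placeholder sentiment lexicons (assumption; stand-ins for a real lexicon).
POSITIVE = {"good", "great", "safe"}
NEGATIVE = {"bad", "fake", "dead"}


def message_features(tweet):
    """Message-based features for one tweet dict with 'text' and 'retweet_of'."""
    text = tweet["text"]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Twitter-independent features
        "length": len(text),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "n_positive": sum(w in POSITIVE for w in words),
        "n_negative": sum(w in NEGATIVE for w in words),
        "has_url": bool(re.search(r"https?://\S+", text)),
        # Twitter-dependent features
        "has_hashtag": bool(re.search(r"#\w+", text)),
        "is_retweet": tweet.get("retweet_of") is not None,
    }


def topic_features(tweets):
    """Topic-based features: fractions aggregated over a topic's tweets."""
    feats = [message_features(t) for t in tweets]
    n = len(feats) or 1  # avoid division by zero on an empty topic
    return {
        "frac_url": sum(f["has_url"] for f in feats) / n,
        "frac_hashtag": sum(f["has_hashtag"] for f in feats) / n,
        "frac_positive": sum(f["n_positive"] > 0 for f in feats) / n,
        "frac_negative": sum(f["n_negative"] > 0 for f in feats) / n,
    }


def propagation_features(tweets):
    """Propagation-based features from 'retweet_of' links.

    Assumes the links form a forest (no cycles); a root has depth 0.
    """
    parent = {t["id"]: t.get("retweet_of") for t in tweets}

    def depth(tweet_id):
        d = 0
        while parent.get(tweet_id) is not None:
            tweet_id = parent[tweet_id]
            d += 1
        return d

    return {
        "max_depth": max((depth(i) for i in parent), default=0),
        "n_initial": sum(p is None for p in parent.values()),
    }
```

A topic's feature vector would then combine `topic_features(tweets)` and `propagation_features(tweets)`, possibly together with aggregated user statistics, before being passed to a classifier.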
Background
What's interesting in this paper
Related Papers
Study Plan
Papers you may want to read to understand this paper.