Yang et al., SIGKDD 2012
Citation
Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (MDS '12). ACM, New York, NY, USA.
Online version
Automatic detection of rumor on Sina Weibo
Summary
This paper addresses the problem of rumor detection on Sina Weibo (a microblogging service in mainland China comparable to Twitter). A classifier is trained on manually annotated data (messages identified as false rumors) from a rumor-busting service available on Sina Weibo.
The study reuses features from previous work on Twitter rumor detection and analyses how their effects differ on Weibo. It also proposes new features that improve the overall results.
Features
The feature set includes several features from previous work.
- Content-based
  - Contains Multimedia - Whether the message contains a picture, audio or video file.
  - Sentiment - Sentiment of the message (based on the number of positive and negative emoticons used).
  - Contains URL - Whether the message contains a URL.
  - Time Span - The time between the user's registration and the posting of the message.
- Account-based
  - Is Verified - Whether the user is verified by Weibo.
  - Has Description - Whether the user has a description.
  - Gender - Gender of the user.
  - Avatar Type - Type of the user's avatar (malicious users generally have the default avatar).
  - N Followers - Number of followers.
  - N Friends - Number of friends.
  - N Messages - Number of posts.
  - Registration Time - The time between the creation of the account and the present.
  - User Type - Type of the account (non-organization users have a higher chance of being malicious).
  - Registration Place - Physical location where the account was created.
- Propagation-based
  - Is Retweeted - Whether the message is original or a retweet.
  - N Comments - Number of comments on the message.
  - N Retweets - Number of retweets of the message.
The newly proposed features are:
- Client Program - The type of program used to post the message (mobile client or web client).
- Event Location - Location where the event occurred (this work only distinguishes events that occurred in China from those that occurred outside China). The rationale is that messages describing events that occurred outside China are more likely to be rumors.
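As an illustration of how a few of the features above might be turned into numbers for a classifier, here is a minimal sketch. It is not the authors' code: the `Message` fields, the emoticon lists, the sign-based sentiment rule, and the 0/1 encodings of client program and event location are all assumptions made for this example, since the summary does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical emoticon lexicons; the source does not list the ones actually used.
POSITIVE_EMOTICONS = (":)", ":D")
NEGATIVE_EMOTICONS = (":(", ":'(")


@dataclass
class Message:
    """Hypothetical record holding the raw fields needed by this sketch."""
    text: str
    posted_at: datetime
    user_registered_at: datetime
    client: str           # assumed values: "mobile" or "web"
    event_in_china: bool  # assumed to be annotated beforehand


def sentiment(text: str) -> int:
    """Assumed rule: sign of (positive emoticon count - negative emoticon count)."""
    positive = sum(text.count(e) for e in POSITIVE_EMOTICONS)
    negative = sum(text.count(e) for e in NEGATIVE_EMOTICONS)
    return (positive > negative) - (positive < negative)


def contains_url(text: str) -> bool:
    """Crude URL check, good enough for an illustration."""
    return "http://" in text or "https://" in text


def time_span_days(msg: Message) -> float:
    """Days between account registration and the posting of the message."""
    return (msg.posted_at - msg.user_registered_at).total_seconds() / 86400


def encode(msg: Message) -> List[float]:
    """Order: sentiment, contains URL, time span, client program, event location."""
    return [
        float(sentiment(msg.text)),
        1.0 if contains_url(msg.text) else 0.0,
        time_span_days(msg),
        1.0 if msg.client == "mobile" else 0.0,
        1.0 if msg.event_in_china else 0.0,
    ]
```

The remaining account-based and propagation-based features are mostly counts or flags and could be appended to the same vector in the same way.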
Experiments
Tests were performed using an SVM with an RBF kernel function. To see whether the new features improve on the earlier ones, the classifier was run once with the feature set defined in previous work and once more after adding the new features. The evaluation uses accuracy, defined here as the percentage of false rumors that were detected successfully.
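The two ingredients of this setup can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the default `gamma` is an arbitrary placeholder because the summary gives no hyperparameters. The kernel is the standard RBF form exp(-gamma * ||x - y||^2), and the metric follows the definition given above (the share of false rumors that were detected).

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Standard RBF kernel: exp(-gamma * ||x - y||^2). The gamma default is assumed."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def detection_accuracy(is_rumor: Sequence[bool], predicted_rumor: Sequence[bool]) -> float:
    """Fraction of the true false-rumor messages that the classifier flagged."""
    if len(is_rumor) != len(predicted_rumor):
        raise ValueError("label sequences must have the same length")
    # Keep only the predictions made for messages that really are false rumors.
    flagged = [pred for truth, pred in zip(is_rumor, predicted_rumor) if truth]
    if not flagged:
        raise ValueError("no false rumors in the evaluation set")
    return sum(flagged) / len(flagged)
```

Note that a metric defined this way ignores messages that are not rumors, so it does not penalize false alarms.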
Feature set | Content-based | Account-based | Propagation-based |
---|---|---|---|
With new features | 78.01% | 77.36% | 78.66% |
Without new features | 72.58% | 72.63% | 72.34% |
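The size of the change in each column can be read off the table with simple arithmetic. The short sketch below only restates the reported numbers; the dictionary layout and variable names are chosen for this example.

```python
# Accuracy values (in percent) copied from the table above.
accuracy = {
    "Content-based": {"with_new": 78.01, "without_new": 72.58},
    "Account-based": {"with_new": 77.36, "without_new": 72.63},
    "Propagation-based": {"with_new": 78.66, "without_new": 72.34},
}

# Gain in percentage points from adding the new features to each group.
gains = {
    group: round(scores["with_new"] - scores["without_new"], 2)
    for group, scores in accuracy.items()
}

for group, gain in gains.items():
    print(f"{group}: +{gain:.2f} percentage points")
```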
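Results show that adding the new features improves accuracy for all three feature groups, by roughly 5 to 6 percentage points.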