Difference between revisions of "Yang et. al., SIGKDD 2012"

From Cohen Courses

Revision as of 20:32, 29 September 2012

Citation

Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (MDS '12). ACM, New York, NY, USA.

Online version

Automatic detection of rumor on Sina Weibo

Summary

This paper addresses the problem of rumor detection on Sina Weibo (a microblogging service in mainland China comparable to Twitter). A classifier is trained on manually annotated data (messages identified as false rumors) obtained from the rumor-busting service available on Sina Weibo.

The study reuses features from previous work on Twitter rumor detection and analyzes how the effects of these features differ on Weibo. It also proposes new features that improve the overall results.

Features

The set of features considered includes several features from previous work.

  • Content-based
    • Contains Multimedia - Whether the message contains a picture, audio or video file.
    • Sentiment - Sentiment of the message (based on the number of positive and negative emoticons used).
    • Contains URL - Whether the message contains a URL.
    • Time Span - The time elapsed between the user's registration and the posting of the message.
  • Account-based
    • Is Verified - Whether the user is verified by Weibo.
    • Has Description - Whether the user has a description.
    • Gender - Gender of the user.
    • Avatar Type - Type of the avatar of the user (malicious users generally have the default avatar).
    • N Followers - Number of followers.
    • N Friends - Number of friends.
    • N Messages - Number of posts.
    • Registration Time - The time elapsed between the creation of the account and the present.
    • User Type - Type of the account (non-organization users have a higher chance of being malicious).
    • Registration Place - Physical location where the account was created.
  • Propagation-based
    • Is Retweeted - Whether the message is a retweet rather than an original post.
    • N Comments - Number of comments on the message.
    • N Retweets - Number of retweets of the message.
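
As an illustration of how some of the content-based features above might be computed, here is a minimal Python sketch. It is not the paper's implementation: the emoticon sets, the URL pattern, the function name `content_features`, and the choice to score sentiment as the count of positive emoticons minus the count of negative ones are all assumptions made for this example.

```python
import re
from datetime import datetime

# Hypothetical emoticon lexicons; the ones used in the paper are not given here.
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":'("}

# Simple assumed pattern for spotting a URL in the message text.
URL_PATTERN = re.compile(r"https?://\S+")


def content_features(text, posted_at, registered_at):
    """Compute three content-based features for a single message."""
    positive = sum(text.count(e) for e in POSITIVE_EMOTICONS)
    negative = sum(text.count(e) for e in NEGATIVE_EMOTICONS)
    return {
        # Contains URL: whether the message contains a URL.
        "contains_url": bool(URL_PATTERN.search(text)),
        # Sentiment: assumed here to be positive minus negative emoticon counts.
        "sentiment": positive - negative,
        # Time Span: days between user registration and the posting time.
        "time_span_days": (posted_at - registered_at).days,
    }


features = content_features(
    "Shocking news :( :( http://t.cn/abc",
    posted_at=datetime(2012, 5, 1),
    registered_at=datetime(2012, 4, 1),
)
```

Account-based and propagation-based features are mostly direct counts or flags read from the account and message records, so they would map onto the same kind of dictionary without further computation.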

The new proposed features are:

  • Client Program - The type of program used (mobile-client or web-client).
  • Event Location - Location where the event occurred (this work only distinguishes between events that occurred in China and events that occurred elsewhere). The rationale is that messages describing events that occurred outside China are more likely to be rumors.
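
Both new features are binary, so they could be encoded as 0/1 indicators before being passed to a classifier. The sketch below shows one possible encoding; the function name `encode_new_features`, the string labels `"mobile"`, `"web"` and `"China"`, and the output keys are hypothetical and are not taken from the paper.

```python
def encode_new_features(client_program, event_location):
    """Encode the two proposed features as 0/1 indicators.

    client_program: "mobile" or "web" (assumed labels).
    event_location: name of the country where the event took place.
    """
    if client_program not in ("mobile", "web"):
        raise ValueError(f"unknown client program: {client_program!r}")
    return {
        # Client Program: 1 for a mobile client, 0 for a web client.
        "is_mobile_client": int(client_program == "mobile"),
        # Event Location: only "in China" versus "not in China" is kept.
        "event_outside_china": int(event_location != "China"),
    }


encoded = encode_new_features("web", "Japan")
```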