Yang et. al., SIGKDD 2012


Latest revision as of 15:59, 2 October 2012

Citation

Fan Yang, Yang Liu, Xiaohui Yu, and Min Yang. 2012. Automatic detection of rumor on Sina Weibo. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics (MDS '12). ACM, New York, NY, USA.

Online version

Automatic detection of rumor on Sina Weibo

Summary

This paper addresses the problem of rumor detection on Sina Weibo (a microblogging service in mainland China, comparable to Twitter). A classifier is trained on manually annotated data (messages identified as false rumors) from a rumor-busting service available on Sina Weibo.

This study uses features applied in previous work on Twitter Rumor Detection and analyses the different effects of these features in Weibo. Furthermore, it proposes new features that improve the overall results.

Features

The feature set includes several features from previous work.

  • Content-based
    • Contains Multimedia - Whether the message contains a picture, audio or video file.
    • Sentiment - Sentiment of the message (based on the number of positive and negative emoticons used).
    • Contains URL - Whether the message contains a URL.
    • Time Span - The time between the user's registration and the posting of the message.
  • Account-based
    • Is Verified - Whether the user is verified by Weibo.
    • Has Description - Whether the user has a description.
    • Gender - Gender of the user.
    • Avatar Type - Type of the avatar of the user (malicious users generally have the default avatar).
    • N followers - Number of followers.
    • N friends - Number of friends.
    • N messages - Number of posts.
    • Registration time - The time elapsed between the creation of the account and the present.
    • User Type - Type of the account (non-organization users have a higher chance of being malicious).
    • Registration Place - Physical location where the account was created.
  • Propagation-based
    • Is Retweeted - Whether the message was original or a retweet.
    • N Comments - Number of comments on the message.
    • N Retweets - Number of retweets of the message.
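
As an illustration of how a few of the content-based features above might be computed, here is a minimal sketch. The emoticon sets, the function name `content_features`, and the choice of "positive count minus negative count" as the sentiment score are all assumptions made for this example; the page does not state the paper's exact lexicon or formula.

```python
import re
from datetime import datetime

# Hypothetical emoticon lexicons (assumption: the real lists are not given here).
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":'("}
URL_PATTERN = re.compile(r"https?://\S+")

def content_features(text, posted_at, registered_at):
    """Compute three content-based features for one message."""
    positive = sum(text.count(e) for e in POSITIVE_EMOTICONS)
    negative = sum(text.count(e) for e in NEGATIVE_EMOTICONS)
    return {
        # Assumed scoring: positive emoticon count minus negative count.
        "sentiment": positive - negative,
        "contains_url": bool(URL_PATTERN.search(text)),
        # Time Span: days between account registration and the post.
        "time_span_days": (posted_at - registered_at).days,
    }

features = content_features(
    "Breaking news :) :) :( http://t.cn/abc",
    posted_at=datetime(2012, 5, 10),
    registered_at=datetime(2012, 5, 1),
)
```

The remaining account-based and propagation-based features are counts or flags read directly from the user profile and message metadata, so they would need no comparable processing.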

The new proposed features are:

  • Client Program - The type of program used (mobile-client or web-client).
  • Event Location - Location where the event occurred (this work distinguishes between events that occurred in China and events that occurred elsewhere). The rationale is that messages describing events that occurred outside China are more likely to be rumors.
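
One plausible way to feed the two new features to a classifier is to encode each as a 0/1 value and append them to an existing feature vector. The sketch below assumes that encoding; the function names and the string labels for the event country are hypothetical and not taken from the paper.

```python
def encode_new_features(client_program, event_country):
    """Return [is_mobile_client, event_outside_china] as 0/1 values."""
    is_mobile_client = 1 if client_program == "mobile-client" else 0
    # The page only distinguishes "in China" from "not in China".
    event_outside_china = 0 if event_country == "China" else 1
    return [is_mobile_client, event_outside_china]

def extend_features(baseline_vector, client_program, event_country):
    """Append the two new features to a baseline feature vector."""
    return list(baseline_vector) + encode_new_features(client_program, event_country)

extended = extend_features([0.5, 1], "web-client", "Japan")
```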

Experiments

Tests were performed using an SVM with an RBF kernel. The experiments check whether the new features improve the results obtained with the previous features: the classifier is run once with a set of features defined in previous work and once more after adding the new features. The evaluation uses Accuracy, which is defined as the percentage of false rumors that were detected successfully.
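
For readers unfamiliar with the terms, the following sketch shows the standard RBF kernel, K(x, y) = exp(-gamma * ||x - y||^2), and the evaluation measure as it is described above. The value of `gamma` and the label convention (1 = false rumor) are assumptions for illustration; the page does not report the hyperparameters used in the paper.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def false_rumor_detection_rate(y_true, y_pred):
    """Fraction of true false rumors (label 1) that were predicted as 1."""
    predictions_for_rumors = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_rumors:
        raise ValueError("y_true contains no false rumors")
    return sum(predictions_for_rumors) / len(predictions_for_rumors)

similarity = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)
rate = false_rumor_detection_rate([1, 1, 0, 1], [1, 0, 0, 1])
```

Note that a measure defined this way only looks at messages that really are false rumors, so it corresponds to what is usually called recall on the rumor class.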

                      Content-based   Account-based   Propagation-based
(W/ new features)     78.01%          77.36%          78.66%
(W/o new features)    72.58%          72.63%          72.34%

The results show that classification accuracy improves for all three sets of previous features when the new features are added.
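
The size of the improvement can be read off the table by subtracting the two rows. The short calculation below uses only the figures reported above; the dictionary keys name the three feature groups listed under Features.

```python
# Accuracy (%) reported in the table, per feature group.
with_new = {"content": 78.01, "account": 77.36, "propagation": 78.66}
without_new = {"content": 72.58, "account": 72.63, "propagation": 72.34}

# Gain in percentage points from adding the new features.
gains = {
    group: round(with_new[group] - without_new[group], 2)
    for group in with_new
}
```

This gives gains of 5.43, 4.73 and 6.32 percentage points respectively, with the largest gain for the propagation-based set.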

Related papers

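This paper uses sets of features proposed in previous work:

  • The work in Castilo et al WWW 2011 uses message-based features, user-based features, topic-based features and propagation-based features.
  • The work in Qazvinian et al EMNLP 2011 uses context-based features, network-based features and Twitter-specific memes. The latter were studied in more detail in Ratkiewicz et al CoRR 2010.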

Study plan

While most of the work is easy to follow, a reader might consider reviewing:

  • SVM