Bamman et al., FIRST MONDAY 2012

From Cohen Courses
Revision as of 22:11, 29 September 2012 by Lingwang (talk | contribs)

Citation

David Bamman, Brendan O'Connor and Noah A. Smith. 2012. Censorship and deletion practices in Chinese social media. In First Monday.

Online version

Censorship and Content Deletion in Chinese Social Media

Summary

This paper attempts to characterize the practices of censorship and message deletion on Sina Weibo (a Chinese counterpart of Twitter). It takes three different approaches to analysing this issue.

Term Deletion Rate

To build a corpus of messages annotated with whether each message was deleted, Weibo messages were collected over a period of three months. Each message was later checked to see whether it still existed. If it did not, it was marked as deleted.

To analyse which topics are likely to be deleted, the authors calculate the term deletion rate $\delta_w$ for each term $w$, defined as follows.

$\delta_w = \frac{d_w}{n_w}$, where $d_w$ is the number of deleted messages containing the term $w$ and $n_w$ is the number of messages containing $w$.
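
As a rough illustration of the definition above, the following sketch computes, for each term, the number of deleted messages containing it divided by the number of all messages containing it. The function name `term_deletion_rates`, the input layout (a list of `(terms, was_deleted)` pairs) and the toy corpus are assumptions made for this example. They are not taken from the paper.

```python
from collections import Counter

def term_deletion_rates(messages):
    """Return {term: (deleted_count, total_count, rate)}.

    `messages` is an iterable of (terms, was_deleted) pairs, where `terms`
    is the list of terms in one message (hypothetical input layout).
    """
    total = Counter()
    deleted = Counter()
    for terms, was_deleted in messages:
        # Count each term at most once per message, even if it repeats.
        for term in set(terms):
            total[term] += 1
            if was_deleted:
                deleted[term] += 1
    return {t: (deleted[t], total[t], deleted[t] / total[t]) for t in total}

# Toy corpus with made-up terms, for illustration only.
corpus = [
    (["a", "b"], True),
    (["a"], False),
    (["b", "c"], True),
    (["a", "c"], False),
]
rates = term_deletion_rates(corpus)
```

In this toy corpus the term "b" appears only in deleted messages, so its rate is 1.0, while "c" appears in one deleted and one surviving message, giving 0.5.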

Furthermore, a statistical test is performed (using the one–tailed binomial p-value) to find the terms whose deletion rates are abnormally high. These terms are then analysed manually.
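
A one-tailed binomial p-value of this kind can be sketched as the probability of seeing at least the observed number of deletions among the messages containing a term, if each message were deleted independently at some baseline rate. Using the overall corpus deletion rate as that baseline is an assumption of this sketch, and the function name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    k: deleted messages containing the term
    n: all messages containing the term
    p: baseline deletion rate under the null hypothesis (assumed here
       to be the overall deletion rate of the corpus)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Toy numbers: 8 of 10 messages deleted against a baseline rate of 0.5.
p_value = binomial_upper_tail(8, 10, 0.5)
```

A small p-value means the term is deleted more often than the baseline would explain. For large counts, an exact sum like this becomes slow and a statistics library would be the more practical choice.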

From these terms, the authors conclude the following. Messages containing politically sensitive terms are likely to be deleted. A second type consists of terms such as "asked to resign", which are sensitive due to real-world events. Finally, terms that occurred in false rumors also have a high deletion rate.

Comparing Twitter with Weibo

Since messages on Twitter are not deleted as they are on Weibo, terms that are likely to be deleted on Weibo are expected to have a much higher relative frequency on Twitter. Thus, the authors propose a metric that scores each term by how much more frequent it is on Twitter than on Weibo.
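
The exact metric is not reproduced on this page. One plausible reading, offered only as a sketch, is the ratio of a term's relative frequency on Twitter to its relative frequency on Weibo. The function name `frequency_ratio_scores`, the add-one smoothing (used to avoid division by zero) and the toy token lists are all assumptions of this example, not details from the paper.

```python
from collections import Counter

def frequency_ratio_scores(twitter_tokens, weibo_tokens, smoothing=1.0):
    """Score each term by P(term | Twitter) / P(term | Weibo).

    Additive smoothing is an assumption of this sketch; it keeps the
    ratio finite for terms that never appear in one of the corpora.
    """
    tw = Counter(twitter_tokens)
    wb = Counter(weibo_tokens)
    vocab = set(tw) | set(wb)
    tw_total = sum(tw.values()) + smoothing * len(vocab)
    wb_total = sum(wb.values()) + smoothing * len(vocab)
    return {
        t: ((tw[t] + smoothing) / tw_total) / ((wb[t] + smoothing) / wb_total)
        for t in vocab
    }

# Toy token streams (made up): "x" is common on Twitter, absent on Weibo.
scores = frequency_ratio_scores(["x", "x", "x", "y"], ["y", "y", "y", "z"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Terms at the top of `ranked` are those that are unusually rare on Weibo relative to Twitter, which is the pattern the comparison is looking for.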


The terms with the highest scores are tested in Weibo's search engine to check whether they are blocked. Results show that among the top 20 terms, 70% were blocked. The precision drops as terms with lower scores are added, but 136 censored terms were found among the top 2000.
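
The evaluation described above amounts to precision at a cutoff k: the fraction of the k highest-scoring terms that turn out to be blocked. A minimal sketch follows, where `precision_at_k` and the `is_blocked` callable are hypothetical names standing in for a manual check against Weibo's search engine.

```python
def precision_at_k(ranked_terms, is_blocked, k):
    """Fraction of the top-k ranked terms for which is_blocked(term) is true."""
    top = ranked_terms[:k]
    if not top:
        raise ValueError("need at least one ranked term and k >= 1")
    return sum(1 for term in top if is_blocked(term)) / len(top)

# Toy ranking and toy set of blocked terms (made up for illustration).
ranked_terms = ["a", "b", "c", "d"]
blocked = {"a", "b", "d"}
p_at_2 = precision_at_k(ranked_terms, blocked.__contains__, 2)
p_at_4 = precision_at_k(ranked_terms, blocked.__contains__, 4)
```

Read this way, the figures reported above correspond to a precision of 0.70 at k = 20 and 136 / 2000 = 0.068 at k = 2000.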