Weiboscope is a Chinese social media data collection and visualization project which is developed by the
objective, among many, is to make censored Sina Weibo posts of a selected group of Chinese
microbloggers publicly accessible. Since January 2011, the system has been regularly sampling the timelines
of a set of selected Chinese microbloggers who have more than 1,000 followers or whose posts are
frequently censored.
In 2012, Weiboscope collected 226 million weibo posts, of which more than 10.9 million were
no longer publicly accessible, either because they were censored by the authorities or because the
user deleted them voluntarily. The 2012 weibo dataset is available here.
Besides the selective sampling approach, we also deploy a random sampling technique (Fu and Chau,
Weiboscope has developed a number of data analysis and visualization tools, including censorship
index time trend analysis and term cloud analysis. We have been using the data and tools in
a variety of research projects, including analyzing weibo censorship and its mechanism,
evaluating the impact of the "real name registration" policy in China, studying social media and
health issues in China, and topic modeling.
We use Sina Weibo's Open API to access raw microblog data. Since 2010, we have compiled a list of
popular microbloggers, using 1,000 or more followers as the inclusion criterion. In addition, our
system identifies a list of microbloggers whose posts are frequently censored. We combine the two lists
into an updated list of microbloggers, collect the posts of each microblogger on that list, and save
the messages in our database. Each user's recently modified timeline is compared with
the immediately previous version. If posts present in the old copy are missing from the new copy and
the API returns the error message "permission denied" for them, they are classified as "censored posts".
For a detailed explanation, please check this.
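The comparison step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `classify_missing_posts` and `fetch_status`, the shape of the returned dictionary, and the second error string in the toy data are all hypothetical, and the real Sina Weibo Open API call is replaced by a function supplied by the caller.

```python
from typing import Callable, Dict, Iterable, List

# Assumed error text; the description only says the API returns "permission denied".
PERMISSION_DENIED = "permission denied"


def classify_missing_posts(
    old_ids: Iterable[str],
    new_ids: Iterable[str],
    fetch_status: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Compare two snapshots of one user's timeline.

    old_ids / new_ids: post IDs in the previous and the current snapshot.
    fetch_status: hypothetical callable that re-requests a single post and
        returns the API's error message ("" if the post is still readable).
    """
    new_set = set(new_ids)
    # Posts present in the old copy but absent from the new copy, in old order.
    missing = [pid for pid in old_ids if pid not in new_set]

    result: Dict[str, List[str]] = {"censored": [], "other_missing": []}
    for pid in missing:
        message = fetch_status(pid).strip().lower()
        if message == PERMISSION_DENIED:
            result["censored"].append(pid)
        else:
            # Missing for some other reason, e.g. deleted by the user.
            result["other_missing"].append(pid)
    return result


if __name__ == "__main__":
    # Toy stand-in for the API; the second message is invented for the example.
    fake_errors = {"102": "permission denied", "104": "weibo does not exist"}
    outcome = classify_missing_posts(
        old_ids=["101", "102", "103", "104"],
        new_ids=["101", "103", "105"],
        fetch_status=lambda pid: fake_errors.get(pid, ""),
    )
    print(outcome)  # {'censored': ['102'], 'other_missing': ['104']}
```

Passing the status lookup in as a function keeps the comparison logic testable without network access; in a real collector it would wrap whatever API request returns the error message for a single post.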
The Weiboscope website is updated twice an hour. The most recent and highly retweeted posts are
displayed. In addition, a censorship index shows the time trend of the extent of censorship
taking place in the Sina Weibo samples. The geographical distribution of the censorship index illustrates
how the censorship rate varies across provinces. Censored pictures and a censored term
cloud are two data visualization tools that offer visual perspectives for data presentation and analysis.
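The text above does not give the formula for the censorship index. As a rough sketch only, assuming the index is the number of censored posts per 10,000 sampled posts within each group (a day for the time trend, a province for the geographical view), it could be computed like this. The function name `censorship_index`, the `per` parameter, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def censorship_index(
    posts: Iterable[Tuple[Hashable, bool]],
    per: int = 10_000,
) -> Dict[Hashable, float]:
    """Return censored posts per `per` sampled posts for each group key.

    posts: (group_key, is_censored) pairs; the key could be a date or a province.
    This rate is an assumed definition, not the project's documented formula.
    """
    totals = defaultdict(int)
    censored = defaultdict(int)
    for key, is_censored in posts:
        totals[key] += 1
        if is_censored:
            censored[key] += 1
    # Every key in `totals` has at least one post, so the division is safe.
    return {key: per * censored[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy data: four posts on one day (one censored), two on the next (none censored).
    sample = [
        ("day 1", True), ("day 1", False), ("day 1", False), ("day 1", False),
        ("day 2", False), ("day 2", False),
    ]
    print(censorship_index(sample))  # {'day 1': 2500.0, 'day 2': 0.0}
```

Grouping by an arbitrary key lets the same function serve both the time trend and the provincial breakdown.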
Fung, I. C.-H., Fu, K.-W., Ying, Y., Schaible, B., Hao, Y., Chan, C.-H., & Tse, Z. T.-H. (2013). Chinese
social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks. Infectious Diseases of
Poverty, 2(1), 31.
Fu, K.-W., Cheng, Q., Wong, P. W. C., & Yip, P. S. F. (2013). Responses to a Self-Presented Suicide
Attempt in Social Media. Crisis: The Journal of Crisis Intervention and Suicide Prevention, 34(6),
406-412. doi:10.1027/0227-5910/a000221
White, J., Fu, K.-W., & Benson, B. (2013). Social media: An ill-defined phenomenon. Lecture Notes in
Computer Science, Vol. 8029 (pp. 422-431).
Fu, K.-W., Chan, C.-H., & Chau, M. (2013). Assessing Censorship on Microblogs in China: Discriminatory
Keyword Analysis and the Real-Name Registration Policy. IEEE Internet Computing, 17(3), 42-50.
Fu, K.-W., & Chau, M. (2013). Reality Check for the Chinese Microblog Space: A Random Sampling Approach.
PLoS ONE, 8(3), e58356. doi:10.1371/journal.pone.0058356
Fu, K.-W. (2012). A Fully Automated Method to Catch and Characterize Deleted Posts on Sina and Tencent
Weibo. Paper presented at the Chinese Internet Research Conference, May 21-22, USC Annenberg
School for Communication and Journalism.
Fu, K.-W. (2011). Micro Blogging Suicide in China: Data Mining and Exploratory Analysis. Paper presented
at the 26th World Congress - International Association of Suicide Prevention, September 13-17, Beijing,
Principal Investigator: King-wa Fu
Members: CH Chan (2012-now), Cedric Sam (2010-2012)
Logo design: Pasu Au Yeung
This project is funded by the HKU Knowledge Exchange Fund.