About Weiboscope

Weiboscope is a Chinese social media data collection and visualization project developed by the research team at the Journalism and Media Studies Centre, The University of Hong Kong. One project objective, among many, is to make censored Sina Weibo posts of a selected group of Chinese microbloggers publicly accessible. Since January 2011, the system has been regularly sampling the timelines of a set of selected Chinese microbloggers who have more than 1,000 followers or whose posts are frequently censored.


In 2012, Weiboscope collected 226 million weibo posts, of which more than 10.9 million were no longer publicly accessible, either because they were censored by the authorities or because they were deleted voluntarily by the user. The Year 2012 weibo dataset is available here.


Besides the selective sampling approach, we also use a random sampling technique (Fu and Chau, 2013) to generate a representative sample of Sina Weibo users. The result is presented here.


Weiboscope has developed a number of data analysis and visualization tools, including censorship index time trend analysis and term cloud analysis. We have been using the data and tools in a variety of research projects, including analyzing weibo censorship and its mechanism, evaluating the impact of the "real name registration" policy in China, studying social media and health issues in China, and topic modeling.


JMSC's Weiboscope project has been covered by international and local media, including CNN, The Economist, Al Jazeera, Wall Street Journal, The Guardian, The Global Times, The Telegraph (Australia), Apple Daily, and the BBC Chinese channel, as well as the HKU Bulletin.

Methodology

We use Sina Weibo's Open API to access raw microblog data. Since 2010, we have compiled a list of popular microbloggers, using 1,000 or more followers as the inclusion criterion. In addition, our system identifies a list of microbloggers whose posts are frequently censored. We combine both lists into an updated list of microbloggers, then collect the posts of each microblogger on the list and save the messages in our database. Each user's recently modified timeline is compared with the immediately previous version. If posts are missing from the new copy compared with the old copy and the returned API error message is "permission denied", they are classified as "censored posts". For a detailed explanation, please check this.
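The comparison step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names build_watch_list and classify_missing_posts, the lookup_error callback, and the "other_missing" label are all hypothetical, and the real Open API call that returns the error message is deliberately left outside the sketch. Only the "permission denied" rule and the merging of the two user lists come from the description above.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_watch_list(popular: Iterable[str],
                     frequently_censored: Iterable[str]) -> List[str]:
    """Combine both user lists, dropping duplicates and keeping first-seen order."""
    return list(dict.fromkeys([*popular, *frequently_censored]))


def classify_missing_posts(
    previous_ids: Iterable[str],
    current_ids: Iterable[str],
    lookup_error: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Compare two snapshots of one user's timeline.

    previous_ids: post IDs in the immediately previous copy of the timeline.
    current_ids:  post IDs in the newly fetched copy.
    lookup_error: hypothetical callback that re-requests a single post and
                  returns the API error message (or None). The real API
                  call is an assumption and is not modelled here.
    """
    current = set(current_ids)
    result: Dict[str, List[str]] = {"censored": [], "other_missing": []}
    for post_id in previous_ids:
        if post_id in current:
            continue  # still present in the new copy, nothing to classify
        # Rule from the methodology: a post that is missing AND returns
        # "permission denied" is classified as a censored post.
        if lookup_error(post_id) == "permission denied":
            result["censored"].append(post_id)
        else:
            result["other_missing"].append(post_id)
    return result


if __name__ == "__main__":
    # Made-up snapshot data, for illustration only.
    errors = {"p2": "permission denied", "p3": "some other error"}
    print(classify_missing_posts(["p1", "p2", "p3"], ["p1"], errors.get))
```

Posts that disappear without a "permission denied" message are kept in a separate bucket here rather than being given a cause, since the description above only defines the rule for censored posts.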


The Weiboscope website is updated twice an hour. The most recent and highly retweeted posts are displayed. In addition, a censorship index is presented to show the time trend of the extent of censorship taking place on the Sina Weibo samples. The geographical distribution of the censorship index illustrates the provincial variation in the censorship rate. Censored pictures and the censored term cloud are two data visualization tools that offer visual perspectives for data presentation and analysis.

Publications

Fung, I. C.-H., Fu, K.-W., Ying, Y., Schaible, B., Hao, Y., Chan, C.-H., & Tse, Z. T.-H. (2013). Chinese social media reaction to the MERS-CoV and avian influenza A(H7N9) outbreaks. Infectious Diseases of Poverty, 2(1), 31.

Fu, K.-W., Cheng, Q., Wong, P. W. C., & Yip, P. S. F. (2013). Responses to a Self-Presented Suicide Attempt in Social Media. Crisis: The Journal of Crisis Intervention and Suicide Prevention, 34(6), 406-412. doi: 10.1027/0227-5910/a000221

White, J., Fu, K.-W., & Benson, B. (2013). Social media: An ill-defined phenomenon. Lecture Notes in Computer Science, Vol. 8029 (pp. 422-431).

Fu, K.-W., Chan, C.-H., & Chau, M. (2013). Assessing Censorship on Microblogs in China: Discriminatory Keyword Analysis and the Real-Name Registration Policy. IEEE Internet Computing, 17(3), 42-50.

Fu, K.-W., & Chau, M. (2013). Reality Check for the Chinese Microblog Space: A Random Sampling Approach. PLoS ONE, 8(3), e58356. doi:10.1371/journal.pone.0058356

Conference presentations

Fu, K.-W. (2012). A Fully Automated Method to Catch and Characterize Deleted Posts on Sina and Tencent Weibo. Paper presented at the Chinese Internet Research Conference, May 21-22, USC Annenberg School for Communication and Journalism.


Fu, K.-W. (2011). Micro Blogging Suicide in China: Data Mining and Exploratory Analysis. Paper presented at the 26th World Congress of the International Association for Suicide Prevention, September 13-17, Beijing, China.

Research Team

Principal Investigator: King-wa Fu

Members: CH Chan (2012-now), Cedric Sam (2010-2012)

Logo design: Pasu Au Yeung

Acknowledgement

This project is funded by the HKU Knowledge Exchange Fund.