U got flu? Bio-surveillance, networks and twitter
Posted: November 28, 2011
Twitter is emerging as a popular source of data for scientists; see various Twitter-related arXiv articles here. For example, here’s a piece validating the Dunbar number by looking at social interactions among 1.7 million people on Twitter (now published in PLoS ONE). At orgtheory.net I posted about a recently published Science piece attempting to measure aggregate mood by analyzing millions of tweets.
Here’s a set of papers studying Twitter and health-related issues. One paper suggests that monitoring the Twittersphere makes “bio-surveillance” possible: OMG U got flu? Analysis of shared health messages for bio-surveillance.
Here’s the abstract:
Background: Micro-blogging services such as Twitter offer the potential to crowdsource epidemics in real-time. However, Twitter posts (‘tweets’) are often ambiguous and reactive to media trends. In order to ground user messages in epidemic response we focused on tracking reports of self-protective behaviour such as avoiding public gatherings or increased sanitation as the basis for further risk analysis.
Results: We created guidelines for tagging self protective behaviour based on Jones and Salathé (2009)’s behaviour response survey. Applying the guidelines to a corpus of 5283 Twitter messages related to influenza like illness showed a high level of inter-annotator agreement (kappa 0.86). We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour categories plus a self-reported diagnosis. In addition to classification performance we report moderately strong Spearman’s Rho correlation by comparing classifier output against WHO/NREVSS laboratory data for A(H1N1) in the USA during the 2009-2010 influenza season.
Conclusions: The study adds to evidence supporting a high degree of correlation between pre-diagnostic social media signals and diagnostic influenza case data, pointing the way towards low cost sensor networks. We believe that the signals we have modelled may be applicable to a wide range of diseases.
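For readers unfamiliar with the two statistics the abstract reports, here is a minimal Python sketch of how inter-annotator agreement (Cohen's kappa) and Spearman's rho are computed. This is not the authors' code. The function names, the label strings and the weekly counts are all made up for illustration, and the paper's own data and tooling may differ.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, NOT taken from the paper.
annotator_1 = ["avoid", "avoid", "sanitise", "none", "none", "sanitise"]
annotator_2 = ["avoid", "none", "sanitise", "none", "none", "sanitise"]
weekly_flagged_tweets = [3, 8, 15, 12, 6]
weekly_lab_cases = [10, 40, 90, 95, 30]

print(round(cohen_kappa(annotator_1, annotator_2), 2))
print(round(spearman_rho(weekly_flagged_tweets, weekly_lab_cases), 2))
```

Kappa corrects raw agreement for the agreement two annotators would reach by chance, which is why it is the usual figure reported for tagging guidelines like these. Spearman's rho compares only the rank order of the two weekly series, so it rewards the tweet signal for rising and falling with the laboratory counts without requiring the relationship to be linear.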