|Google Updates Flu Model|
|Written by Janet Swift|
|Monday, 18 November 2013|
Google has updated its flu prediction model, which uses search activity for information about flu to give realtime estimates of the occurrence of the disease.
Since 2008 Google has been monitoring search behavior related to flu. It uses keyword trends from Google.com to produce a daily estimate of the occurrence of flu two weeks in advance of publication of official surveillance data. Its Flu Trends website presents a world map that reflects the severity of flu outbreaks in around 30 countries.
This video provides an overview of how this works and shows that Google's realtime prediction generally correlates closely with statistics collected by agencies such as the US Centers for Disease Control and Prevention (CDC), whose statistics typically have a two-week time lag.
Last year, however, things went wrong when Google Flu Trends began to overestimate the incidence of flu in the United States; see Google Flu Prediction - Beware The Media Effect.
As reported on the Official Google blog, the explanation for this was found to be:
heightened media coverage on the severity of the flu season resulted in an extended period in which users were searching for terms we’ve identified as correlated with flu levels. In early 2013, we saw more flu-related searches in the US than ever before.
In other words, media attention prompted people who were not suffering from flu symptoms to search out of "general interest", resulting in an unprecedented error in Google's estimate.
This graph shows the way in which media coverage and the error rate were correlated:
A paper, Google Disease Trends: An Update, outlines the steps Google has taken to improve its prediction algorithm for both flu and dengue fever. It states:
We’ve experimented with two areas of improvement: 1) dampening anomalous media spikes and 2) using ElasticNet.
With regard to the former, Google is using independent measures of flu in the news media to modulate the contribution of certain flu-related queries during estimation. The second area of investigation "addresses the absence of explicit coefficients for query terms in the model", and the team has made improvements to the regression algorithms to handle large numbers of query terms.
Having adjusted the search aggregation method, Google has applied the new model to the historical data provided by the US Centers for Disease Control and Prevention. The result shows a near perfect fit, although the model still slightly overpredicts the 2012-13 flu levels.
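For readers unfamiliar with ElasticNet, it is a linear regression that combines an L1 penalty (which pushes the weights of unhelpful terms to exactly zero) with an L2 penalty (which keeps the weights of correlated terms stable), which is why it suits models with many query terms. The paper does not spell out Google's implementation, so the following is only a minimal sketch of the generic technique, fitted by cyclic coordinate descent and using the objective convention found in scikit-learn. The function names and the toy "query signal" data are hypothetical, and the inputs are assumed to be centred so that no intercept is needed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the update)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Cyclic coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
                        + 0.5*alpha*(1 - l1_ratio)*||w||^2.
    X is a list of rows. No intercept is fitted (assumption: centred data).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w


# Hypothetical toy data: column 0 tracks the target, column 1 is unrelated.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
weights = fit_elastic_net(X, y, alpha=1.0, l1_ratio=0.5)
print(weights)  # prints [1.0, 0.0]
```

In this toy case the unpenalised fit would give the first term a weight of 2; the penalties shrink it to 1 and leave the unrelated term at exactly zero, which is the kind of explicit, sparse per-term coefficient the quoted passage refers to.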
Compared to its predecessor, the new model indicates a lower level of flu. As of November 10th, all but one of the US states are reporting Low flu activity and only Mississippi has reached Moderate status. Under the previous model, Moderate would have prevailed by now. However, we are just at the start of the 2013-14 flu season and only time will tell if and when flu activity reaches High or Intense levels.
|Last Updated ( Monday, 18 November 2013 )|