COVID Results Skewed By Faulty Data Import
Written by Alex Denham   
Monday, 05 October 2020

The official number of coronavirus cases in the UK was under-reported by almost 16,000 in recent days because of a data import error. As well as skewing the figures, the error delayed contact tracing: Public Health England says those who tested positive were told their results, but their close contacts were not traced and notified at the time.

Public Health England, an executive agency of the UK's Department of Health and Social Care, said that 15,841 cases between 25 September and 2 October were left out of the UK daily case figures. The missing cases were added back at the weekend, causing an apparent spike in case numbers.


The problem has now been resolved, according to Public Health England. Its interim chief executive, Michael Brodie, said that a "technical issue" was identified overnight on Friday 2 October in the process that transfers Covid-19 positive lab results into reporting dashboards. The cause was that some of the data files reporting positive test results exceeded the maximum file size.

News outlets and social media have reported that the problem arose when an Excel spreadsheet reached its maximum size, meaning no further rows could be added. In this scenario, the labs carrying out Covid tests automatically enter their results into files, which are then sent to a central PHE facility to be collated. Excel worksheets have a hard row limit (1,048,576 rows in the .xlsx format, 65,536 in the legacy .xls format) while CSV files have none, so when an oversized CSV file is opened in Excel, the rows beyond the limit are dropped.
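To make that failure mode concrete, here is a minimal Python sketch of a guard that counts the rows in a CSV file before it goes anywhere near a spreadsheet. The two limits are Excel's documented worksheet row limits; the function name and the sample data are hypothetical, and nothing here is meant to describe PHE's actual pipeline.

```python
import csv
import io

# Documented Excel worksheet row limits.
XLSX_MAX_ROWS = 1_048_576
XLS_MAX_ROWS = 65_536


def rows_lost_in_excel(csv_file, max_rows=XLSX_MAX_ROWS):
    """Return how many CSV rows (header included) would not fit in one worksheet."""
    total = sum(1 for _ in csv.reader(csv_file))
    return max(0, total - max_rows)


# Hypothetical sample: one header row plus 70,000 result rows.
sample = io.StringIO(
    "id,result\n" + "".join(f"{i},positive\n" for i in range(70_000))
)
lost = rows_lost_in_excel(sample, max_rows=XLS_MAX_ROWS)
print(lost)  # 4465
```

A non-zero result is the cue to stop and raise the alarm rather than carry on with a partial data set.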

If that were the case, it would be quite shocking that a government body was trying to run a major data analysis in a spreadsheet. I'm not saying it wouldn't happen and doesn't happen, but for something of this magnitude?

A (hopefully more likely) explanation is that a script importing CSV data into something other than Excel timed out. The sources reporting this say the fix was simply to set the timeout parameter to something suitably massive. The Press Association, meanwhile, reports that the data files have been split into several smaller subfiles to overcome the problem. Whichever version is correct, the problem shouldn't recur.
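As an illustration of the "smaller subfiles" approach, the following Python sketch splits a CSV stream into parts of bounded size, repeating the header in each part so that every part can be imported on its own. The function name, the part size and the sample data are all assumptions made for the example; how PHE actually split its files has not been reported in detail.

```python
import csv
import io


def split_csv_rows(csv_file, max_rows_per_part):
    """Yield lists of rows; each part repeats the header and holds
    at most max_rows_per_part data rows."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to split
    part = []
    for row in reader:
        part.append(row)
        if len(part) == max_rows_per_part:
            yield [header] + part
            part = []
    if part:
        yield [header] + part  # final, shorter part


# Hypothetical sample: three results split into parts of two.
sample = io.StringIO("id,result\n1,positive\n2,positive\n3,positive\n")
parts = list(split_csv_rows(sample, max_rows_per_part=2))
print(len(parts))  # 2
```

Splitting only moves the limit further away, of course, so it is worth pairing with a check that the parts add up to the original row count.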

Either way, it's a reminder to developers everywhere. Error trapping and reporting can make the difference between a private "aargh, let's run that again" and far-too-public reproaches.
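What that error trapping might look like is sketched below in Python: the import counts the rows it read and the rows it managed to store, and refuses to finish quietly if the two differ. Everything named here (checked_import, ImportMismatchError, the store_row callback and the two-row "table") is hypothetical and stands in for whatever the real destination is.

```python
import csv
import io
import logging

logger = logging.getLogger("import_check")


class ImportMismatchError(Exception):
    """Raised when the number of rows stored differs from the number read."""


def checked_import(csv_file, store_row):
    """Import every data row via store_row and fail loudly on a shortfall.

    store_row is a hypothetical callback returning True if the row was stored.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    seen = stored = 0
    for row in reader:
        seen += 1
        if store_row(row):
            stored += 1
    if stored != seen:
        logger.error("import incomplete: %d of %d rows stored", stored, seen)
        raise ImportMismatchError(
            f"{seen - stored} of {seen} rows were not imported"
        )
    return stored


# Hypothetical destination that silently stops accepting rows when full.
capacity = 2
table = []


def store_row(row):
    if len(table) >= capacity:
        return False
    table.append(row)
    return True


try:
    checked_import(
        io.StringIO("id,result\n1,positive\n2,positive\n3,positive\n"),
        store_row,
    )
except ImportMismatchError as exc:
    print(exc)  # 1 of 3 rows were not imported
```

The point of raising rather than just logging is that the job cannot report success while rows are missing.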


More Information

Public Health England Website

Related Articles

What Skills Do Data Scientists Need

Programmer's Guide To Theory - Error Correction

End Manual Data Entry in Excel - Thanks AI!

Excel Adds New Data Types 

John Conway Dies From Coronavirus

Fighting Coronavirus At Home With Exascale Power

Smartphone App Borrows Power For Corona Virus Research  


Last Updated ( Monday, 05 October 2020 )