COVID Results Skewed By Faulty Data Import
Written by Alex Denham   
Monday, 05 October 2020

The official number of coronavirus cases in the UK has been under-reported by nearly 16,000 in recent days because of a data import error. As well as skewing the figures, the error meant that people who had tested positive weren't notified, so their contacts also went unnotified.

Public Health England, an executive agency of the UK's Department of Health and Social Care, said that 15,841 cases between 25 September and 2 October were left out of the UK daily case figures. The missing cases were added back at the weekend, causing an apparent spike in case numbers.

The problem has now been resolved, according to Public Health England. Its interim chief executive, Michael Brodie, said that a "technical issue" was identified overnight on Friday, 2 October in the process that transfers Covid-19 positive lab results into reporting dashboards. The cause was that some of the data files reporting positive test results exceeded the maximum file size.

News outlets and social media have reported that the problem arose when an Excel spreadsheet reached its maximum size, meaning no further rows could be added. In this scenario, labs carrying out Covid tests automatically enter their results into spreadsheets, which are then sent to a central PHE facility to be collated. Excel worksheets are limited in the number of rows they can hold, while CSV files aren't, so if a larger CSV file is opened in Excel the rows beyond the Excel maximum are dropped.
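A check for this kind of truncation is easy to sketch. The following is a minimal illustration, not PHE's actual process: the function name is hypothetical, and it assumes the lab results arrive as CSV text with one record per row. The two constants are Excel's documented worksheet row limits for the XLSX and older XLS formats.

```python
import csv
import io

# Excel's documented per-worksheet row limits.
XLSX_MAX_ROWS = 1_048_576
XLS_MAX_ROWS = 65_536


def rows_lost_in_excel(csv_file, max_rows=XLSX_MAX_ROWS):
    """Return how many rows of csv_file would not fit in one worksheet.

    csv_file is any open text file object; the header counts as a row.
    (Hypothetical helper, for illustration only.)
    """
    total = sum(1 for _ in csv.reader(csv_file))
    return max(0, total - max_rows)


# Usage: a tiny in-memory file checked against an artificially small limit.
sample = io.StringIO("id,result\n1,positive\n2,positive\n3,negative\n")
print(rows_lost_in_excel(sample, max_rows=3))  # 4 rows in total, so 1 is lost
```

Running a check like this before a file goes anywhere near a spreadsheet turns a silent loss into a number someone can act on.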

If that were the case, it would be quite shocking that a government department was trying to run a major data analysis on a spreadsheet. I'm not saying it wouldn't happen and doesn't happen, but for something of this magnitude?

A (hopefully more likely) view is that a script importing CSV data into something other than Excel timed out. The sources reporting this say the fix was simply to set the timeout parameter to something suitably large. The Press Association, on the other hand, reports that the data files have been split into several smaller subfiles to overcome the problem. Whichever version is correct, the problem shouldn't recur.
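Splitting a large file into smaller subfiles, as the Press Association describes, might look something like the sketch below. This is an assumption about the general technique rather than a description of what PHE did. The function name and chunk size are hypothetical, and the header row is repeated in every part so that each subfile can be read on its own.

```python
import csv
import io
from itertools import islice


def split_rows(reader, chunk_size):
    """Yield lists of at most chunk_size data rows, each preceded by the header.

    (Hypothetical helper, for illustration only.)
    """
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to split
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield [header] + chunk


# Usage: three data rows split into parts of at most two rows each.
source = io.StringIO("id,result\n1,positive\n2,positive\n3,negative\n")
for part in split_rows(csv.reader(source), chunk_size=2):
    print(part)
```

Each yielded part can then be written out with `csv.writer` as a separate file that stays under whatever limit the receiving system imposes.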

Either way, it's a reminder to developers everywhere. Error trapping and reporting can make the difference between a private 'aargh, let's run that again' and far-too-public reproaches.
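As a rough illustration of the kind of error trapping meant here, the sketch below reconciles the number of rows received with the number actually stored and raises an exception on any shortfall. Everything in it is hypothetical: the exception class, the function name, and the `capacity` parameter, which simply stands in for whatever limit a real destination might impose.

```python
import csv
import io


class ImportMismatchError(Exception):
    """Raised when fewer records were stored than the source contained."""


def import_results(csv_file, store, capacity):
    """Copy data rows from csv_file into store, failing loudly on any loss.

    capacity stands in for a limit imposed by the destination.
    (Hypothetical helper, for illustration only.)
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    received = stored = 0
    for row in reader:
        received += 1
        if len(store) < capacity:
            store.append(row)
            stored += 1
    if stored != received:
        raise ImportMismatchError(
            f"received {received} rows but stored only {stored}"
        )
    return received


# Usage: the destination can hold only two of the three incoming rows.
store = []
data = io.StringIO("id,result\n1,positive\n2,positive\n3,negative\n")
try:
    import_results(data, store, capacity=2)
except ImportMismatchError as err:
    print(f"import failed: {err}")
```

The point is not the specific check but that a shortfall stops the pipeline and is reported, rather than passing unnoticed into the published figures.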

More Information

Public Health England Website


Last Updated ( Monday, 05 October 2020 )