Python Script Invalidates Hundreds Of Papers
Written by Mike James   
Monday, 21 October 2019

This news item is interesting not just because it is a lesson to us all, but because of the way it is being reported as "Bug In Python Script ..." with the suggestion that Python is the cause of the problem. The truth is, in fact, much more interesting.

The script is about 1000 lines of Python and hence it isn't a small program. It has been in use since 2014 and was created by  Patrick Willoughby, Matthew Jansma, and Thomas Hoye to take raw data and calculate NMR shifts. In the journal Nature Protocols the subject is referred to as the "Willoughby-Hoye" scripts. 

Everything was going fine until researchers at the University of Hawaii noticed that they got different results when the same data and scripts were run on different operating systems. Windows and a version of MacOS gave the correct answers, but another version of MacOS and Ubuntu gave incorrect results. This is not the sort of behavior you expect from a bug - bugs usually misbehave everywhere not just in certain environments.


So what was the problem?

The problem was tracked down to the way the data was being retrieved.The data from each run was stored in two files. The files were retrieved by file name in pairs and processed in pairs. The problem was that the order in which the files were being retrieved varied according to the operating system. As long as the pairs of files matched up you got the correct result. If they didn't you processed data from two different runs.

The culprit in many of the news reports is claimed to be the Python standard library module glob, which will return a list of files matching a file/path specification including wildcards. It works, but it didn't return the results in a specified order. Is this a bug? Not really. The documentation for glob starts with the sentence:

"The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order."

So it is hardly a bug. What is it then? Causal users, programmers whose task isn't really programming, often make assumptions about what is reasonable and don't think about the consequences of not writing defensive code. Could the documentation have done everyone a service by putting the "are returned in arbitrary order" in bold? Would it have been more reasonable to impose an order rather than leave it as undefined behavior? The problem is compounded by some operating systems appearing to return a sorted list, so confirming the users assumptions. Should the operating system have put more effort into returning the list in an arbitrary order? 

Perhaps the problem isn't really anyone's fault in the sense that undefined behavior is always a tar pit waiting for the unsuspecting user to be sucked down. In this case the reason for not returning a sorted list is efficiency - why sort a list if many users don't need it sorted.?

You could say that better programming skills would have helped. For example, including some tests would probably have caught the problem, but would it? If you are going to write a test you need to be aware of your assumptions and if the programmers were aware of this critical assumption they would have noticed the first sentence in the documentation and avoided it. It takes a lot of discipline to write tests for things you know must be true.

If you want to reflect on the good news, you can take the attitude that science seems to work. The error was found and papers are being retracted/corrected.

How do we reduce the chance of this happening again?


  • Mike James is the author of Programmer's Python: Everything is an Object published by I/O Press as part of the  I Programmer Library. With the subtitle "Something Completely Different" this is for those who want to understand the deeper logic in the approach that Python 3 takes to classes and objects.



More Information

Characterization of Leptazolines A–D, Polar Oxazolines from the Cyanobacterium Leptolyngbya sp., Reveals a Glitch with the “Willoughby–Hoye” Scripts for Calculating NMR Chemical Shifts

Related Articles

Which Languages Are Bug Prone

Reboot Your Dreamliner Every 248 Days To Avoid Integer Overflow

MIT Finds Overflow Bugs       

Code Digger Finds The Values That Break Your Code       

Toyota Code Could Be Lethal        

Robot cars - provably uncrashable?       

Do cars have bugs?       

Debugging and the Experimental Method       

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.



Udacity Offers More AWS Scholarships

Udacity has announced it is accepting applications for the next wave of 1,000 AWS AI & ML Scholarships. Any student over the age of 16 who self-identifies as under-served or under-represented in t [ ... ]

Apache NiFi Adds Python Processor Support

Apache NiFi 2, a project for processing and distributing data, has been released with support for Python processors in the MiNiFi framework, and a completely rebuilt user interface.

More News

kotlin book



or email your comment to:




Last Updated ( Monday, 21 October 2019 )