|Python Script Invalidates Hundreds Of Papers|
|Written by Mike James|
|Monday, 21 October 2019|
This news item is interesting not just because it is a lesson to us all, but because of the way it is being reported as "Bug In Python Script ..." with the suggestion that Python is the cause of the problem. The truth is, in fact, much more interesting.
The script is about 1000 lines of Python and hence it isn't a small program. It has been in use since 2014 and was created by Patrick Willoughby, Matthew Jansma, and Thomas Hoye to take raw data and calculate NMR shifts. In the journal Nature Protocols the subject is referred to as the "Willoughby-Hoye" scripts.
Everything was going fine until researchers at the University of Hawaii noticed that they got different results when the same data and scripts were run on different operating systems. Windows and a version of MacOS gave the correct answers, but another version of MacOS and Ubuntu gave incorrect results. This is not the sort of behavior you expect from a bug - bugs usually misbehave everywhere not just in certain environments.
So what was the problem?
The problem was tracked down to the way the data was being retrieved.The data from each run was stored in two files. The files were retrieved by file name in pairs and processed in pairs. The problem was that the order in which the files were being retrieved varied according to the operating system. As long as the pairs of files matched up you got the correct result. If they didn't you processed data from two different runs.
The culprit in many of the news reports is claimed to be the Python standard library module glob, which will return a list of files matching a file/path specification including wildcards. It works, but it didn't return the results in a specified order. Is this a bug? Not really. The documentation for glob starts with the sentence:
So it is hardly a bug. What is it then? Causal users, programmers whose task isn't really programming, often make assumptions about what is reasonable and don't think about the consequences of not writing defensive code. Could the documentation have done everyone a service by putting the "are returned in arbitrary order" in bold? Would it have been more reasonable to impose an order rather than leave it as undefined behavior? The problem is compounded by some operating systems appearing to return a sorted list, so confirming the users assumptions. Should the operating system have put more effort into returning the list in an arbitrary order?
Perhaps the problem isn't really anyone's fault in the sense that undefined behavior is always a tar pit waiting for the unsuspecting user to be sucked down. In this case the reason for not returning a sorted list is efficiency - why sort a list if many users don't need it sorted.?
You could say that better programming skills would have helped. For example, including some tests would probably have caught the problem, but would it? If you are going to write a test you need to be aware of your assumptions and if the programmers were aware of this critical assumption they would have noticed the first sentence in the documentation and avoided it. It takes a lot of discipline to write tests for things you know must be true.
If you want to reflect on the good news, you can take the attitude that science seems to work. The error was found and papers are being retracted/corrected.
How do we reduce the chance of this happening again?
or email your comment to: email@example.com
|Last Updated ( Monday, 21 October 2019 )|