Automatic Testing - Programmers Are Still The Problem
Written by Alex Armstrong   
Friday, 16 June 2017

White box test generator tools can help you find bugs automatically by generating combinations of inputs that are likely to give the wrong answer. However, there is a problem: how do you know when you have the wrong answer?

A recent study by David Honfi and Zoltan Micskei of Budapest University of Technology and Economics puts automatic testing under the spotlight.

White-box test generator tools rely only on the code under test to select test inputs, and capture the implementation’s output as assertions. If there is a fault in the implementation, it could get encoded in the generated tests. Tool evaluations usually measure fault-detection capability using the number of such fault-encoding tests. However, these faults are only detected, if the developer can recognize that the encoded behavior is faulty.
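To make the idea of a fault-encoding test concrete, here is a minimal Python sketch. The function `apply_discount` and its bug are hypothetical, and `generate_test` is a toy stand-in that simply records whatever the implementation returns. Real tools such as IntelliTest choose their inputs by analyzing the code, which this sketch does not attempt.

```python
def apply_discount(price, percent):
    """Hypothetical code under test: should reduce price by percent."""
    # Fault (deliberate, for illustration): adds instead of subtracting.
    return price + price * percent / 100


def generate_test(func, *args):
    """Toy stand-in for a generator: run the code once and record
    the observed output as the expected value of an assertion."""
    observed = func(*args)

    def test():
        assert func(*args) == observed

    test.inputs = args
    test.expected = observed
    return test


generated = generate_test(apply_discount, 200, 10)
generated()  # passes, although the recorded value is wrong
print(generated.inputs, "->", generated.expected)  # prints: (200, 10) -> 220.0
```

The generated test passes, so the fault only comes to light if a reader notices that a 10% discount on 200 should not give 220.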

They carried out a study using 54 graduate students, two open source projects and the IntelliTest tool, which is still better known as Pex. The tool generated test cases, and the subjects were asked to identify when the output indicated a fault. The results were somewhat surprising.

The results showed that participants incorrectly classified a large number of both fault-encoding and correct tests (with median misclassification rate 33% and 25% respectively). Thus the real fault-detection capability of test generators could be much lower than typically reported, and we suggest to take this human factor into account when evaluating generated white-box tests.

You can see the confusion matrix for the NBitcoin project's tests:

[Figure: confusion matrix for the NBitcoin project's tests]

FP = False Positive, FN = False Negative, TP = True Positive, TN = True Negative
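As a rough sketch of how misclassification rates follow from the four confusion matrix cells, the Python below assumes that "positive" means a participant marked a test as wrong. That convention is an assumption made for this example, as is the function name, and the counts are made up purely for illustration; they are not figures from the study.

```python
def misclassification_rates(tp, fp, tn, fn):
    """Return (rate for fault-encoding tests, rate for correct tests).

    Assumed convention: 'positive' = participant marked the test as wrong.
    """
    # Fault-encoding tests that were wrongly accepted as OK.
    fault_encoding_missed = fn / (tp + fn)
    # Correct tests that were wrongly flagged as faulty.
    correct_flagged = fp / (fp + tn)
    return fault_encoding_missed, correct_flagged


# Made-up counts for illustration only, not data from the paper.
missed, flagged = misclassification_rates(tp=6, fp=1, tn=9, fn=2)
print(f"fault-encoding tests misclassified: {missed:.0%}")  # 25%
print(f"correct tests misclassified: {flagged:.0%}")        # 10%
```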

In practice, this means that white box testing is subject to programmer error in its own right. From videos of the subjects trying to interpret the tests, the researchers note that the subjects tended to use debugging to explore the code further and clarify the meaning of a test. An exit survey also indicates the sorts of difficulties the subjects felt they had with the task:

  • “Deciding whether a test is OK or wrong when it tests an unspecified case, e.g. comparing with null, or equality of null.”

  • “Distinguishing between the variables was difficult, e.g between assetMoney, assetMoney1, assetMoney2.”

  • “Tests should compare less with null and objects with themselves.”

  • “I think that some assertions are useless, and not asserting ’real problems’, just some technical details.”

  • “Generated test cases doesn’t separated into Arrange, Act, Assert and should create more private methods for these concerns.”

  • “Generate comments into tests describing what happening.”
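Several of these comments concern readability. As a rough illustration, the Python sketch below puts a test written in the style the subjects complained about next to the same check laid out as Arrange, Act, Assert with descriptive names. The `Money` class and both tests are invented for this example and are not output from IntelliTest; the numbered variable names merely echo the assetMoney complaint above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical value type, invented for this sketch."""
    amount: int
    currency: str

    def add(self, other):
        if other.currency != self.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)


def test_add_generated_style():
    # Numbered names and incidental assertions, as criticized in the survey.
    money = Money(1, "USD")
    money1 = Money(2, "USD")
    money2 = money.add(money1)
    assert money2 is not None   # technical detail, not a real problem
    assert money2 == money2     # object compared with itself
    assert money2.amount == 3


def test_add_same_currency_sums_amounts():
    # Arrange
    wallet = Money(1, "USD")
    deposit = Money(2, "USD")
    # Act
    total = wallet.add(deposit)
    # Assert
    assert total == Money(3, "USD")


test_add_generated_style()
test_add_same_currency_sums_amounts()
```

Both tests check the same behavior; the second only makes it easier for a reader to judge whether the expected value is the right one.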

The key finding is:

The implication of the results is that the actual fault-finding capabilities of the test generator tools could be much lower than reported in technology-focused experiments.

Of course, it would be better if the programmer could be taken out of the loop, but this would require the white box test generator to work out what the result of each test should be, and that would require a lot of AI.

More Information

Classifying the Correctness of Generated White-Box Tests: An Exploratory Study

Related Articles

Code Digger Finds The Values That Break Your Code

Code Hunt - New Coding Game From Microsoft Research

Debugging and the Experimental Method       

