CheckCell Detects Bugs In Spreadsheets
Written by Janet Swift   
Monday, 27 October 2014

A free Excel  plug-in that takes a new approach to automatically finding data errors in spreadsheets has been released. Rather than looking for errors in programs and formulas, CheckCell sets out to debug the underlying data.

The tool, called CheckCell comes from Daniel Barowy and Dimitar Gochev, computer science doctoral students at the University of Massachusetts Amherst. It was launched at this year's international computer programming language design conference, OOPSLA.

It addresses the problem that spreadsheet data errors can be consequential referring to the story we reported in April 2013, see Is Excel To Blame For Our Economic Pain?.

In his conference session Daniel Barowy told delegates:

"Consider the case of a paper written by Harvard economists Carmen Reinhart and Kenneth Rogoff a couple of years ago. The paper was influential, lending credibility to government austerity measures in Europe and the United States. But in 2013, UMass Amherst economist Thomas Herndon and colleagues found, in combing through the data by hand, that methodological errors undermined Reinhart and Rogoff's argument. In particular, Reinhart and Rogoff exaggerated the impact of key data values in a spreadsheet."

The CheckCell group, supervised by Prof Emery Berger. wondered whether software might be developed to find these kinds of errors automatically and successfully devised an approach that found a number of the same errors uncovered by Herndon.

The paper presented at OOPSLA refers to the approach as data debugging, which combines program analysis and statistical
analysis to automatically find potential data errors by locating data that has a disproportionate impact on the computation. 

Putting  it more informally Barowy explains:

"Basically, CheckCell identifies data points that have a big impact on the final result, even if the impact is super subtle and difficult to detect. CheckCell immediately flags data points that are very suspicious, the ones that deserve a second look. It's like having a helper who says, 'pay attention to these cells, they really matter.'"

The technique uses what Berger calls "a threshold of unusualness." CheckCell marks high-impact data points in red and asks the spreadsheet designer to check them. If they are indeed correct, they turn green and will not be flagged in subsequent analyses.

The example cited involves getting the digits in a data point transposed, so when a teacher has an “A” student who would be expected to get a 94 on a test and the spreadsheet says that student got a 49, CheckCell would flag it.  A similar example is used in the paper: 

 checkcelleg

Isn't this just range checking of the sort we have always told users to build into spreadsheets? 

I think it is fair to say it is a valuable extension of the idea that isn't as coarse grained as manual checks would be and by working from the unexpected outcome to the data brings a welcome degree of automation to the problem.

Berger says that in the future the team plan to extend CheckCell's use to large-scale data sets where errors that have a large impact are difficult to find. Having already shown that CheckCell would have uncovered some of the flaws in the Reinhart/Rogoff data, this seems a worthwhile endeavor. 

You can install CheckCell 1.0.2 for Excel 10 and later via GitHub and its source code is also available. gCheckCell, a version for Google Spreadsheet is also on GitHub.

 

Banner


Data Wrangler Gets Copilot Integration
11/11/2024

Microsoft has announced that Copilot is being integrated into Data Wrangler. The move will give data scientists the ability to use natural language to clean and transform data, and to get help with fi [ ... ]



Lightbend Announces Akka 3
15/11/2024

Lightbend, the company that developed Akka, has announced Akka 3, and has changed its name to Akka. The company produces cloud-native microservices frameworks, and Akka is used for building distribute [ ... ]


More News

 

espbook

 

Comments




or email your comment to: comments@i-programmer.info

Last Updated ( Thursday, 06 November 2014 )