Spreadsheets Are Special
Written by Mike James   
Tuesday, 29 July 2014

Don't dismiss the spreadsheet. It not only brought programming to the masses but was an early entry into rapid application development, functional programming and an easy and natural approach to parallel programming, reactive programming and cellular automata.

I've been asked more than once why I Programmer bothers to cover spreadsheets. They are not, after all, obviously anything much to do with programming?




That is where you would be wrong. The spreadsheet is one of the most amazing inventions of the computer age. It not only makes programming easy for non-programmers, it also impacts on some of our most pressing problems - how to create parallel programs. If  you think that reactive programming is a hot topic then think again because spreadsheets invented reactive programming. 




The spreadsheet isn't an obvious idea.

It may seems so now because a whole generation of programmers and users have grown up knowing not only the basic idea but experiencing it first hand. Want to add a few numbers - use a spreadsheet. It seems so obvious that it almost doesn't need inventing. As one whos programming career spans the invention of the spreadsheet I can tell you that before it was invented it wasn't obvious in the least. 

When spreadsheets were first introduced they were taken up by users who had been looking for something to unlock the power of the machine that they had bought into without the need to hire a programmer or even learn a programming language.

Spreadsheets met a need that few even guessed existed.

Today spreadsheets are often relegated to the task of ad-hoc data presentation by way of creating graphs and charts. Useful though this occupation is the real power of the spreadsheet lies in the two words "ad-hoc". You could say that spreadsheets were the first step towards rapid application development and perhaps even agile programming - but this might be a claim too far.




Loops and flow of control

Before the spreadsheet if you wanted to get a computer to do a simple sum you had to program it.

After all the first high level programming language was called Fortran - FORmula TRANslation. You could use other tools put together by programmers to do sums - like a command line calculator - but it was the spreadsheet that first allowed innocent users to type in mathematical formulae and have the computer work it out for them.

This was good but there is more.

If you needed to total a column of numbers before the spreadsheet you needed to know how to write a for loop and to work with an array. This is sophisticated stuff. But, after the spreadsheet, you could simply type in a formula that said something like Sum(A1:A10).




Now where did that for loop and the whole concept of iteration go?

The answer is that it is hidden in the range reference A1 to A10 and the function applied to it. Clever stuff but at the time spreadsheets were being invented no one thought about it like this. No one sat back and said

"I can get rid of iteration from programming by inventing the spreadsheet Sum function". 

But this is what it did as Sum(A1:A10) is equivalent to:

For i= A1 to A10
Next i

Users who had no idea what an enumeration loop was were very happy to write spreadsheet sum functions. 

In the same way the branching if statement was replaced by a simple IF function which returned one of two possible expressions depending on the condition. 

For example


evaluates to 0 if A1 is zero and A2/A1 otherwise. Notice that this is exactly like the ternary operator ? e.g. A1==0?0,A2/A1 gives the same result. 

This hid from view the full idea of the flow of control splitting into two possible paths at each and every if statement.

However, you mustn't think that the spreadsheet version was in some way weaker because it only evaluated to one of two possible expressions. Because other cells in the spreadsheet could reference the result of an IF function it really did create a split in the flow of control.

Two different chains of computation could result from each IF function.

It has to be admitted that the spreadsheet way of implementing the conditional has never been as successful as its way of dealing with the loop. Ranges and functions seem simpler but IF functions are almost as complicated as if statements and while most spreadsheet users found summing a column trivial mastering conditional functions was more of a problem. 

In other word even the most novice spreadsheet user quickly discovered and used range functions like Sum they tended to be slower to figure out what the if function was all about. Even when they did discover the conditional they often used it in very limited ways. Oddly in the spreadsheet world it is the conditional that is the difficult idea - not iteration. 

The CA way to parallel programming

The use of a range reference and a function to get rid of the for loop is just the start.

An adventurous spreadsheet user can write formulas that refer to formulae stored in other cells. This raises the level of sophistication to the point were it reaches the forefront of computer science research.

The problem is that if you have formulae in different cells and one cell depends on the result of another then the result will depend on the order that you evaluate them in.

In other words a set of self-referencing formulas produces a race hazard and even the possibility of circular dependency.

At first spreadsheet implementers solved this problem by simply imposing some rules that gave a known order of evaluation - usually top to bottom and left to right.

Spreadsheet users quickly learned to think in terms of "everything above and to the left is evaluated and safe to use".

This may seem like avoiding the issue but even this simple approach could cope with circular references that needed an iterative approach to evaluate.

The spreadsheet would simply recalculate itself repeatedly in the same order until it was either stopped or until the calculation converged to a solution.

Later spreadsheets took a more reasonable and efficient approach to re-computation and created a dependency graph that allowed the expressions to be evaluated in a natural order. This didn't eliminate the need to iterate to evaluate circular references but it did mean that you could write formulas that depended on other results anywhere in the spreadsheet and expect the system to work out everything in a sensible order.

Today we have rediscovered this approach in reactive programming. In a reactive language you set up relations between variables that work exactly like spreadsheet references. If you write A=B+C then variable A is updated every time B or C change their value. You are establishing a long term relationship between A, B and C rather than a one off update as in procedural programming. In reactive programming we use different jargon but essentially what is being reinvented is the spreadsheet approach to programming. 

What is less obvious is that the spreadsheet brings with it a natural approach to parallel programming. Each formula entered into a cell is to be evaluated in place and display its current result in that cell. This is an inherently parallel algorithm. If you could build the hardware equivalent of a spreadsheet with one processor per cell then all you would have to do is ensure that each processor evaluated just the formula in its cell and display the result.

In this case the order of evaluation doesn't matter as long as the processors always re-compute until the results settle to a stable state. While this would not be a good way to implement a software emulation of the spreadsheet concept it is a very reasonable way for real hardware to be used - the settling time would be very short. As it happens I was involved in a project to build a machine exactly like this and its settling time was fast enough for real time changes to be made. 

This whole approach to parallel processing is of course nothing more than a cellular automaton (CA).

The CA is possibly the most important computational architecture we have yet to explore. It has been suggested that a CA can give us the theory of everything we have been looking for.

If you don't believe that CAs are capable of great complexity then just take a look at the Game of Life - a 2D CA with amazingly simple rules that generates very complex behaviour. The idea is that something like the simple rules for Life are at the heart of the universe which simply gets on computing those rules and on the way generates all there is.

The spreadsheet is a CA and it is an advanced computational model that turns out to be surprisingly easy to use.

Whatever you think of the spreadsheet don't dismiss it as trivial.






or email your comment to: comments@i-programmer.info









Last Updated ( Tuesday, 29 July 2014 )