ChatGPT And Excel Another Coding Threat?
Written by Mike James   
Wednesday, 06 September 2023

We have been considering the role of coding copilots in helping skilled programmers create code, but what happens when large language models attempt to create a spreadsheet? Is this just another way to get things wrong?

Two recent studies have considered the role, threats and dangers of using ChatGPT to construct spreadsheets and, guess what, they agree that hallucinations are a problem.

Patrick O'Beirne presented a paper to the European Spreadsheet Risks Interest Group that does a good job of reminding us that spreadsheets are a form of code that is really not under any reasonable control. Many spreadsheet creators are undertrained and build their sheets by asking questions on the web and using resources similar to StackOverflow. Why not add asking GPT or other Large Language Models (LLMs) how to do the job?


The paper describes how to add a GPT query box and then tries it out with some simple experiments from which 10 "lessons" are derived. What is interesting is that most of the responses to the requests were only wrong in minor ways and often they corrected themselves when the error was queried in some way.

The ten lessons are:

1: Test the formula suggested, even if it looks right to you – or in other words, you don't see any problem at first.

2: If it does not work, try again. Give it more context, say what does not work.

3: Start over again with a fresh session and ask the same question to see if you get the same answer.

4: If you know of a newer function that you think might work, suggest it in your prompt.

5: When you copy down a formula, ensure that the references are correct in both relative and absolute forms.

6: Do more than one test and verify each result.

7: It's best to really understand how the formula works so that you can decide whether it is a good match for your needs.

8: When you get a formula, ask what could go wrong with it and check each potential issue.

9: When you get a formula you can't understand, ask a human expert to explain it.

10: Remember, these are language models, not logical or mathematical models.

My opinion is that applying these lessons requires more knowledge than the typical asker of the initial questions would have.

The second study is from Simon Thorne at Cardiff Metropolitan University and this starts off with a correct example:

The grade is based on an average of two cells, if the average is 70 or greater then award a 1st, if the average is equal to 60 and less than 69 then award a 2:1, if the average is equal to 50 and less than 59 award a 2:2, if the average is equal to 40 and less than 49 then award a third, otherwise award a fail.

The above text prompt produces the following accurate spreadsheet formula:

=IF(AVERAGE(A1,B1)>=70,"1st", IF(AVERAGE(A1,B1)>=60,
"2:1", IF(AVERAGE(A1, B1)>=50, "2:2",
IF(AVERAGE(A1, B1)>=40, "3rd", "Fail"))))
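One way to check a formula like this, rather than simply trusting it, is to mirror its logic in a few lines of code and probe the boundaries. The following is only an illustrative sketch and is not taken from either paper: the function name `grade` and the test marks are assumptions. Notice that the prompt says nothing about averages that fall between 69 and 70, and the formula's `>=` comparisons quietly place 69.5 in the 2:1 band.

```python
def grade(a1: float, b1: float) -> str:
    """Mirror of the nested-IF formula quoted above (hypothetical helper)."""
    average = (a1 + b1) / 2  # AVERAGE(A1, B1) for two numeric cells
    if average >= 70:
        return "1st"
    if average >= 60:
        return "2:1"
    if average >= 50:
        return "2:2"
    if average >= 40:
        return "3rd"
    return "Fail"


# Probe each boundary, plus averages the prompt did not describe (69.5, 59.5, 39.5).
for marks in [(70, 70), (69, 70), (60, 60), (59, 60), (50, 50), (40, 40), (39, 40)]:
    print(marks, grade(*marks))
```

Whether 69.5 should be a 2:1 is a question for whoever owns the marking scheme, which is exactly the kind of gap that only shows up when you test values other than the obvious ones.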

From here we have two research questions:

Research Question 1: How does ChatGPT perform code generation when it is required to solve an incompletely described problem?

Research Question 2: What underlying knowledge and competence does ChatGPT have in logic, deduction and inference?

Two hypotheses were explored and the experiments demonstrated that the more uncertain the prompt the more likely an incorrect result. The uncertainties can be characterised as incomplete information and the amount of deduction and inference needed to get the right answer. It seems that ChatGPT has little idea of BODMAS, for example.

The lack of knowledge of the order of arithmetic operators leads on to the second hypothesis, that ChatGPT is no good at logic. Basically, if the logic is in the prompt then it tends to make use of it. If it is missing then ChatGPT uses a sort of vague and imprecise logic to make up for it.

A comment on the BODMAS problem is very illuminating:

"[It] was realised that it was consistently able to provide a correct calculation for volume because it is highly likely that the formulae for volume is explicitly in the large language corpus on which ChatGPT is trained. So it’s able to cite and calculate volume correctly because it has already learnt the explicit way in which it should be calculated. However, when it is required to apply BODMAS in a situation not explicitly covered in the training of the model, it is unable to use the same principles correctly."
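For readers who have forgotten what BODMAS demands, a tiny worked example shows how much the order of operations matters. The numbers here are made up for illustration and do not come from the paper. Python is used as a stand-in for a spreadsheet, and for addition and multiplication the two agree. One well-known difference is included as a warning: Excel applies unary minus before exponentiation, so `=-2^2` gives 4 there, while Python gives -4.

```python
# Same numbers, different grouping: multiplication binds tighter than addition.
without_brackets = 2 + 3 * 4   # evaluated as 2 + (3 * 4)
with_brackets = (2 + 3) * 4    # brackets force the addition first
print(without_brackets, with_brackets)  # 14 20

# Python applies the exponent before the unary minus: -(2 ** 2).
# Excel's =-2^2 does the opposite and returns 4, so do not assume
# that every environment shares one precedence table.
python_result = -2 ** 2
print(python_result)  # -4
```

A formula that silently groups terms the wrong way is still a valid formula, so nothing flags the error unless you work the answer out by hand.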

The conclusion is telling and worrying:

"Where the prompt is offered in complete detail, in these circumstances ChatGPT provided consistently correct code. However, as these experiments have show, if there is any uncertainty, inference or deduction needed from the prompt, ChatGPT has questionable ability to provide accurate code or reasoning. This opens a new front in spreadsheet risks, those that arise from the use of LLM generated spreadsheet formulae."

I foresee a whole new level of spreadsheets producing misleading results when used beyond their initial design data. It's a sort of extrapolation error that isn't likely to be caught by simple-minded testing on the data used to motivate the LLM prompt. For example, a spreadsheet formula that works on a single cell often isn't correct when copied down a column for more results. As novice spreadsheet users discover that they can get more done with the help of LLMs, I think the number of problems is going to increase.
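The copy-down problem is easy to demonstrate. The sketch below is a simplified simulation of what happens to A1-style references when a formula is filled down, not a model of Excel itself: the helper `copy_down`, the regular expression and the tax-rate scenario are all assumptions made for illustration, and the pattern makes no attempt to skip text inside string literals or sheet names.

```python
import re

# Matches A1-style references such as A1, $A1, A$1 or $A$1.
# The trailing lookahead avoids treating names like LOG10( as cell references.
CELL_REF = re.compile(r"(\$?)([A-Z]{1,3})(\$?)(\d+)(?![\d(])")


def copy_down(formula: str, rows: int) -> str:
    """Return the formula as it would read after being filled down `rows` rows."""

    def shift(match: re.Match) -> str:
        col_anchor, col, row_anchor, row = match.groups()
        if not row_anchor:  # relative row: it moves with the formula
            row = str(int(row) + rows)
        return f"{col_anchor}{col}{row_anchor}{row}"

    return CELL_REF.sub(shift, formula)


# Hypothetical sheet: B1 holds a tax rate, the formula is written in C2
# and then filled down one row to C3.
print(copy_down("=A2*B1", 1))    # =A3*B2   (the rate reference has drifted)
print(copy_down("=A2*$B$1", 1))  # =A3*$B$1 (the rate reference stays put)
```

Both formulas give the same answer in the cell where they were first tested, which is why a single test on the original row tells you nothing about the rest of the column.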

More Information

ChatGPT and Excel -- trust, but verify

Experimenting with ChatGPT for Spreadsheet Formula Generation: Evidence of Risk in AI Generated Spreadsheets

Related Articles

Is Excel To Blame For Our Economic Pain?        

Companies That Use Spreadsheets Survive       

Spreadsheet Risk Revealed   

End Manual Data Entry in Excel - Thanks AI! 

Spreadsheets Are Special

Human Genes Renamed To Please Excel

Last Updated ( Wednesday, 06 September 2023 )