Applying C - Floating Point

Written by Harry Fairhead

Tuesday, 21 May 2024

Floating Point Reconsidered

This is by no means all you need to know about floating point arithmetic. What really matters is that you don't take it for granted that you will get the right answer when you make use of it. It should now be obvious why:

float a = 0.1f;
float total = 0.0f;
for (int i = 0; i < 1000; i++) {
    total = total + a;
}
printf("%.7f\n", total);
printf("%d\n", total == 100.0f);

prints 99.9990463 and 0. The reason is that 0.1 isn't exactly representable as a binary fraction: it is the recurring pattern 0.0001100110011... and has to be cut off to fit the available bits. This is the reason for the usual advice of "don't test floating point numbers for equality". However, the problem is more general than floating point, as the same error arises with the corresponding fixed point value. That is, it is more to do with binary fractions than it is to do with floating point representation.

If you think that such a small error could never make a difference, consider the error in the Patriot missile system. The system used an integer timing register which was incremented at intervals of 0.1 seconds. However, the integer count was converted to a time in seconds by multiplying by a binary approximation of 0.1. After 100 hours of continuous operation, an error of approximately 0.3433 seconds had built up in the conversion. As a result, an incoming Iraqi Scud missile could not be accurately tracked and it struck a barracks, killing 28 people.

The recommended way of testing for equality between floating point values is to use something like:

if (fabsf(total - 100.0f) / 100.0f <= FLT_EPSILON) ...

FLT_EPSILON is a macro, defined in float.h, that gives the difference between 1.0 and the next representable float, i.e. the relative accuracy of a float. There are other useful constants defined in float.h. The idea is that if two numbers differ by less than the accuracy of the representation then they can be considered equal. Of course, in practice numbers computed in different ways accumulate errors that are larger than the representational error. In the case of the example above with a = 0.1, the two numbers are very much further apart than FLT_EPSILON because the error in representing 0.1 in binary is added in a thousand times. In practice it is usual to include a factor that summarizes the errors in the computation, something like:

if (fabsf(total - 100.0f) / 100.0f <= K * FLT_EPSILON) ...

To get our example to test equal, K has to be 80 or more. However, make a small change and K has to be bigger. Run the loop ten thousand times and compare the result to 1000 and K has to be bigger still.

The point is that there is no single way of setting a reasonable interval that works for a range of computations. You have to analyze the computation to find out what it is safe to regard as being equal. This leads us into the realm of numerical analysis.

If you are applying any formula then it is always worth checking what the best way to compute it is. It is rare that the form given in a textbook is the best way to compute a quantity. For example, the mean is traditionally computed using:

float total = 0.0f;
int n = 1000000;
for (int i = 0; i < n; i++) {
    total = total + (float)i;
}
total = total / (float)n;
printf("%f\n", total);

This forms a total and then divides by the number of items. The problem with this is that the total gets very big and we lose precision by adding comparatively small values to it. The exact mean of the values 0 to 999999 is 499999.5, but if you try it, you will discover that the result is 499940.375000.

Using the alternative iterative method, which keeps the size of the running estimate down:

total = 0.0f;
for (int i = 0; i < n; i++) {
    total = total + ((float)i - total) / (float)(i + 1);
}
printf("%f\n", total);

gives a result of 499999.500000, which agrees with the exact mean: the values run from 0 to 999999, so the mean is (n-1)/2 = 499999.5.
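The reason the iterative update produces the mean can be seen from a one-line derivation. The notation is introduced here for the purpose: m_k is the mean of the first k values x_0 to x_{k-1}, so that in the loop total plays the role of m_i and (float)i plays the role of x_i:

```latex
m_{k+1} = \frac{k\,m_k + x_k}{k+1} = m_k + \frac{x_k - m_k}{k+1}, \qquad m_0 = 0
```

The running value never grows beyond the size of the data, so each correction is added to a quantity of comparable magnitude.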

There are even better methods of computing the mean - see Kahan Summation and Pairwise Summation.

In many cases you can't avoid a detailed analysis of a calculation but it helps to have an idea of why things go wrong when you are using floating point. Imagine that you are working with three significant digits. For addition everything is fine as long as the exponents allow the digits to interact. For example consider:
1.23 x 10² + 4.67 x 10³

written out like this:

  123 +
 4670
 4793

Normalizing this gives 4.79x10³ and you can see that, ignoring rounding etc, only two digits of each value "overlapped" in the sum. If the exponents differ by 4 then none of the digits are involved in the sum. For example 1.23 x 10² + 4.67 x 10⁶ =

     123 +
 4670000
 4670123

and after normalizing the result we have 4.67x10⁶, i.e. the smaller value has made no contribution at all. Clearly for addition and subtraction, if you are working with floating point numbers with a precision of d digits then the accuracy of adding and subtracting goes down as the difference between the exponents approaches d. This is the sense in which you need to be careful about floating point arithmetic involving large and small numbers.

There are no similar problems with multiplication and division, apart from the accumulation of errors if operations are performed in succession.

Finally, if possible always use double or larger floating point types. Whereas float has about 7 decimal digits of precision, double has about 15 digits and this provides useful latitude.

Summary

Floating point arithmetic is so easy to use that we simply expect an arithmetic expression to be worked out correctly – this isn’t true.
Modern floating point hardware is almost as fast as integer operations and using double precision values doesn’t have an overhead, except for division.
Floating point arithmetic can give very wrong answers if the two operands differ by a large amount. The necessary normalization can reduce non-zero quantities to zero and the loss of precision can make results close to random.
A confusing factor is the use of extended precision during a calculation to minimize this loss of precision. This always gives a result that is as accurate as, or more accurate than, the result obtained without extended precision, but it can result in quantities that are supposed to be equal not testing as equal.
Standard floating point has two special values, NaN, not a number, and inf, infinity. The rules that govern how these are used in an expression are reasonable, but not foolproof. You can get a very wrong answer without even knowing that a special value is involved.
There are some standard ways of detecting special values and problems with floating point, but only in C99 and later. In practice, the results vary according to architecture.
You can cast integer to float and float to integer types. Everything works as you would expect, but casting to an integer type that is too small to hold the integer part of a float is undefined behavior.
Implementing floating point calculations is difficult and in many cases you need to find out how other people have tackled the problem. There are often optimized ways of computing the formulae you find in text books.

Now available as a paperback or ebook from Amazon.

Applying C For The IoT With Linux

Last Updated ( Tuesday, 21 May 2024 )
