Fundamental C- Side Effects, Sequence Points And Lazy Evaluation
Written by Harry Fairhead   
Monday, 11 February 2019

It is this amazing richness of operators that makes C so attractive to anyone who has taken the time to master it. After a while you can write an expression such as:

C += A++ + B++ + (D=++E) == F

I have to admit that it would take me time to work out exactly what this expression does and then I would quickly forget it! The best advice is, don’t write expressions that are an effort to understand. Just because you have access to a dangerous weapon it doesn't mean you have to use it.

Finally, there are two operators that deserve special attention. The first is the conditional operator. This is a functional alternative to an if statement. It allows you to include a conditional within an expression. The expression:

expression1 ? expression2 : expression3

first evaluates expression1 and treats it as a boolean. If this is true, i.e. non-zero, then the result is expression2, but if it is false, i.e. zero, then the result is expression3.

For example:

int ans= a<0?-1:1;

sets ans to -1 if a is negative and 1 otherwise.

The advantage of the conditional expression is that it is an expression and can be used as part of a larger expression. For example:

int ans =(a<0?-1:1)*size;

sets ans to -1*size or 1*size depending on the value in a.

The comma operator, as distinct from other uses of the comma, is perhaps the most puzzling operator in C. The comma is only treated as an operator if it is part of an expression:

expression1, expression2;

First expression1 is evaluated and the result is thrown away. Then expression2 is evaluated and it is returned as the result.

For example:

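int ans = (1+2, 3+4);

this evaluates 1+2 and ignores the result and then evaluates 3+4 and stores this in ans. The parentheses are essential. Without them the comma in a declaration is just a separator between declarators, so int ans = 1+2, 3+4; is a syntax error.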

The big problem with the comma is not what it does but working out what you need it for.

The obvious answer is that you want it when an expression has a side effect that you are interested in more than its result.

For example:

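int ans = (i++, a+i);

first increments i and then stores a+i, using the new value of i, in ans. The parentheses are needed to make the comma an operator rather than a separator between declarators.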

You still might not be convinced of its usefulness and in this you would be right. One of its more common uses is to allow for multiple expressions within the parts of a for loop.

For example:

    for(i=1,j=9;j>0;i++,j--){
        printf("%d %d ",i,j);
    }

Notice that we initialize both i and j and that i increments and j decrements each time through the loop. The result is a loop in which i runs from 1 to 9 and j from 9 to 1. There are alternative, clearer ways of writing this loop. If you find you are using the comma operator a lot, you probably need to think about how other programmers are going to understand your programs.

Sequence Points and Lazy Evaluation

We have introduced the idea of a side effect in a casual way. In principle, an expression should simply evaluate to a result and no changes should occur to any variables or the state of the program or machine. Such an expression is said to be “pure”. C has a number of “impure” operators which create side effects and the fact that you can use a function within an expression means that side effects are very likely. A function, as discussed in the next chapter, can do almost anything and change the state of the system in ways that are not at all apparent or even connected with the expressions they are used in.

When you first meet the idea of side effects they can seem simple and useful, but what does:

i=i++;

mean?

It all depends on when the increment is performed relative to the assignment. The problem is that the value of i is changed twice by the side effects of the expression and it isn’t clear what order the side effects occur in.

For many years this problem was largely left to the compiler writers, who each provided their own answer. The C standard's attempt to solve it, as set out in C11, isn't really as helpful as it might be. It makes use of the idea of a sequence point – a point in the execution of a program at which all side effects of the operators evaluated so far are guaranteed to be complete – and the C11 standard defines several of them.

The end of any full expression obviously has to be a sequence point, e.g. the end of an expression statement such as an assignment, the expression in a return statement, the controlling expression of an if, switch, while or do..while, and each of the three expressions in a for loop. The end of an initializer, which is also a full expression, is a sequence point too, and so is the comma operator. A function call contains a sequence point after the function and all of its arguments have been evaluated, so their side effects are complete before the function is called. Notice, however, that the order in which the functions in an expression are called is not always specified, and the order in which the arguments are evaluated is not specified either.

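The standard also says that printf and the other formatted input/output functions behave as if there is a sequence point after the actions associated with each conversion specifier. This governs what printf itself does as it works through the format string, e.g. storing a count for %n. It does not order the evaluation of the arguments, which all happen before printf is called and in an unspecified order, so printf("%d %d", i++, i++) is still undefined.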

When it comes to the internal details of expression evaluation it makes sense to define some operators as sequence points.

Logical OR || and AND && operators are sequence points because they are evaluated in a different way to other operators. For an OR, if the left-hand expression is true then the right-hand expression is not evaluated because the result is already known to be true. Similarly for an AND, if the left-hand expression is false then the right-hand expression is not evaluated because the result is already known to be false. This is called short circuit, or lazy, evaluation. In both cases, however, the left-hand expression is complete, including side effects, before the right-hand expression is evaluated. Notice that if the right-hand expression has any side effects, including a run-time error such as a divide by zero, these will not happen if the evaluation is short circuited. For example:

int result =a||b++;

b will not be incremented if a is true and in:

int result = a || b/0;

no divide by zero error will occur if a is true.

For similar reasons, a conditional expression contains a sequence point. The first expression is fully evaluated, including side effects, before the second or third expression is evaluated, and only one of those two is evaluated at all. That is, there is a sequence point at the ? in a?b:c.




Last Updated ( Sunday, 17 March 2019 )