This article is more or less everything that the working programmer should know about operators and their associated expressions and, of course, the use of parentheses.
The main trouble with operators is that we know the four arithmetic operators so well that we tend to take them for granted and so miss most of the important ideas.
If you think about the instructions found in almost any computer language, they fall into two groups: commands and operator expressions.
A command is something like CLEARSCREEN, OPEN, ERASE, PRINT and so on, which causes a single action to occur.
The best known example of an operator expression is the arithmetic expression, e.g.
A=3+B*C
At first sight this looks like a single command to perform arithmetic but it isn't. In fact it is a small program in its own right.
It is composed of smaller commands and it has a flow of control.
This idea, that an expression is a small program, is remarkably obvious if you write in assembler, because most assemblers don't translate arithmetic expressions into run-time code and so the expression program has to be written out using ordinary instructions.
For example, using a single register machine the assembler equivalent of A=3+B*C would be
LOADREG B load the register from memory location B
MULTREG C multiply the register contents by C
ADDREG 3 add 3 to the register
STOREREG A store the contents of the register in A
This is much more clearly a little subroutine to work something out than A=3+B*C is, but the two are entirely equivalent.
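To make the equivalence concrete, here is a minimal Python sketch of the hypothetical single-register machine used above. The instruction names come from the listing; the `run` function, the list-of-tuples program format and the rule that a string operand names a memory location are assumptions made purely for illustration.

```python
def operand(arg, memory):
    # a string names a memory location, anything else is a literal value
    return memory[arg] if isinstance(arg, str) else arg

def run(program, memory):
    """Execute a program for a hypothetical machine with one register."""
    reg = 0
    for op, arg in program:
        if op == "LOADREG":
            reg = operand(arg, memory)
        elif op == "MULTREG":
            reg *= operand(arg, memory)
        elif op == "ADDREG":
            reg += operand(arg, memory)
        elif op == "STOREREG":
            memory[arg] = reg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

# The four-instruction "expression program" for A=3+B*C
program = [
    ("LOADREG", "B"),
    ("MULTREG", "C"),
    ("ADDREG", 3),
    ("STOREREG", "A"),
]

B, C = 4, 5
memory = run(program, {"B": B, "C": C})
assert memory["A"] == 3 + B * C  # both give 23
```

Notice that the multiplication is the first arithmetic instruction, even though * is the last operator in the written expression.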
Any compiler writer will tell you that the most difficult part of any compiler is the translation of expressions into good code. Well, at any rate it used to be, but now that the theory is reasonably well understood it's more or less a textbook exercise.
What makes operator expressions interesting is that they have a complex set of rules for determining the flow of control through the component parts of the expression.
For example, in the expression 3+B*C, all programmers know that the * operation is done before the + operation, even though that isn't the order in which they are written.
In a simple left-to-right reading of the expression the + would come first, and you do get different results depending on the order in which the operations are carried out.
In a program the order of execution is usually from top to bottom and/or from left to right, so expressions really are different.
The basic idea is that each operator has a precedence associated with it and the order that each operation is carried out depends on this precedence.
In the example above * has a higher precedence than + and so it is executed first. Notice that this precedence can also be seen in the assembly language version of the expression where the last part of the expression was evaluated first.
Of course you can always use parentheses or brackets to explicitly control the grouping of the operators - a bracketed expression will be treated as the single value that it evaluates to.
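The precedence rule can be captured in a small table. The following is a sketch of one standard parsing technique, precedence climbing, restricted to non-negative integers, the four arithmetic operators and parentheses. The `PRECEDENCE` numbers are arbitrary (only their relative order matters) and `/` is treated as integer division to keep things simple.

```python
import re

# Higher number = higher precedence; only the relative order matters.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,  # integer division keeps the sketch simple
}

def tokenize(text):
    # integers, operators and parentheses; whitespace is ignored
    return re.findall(r"\d+|[-+*/()]", text)

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)  # a bracketed expression is a single value
            pos += 1               # skip the closing ")"
            return value
        return int(tok)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        # keep absorbing operators that bind at least as tightly as min_prec
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # the right operand may only contain higher-precedence operators,
            # which also makes equal-precedence operators group left to right
            right = expression(PRECEDENCE[op] + 1)
            left = APPLY[op](left, right)
        return left

    return expression(1)

print(evaluate("3+4*5"))    # 23: * is done before +
print(evaluate("(3+4)*5"))  # 35: brackets override precedence
```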
One subtle point is that, although most languages evaluate expressions according to precedence rules, most do not guarantee the order in which sub-expressions are evaluated.
For example, in evaluating an expression such as
(1+2)*(3+4)
the compiler can choose to work out (1+2) or (3+4) first. Of course in this case it makes no difference to the result, but there are exceptions, specifically when the sub-expressions have side effects, i.e. they change the state of the program.
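In C and C++ the order is unspecified, and if unsequenced sub-expressions modify the same variable, or one modifies a variable that the other reads, the result is "undefined behaviour", which is currently a hot topic in the C/C++ world where there is quite a lot of it.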
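Python happens to guarantee left-to-right evaluation, so the two orders a C compiler could choose for an expression like next_value() - next_value() have to be spelled out by hand. `next_value` is a hypothetical function whose side effect is to increment a counter; in C the order of the two calls in such an expression is unspecified.

```python
calls = 0

def next_value():
    """A function with a side effect: every call returns a larger number."""
    global calls
    calls += 1
    return calls

def left_first():
    # evaluate the left operand of next_value() - next_value() first
    global calls
    calls = 0
    left = next_value()
    right = next_value()
    return left - right

def right_first():
    # the other order, which a C compiler is equally free to choose
    global calls
    calls = 0
    right = next_value()
    left = next_value()
    return left - right

print(left_first(), right_first())  # -1 1
```

The same expression gives two different answers, which is exactly why relying on the order of sub-expressions with side effects is a bug.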
So far this is about as much as most programmers know about expressions but there is a little more.
For example, an operator can operate on different numbers of data values.
The common arithmetic operations are dyadic, that is they operate on two values as in 1+2, but there are also plenty of monadic operators that operate on a single value, such as the minus in -2, NOT applied to a truth value and so on.
In general an operator can be n-adic without any difficulty apart from how to write it as part of an expression. For example, if I invent the triadic operator @, which finds the largest of three values, the only reasonable way to write it is as
@ value1, value2, value3
and in this guise it looks more like a function than an operator.
This is because there is a very close association between functions and operators.
Functions and operators
Put simply, you can say that an operator is just a function that has a priority associated with it.
For example, rather than the usual arithmetic operators we could easily get by with the functions ADD(a,b), SUB(a,b), MULT(a,b) and DIV(a,b) as long as they had the same priorities assigned to them.
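As a sketch, here are ADD, SUB, MULT and DIV written as ordinary Python functions, plus a hypothetical LARGEST standing in for the triadic @. In function notation no priorities are needed at all, because the nesting of the calls fixes the order of evaluation. (Python's standard `operator` module provides the dyadic ones ready-made as `operator.add`, `operator.sub`, `operator.mul` and `operator.truediv`.)

```python
def ADD(a, b):
    return a + b

def SUB(a, b):
    return a - b

def MULT(a, b):
    return a * b

def DIV(a, b):
    return a / b

def LARGEST(a, b, c):
    # hypothetical function form of the triadic @ operator
    return max(a, b, c)

B, C = 4, 5
# 3+B*C in function notation: MULT must be evaluated before ADD
# can receive its second argument, so the nesting does the job of precedence.
value = ADD(3, MULT(B, C))
assert value == 3 + B * C == 23
```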
The question of how to write operator expressions neatly has exercised the minds of many a mathematician. The usual notation that suits dyadic operators, i.e. A+B, is called infix notation, but it doesn't generalise to n-adic operators.
A better, but less familiar, notation is reverse Polish where the operator is written after its operands.
For example, AB+ is reverse Polish for A+B. The advantage of reverse Polish is that it does generalise to n-adic operators.
For example, ABC@ is a reverse Polish expression using the triadic operator that finds the maximum of A,B and C.
Reverse Polish has another advantage in that it needs no priorities or brackets: the order in which the operators occur determines the order in which they are evaluated.
For example, AB+C* is clearly (A+B)*C because the multiplication cannot be evaluated until the addition has been evaluated to provide the first operand for the multiplication.
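Evaluating reverse Polish needs nothing more than a stack: operands are pushed, and each operator pops as many values as it needs and pushes its result. The sketch below assumes single-character tokens, letters looked up in a dictionary, and the article's hypothetical triadic @ implemented as max.

```python
def eval_rpn(expr, env):
    """Evaluate a reverse Polish string of single-character tokens."""
    # number of operands each operator takes from the stack
    arity = {"+": 2, "-": 2, "*": 2, "/": 2, "@": 3}
    actions = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "@": lambda a, b, c: max(a, b, c),  # hypothetical triadic operator
    }
    stack = []
    for tok in expr:
        if tok.isspace():
            continue
        if tok in arity:
            n = arity[tok]
            if len(stack) < n:
                raise ValueError(f"not enough operands for {tok}")
            operands = stack[-n:]  # kept in the order they were written
            del stack[-n:]
            stack.append(actions[tok](*operands))
        else:
            stack.append(env[tok])  # a letter names a value
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

env = {"A": 2, "B": 3, "C": 4}
print(eval_rpn("AB+", env))    # 5
print(eval_rpn("AB+C*", env))  # 20, i.e. (A+B)*C
print(eval_rpn("ABC@", env))   # 4, the largest of A, B and C
```

Because each operator simply takes the values already on the stack, the triadic @ needs no special syntax at all.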
Because an operator can look like a function it is possible to confuse the two.
For example, in Ada abs, i.e. taking the absolute value of a quantity, is an operator, but in Pascal and most other languages it is a function. If you don't notice this subtle difference you could end up writing expressions in Ada or Pascal that don't work as you expect.
As well as not agreeing about whether something is an operator or a function, languages don't always agree about the assignment of priorities, and if you switch languages this can be a cause of serious and hard-to-find bugs.
For example, Fortran (**) and BASIC (^) give the exponentiation operator the highest precedence, but most spreadsheets give it second place to the unary minus -.
It isn't even always reasonable to stick to a simple precedence rule in all cases.
For example, in many languages the exponentiation operation has the highest precedence of all operators and in general this is reasonable but consider
using the strict precedence this should be
i.e. -16 but this clearly isn't what the programmer intended. To put things right many languages alter the priority of exponentiation when it is immediately followed by negation. That is 4^-2 is evaluated as 4 raised to the power of -2 (i.e. 1/16).
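Python, which uses ** for exponentiation (^ is bitwise XOR), is one language that makes exactly this exception: its documentation says the power operator binds more tightly than unary operators on its left but less tightly than unary operators on its right. A quick check:

```python
# ** binds more tightly than a unary minus on its left ...
assert -2 ** 2 == -4        # read as -(2**2), as in Fortran
assert (-2) ** 2 == 4       # what most spreadsheets compute for -2^2

# ... but a unary minus on its right is taken as part of the exponent
assert 4 ** -2 == 1 / 16    # 4 raised to the power of -2
```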
Operators and expressions are powerful and compact ways of writing programs and many languages cannot resist the temptation to use them for nearly everything.
The most often quoted example of an operator-based language is APL, but surprisingly it doesn't use operator precedence at all and simply evaluates everything right to left!
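APL's rule is that each function takes everything to its right as its right argument. The following sketch mimics that for a flat list of numbers and dyadic operators; it uses * where APL would write ×, and it ignores APL's monadic functions and arrays entirely.

```python
def eval_right_to_left(tokens):
    """Evaluate alternating operands and dyadic operators with no
    precedence, strictly right to left, in the style of APL."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    # start with the rightmost value and work leftwards
    result = tokens[-1]
    i = len(tokens) - 2
    while i > 0:
        op, left = tokens[i], tokens[i - 1]
        # each operator takes everything to its right as its right operand
        result = ops[op](left, result)
        i -= 2
    return result

print(eval_right_to_left([3, "*", 4, "+", 5]))   # 27, i.e. 3*(4+5)
print(eval_right_to_left([10, "-", 3, "-", 2]))  # 9, i.e. 10-(3-2)
```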
Operators in C
In my opinion the operator king of modern languages is C and its approach has been inherited by many of the languages that are based on it.
C has a range of operators that needs 16 levels of precedence to control!
This bewildering array of operators is one of the reasons why C is intimidating to the beginner and can look cryptic when written using lots of operators.
Once you start to extend the range of operators that a language has then you have to worry about associativity as well as precedence.
Associativity is simply the order in which operators of the same priority are evaluated. If you restrict your attention to addition and multiplication then associativity isn't a problem, because (A+B)+C is the same as A+(B+C).
However, when you move on to consider more general operators it does matter, and even subtraction is enough to show it: (A-B)-C isn't the same as A-(B-C).
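A fold makes the difference easy to see. In this sketch `functools.reduce` groups from the left, and the right-associative version is obtained by folding the reversed list; the helper names `left_assoc` and `right_assoc` are made up for the example.

```python
import operator
from functools import reduce

def left_assoc(op, values):
    # ((v0 op v1) op v2) op ...
    return reduce(op, values)

def right_assoc(op, values):
    # v0 op (v1 op (v2 op ...)), built by folding from the right-hand end
    return reduce(lambda acc, v: op(v, acc), reversed(values))

values = [10, 3, 2]
print(left_assoc(operator.add, values), right_assoc(operator.add, values))  # 15 15
print(left_assoc(operator.sub, values), right_assoc(operator.sub, values))  # 5 9
```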
For example, in C and many other languages (C# for example) the right shift operator A>>B shifts A right by B bits, and
(A>>B)>>C
which means shift A right by B+C bits, isn't the same as
A>>(B>>C)
which means shift A right by the result of shifting B right by C bits.
To make the result of such operations unambiguous, C defines each operator as associating left to right or right to left, i.e. as left or right associative.
The >> operator associates left to right because this makes A>>B>>C mean (A>>B)>>C, which is the same as A>>(B+C) as long as B+C is less than the width of A in bits.
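Python's >> also groups left to right, so the identities above can be checked directly. The values are arbitrary, and because Python integers have no fixed width there is, unlike C, no upper limit on a meaningful shift count.

```python
A, B, C = 0b1011_0000, 2, 1   # A is 176

result = A >> B >> C          # grouped as (A >> B) >> C
assert result == (A >> B) >> C
assert result == A >> (B + C) # shifting by B, then by C, is shifting by B+C
print(result, A >> (B >> C))  # 22 88: the other grouping gives a different answer
```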