The Heart Of A Compiler
Written by Mike James   
Thursday, 01 April 2021

Compilers are an essential part of using a computer - but there was a time when they simply didn't exist. First we had to realize that we needed such a thing and then we had to figure out how to build it.

If you have ever looked at the theory of building a compiler you might feel that it is a difficult task. The reality is that even compiling simple arithmetic is surprisingly difficult.

What Programmers Know



  1. The Computer - What's The Big Idea?*
  2. The Memory Principle - Computer Memory and Pigeonholes*
  3. Principles of Execution - The CPU
  4. The Essence Of Programming
  5. Variables - Scope, Lifetime And More*
  6. Binary Arithmetic
  7. Hexadecimal*
  8. Binary - Negative Numbers*
  9. Floating Point Numbers*
  10. Inside the Computer - Addressing
  11. The Mod Function
  12. Recursion
  13. The Lost Art Of The Storage Mapping Function *
  14. Hashing - The Greatest Idea In Programming
  15. Advanced Hashing
  16. XOR - The Magic Swap*
  17. Programmer's Introduction to XML
  18. From Data To Objects*
  19. What Exactly Is A First Class Function - And Why You Should Care*
  20. Stacks And Trees*
  21. The LIFO Stack - A Gentle Guide*
  22. Data Structures - Trees
  23. Inside Random Numbers
  24. The Monte Carlo Method
  25. Cache Memory And The Caching Principle
  26. Data Compression The Dictionary Way
  27. Dates Are Difficult*
  28. Sequential Storage*
  29. Magic of Merging*
  30. Power of Operators
  31. The Heart Of A Compiler*
  32. The Fundamentals of Pointers
  33. Functional And Dysfunctional Programming*

* Recently revised

In the early days of computing the main preoccupation was with efficiency. The best way to make a machine run fast was, and is, to make careful use of its architecture, and this was possible using machine code.

A programmer would know how the machine worked and would carefully craft code to achieve a result that took minimum time and/or memory. Every byte was manually allocated and managed and so was the use of the machine's internal registers where the work was done.

The big problem is that working in machine code is particularly unfriendly and makes creating programs time-consuming and error-prone.

Put simply, programming in machine code required a machine expert, and there were lots of people who wanted to make use of a computer but didn't have the time or enthusiasm to become that much of an expert on a particular machine.

It didn’t take long for programmers to realize that the machine they were programming could be used to make their task easier. After all, the computer is a symbolic machine, and what were the programmers of the time doing other than moving symbols around?

Programmers generally wrote code using symbols such as ADD and then used a lookup table, on paper or memorized, to convert the symbols to machine code and the addresses of the memory locations they were using. This process of converting the symbols to binary was at first entirely manual, but it didn't take long for someone to notice that this was a job a computer could do. The simple mnemonic codes such as ADD and so on were formalized, and the lookup table that used to be on paper was converted into a program. This was the invention of the assembler and assembly language.

The key point to note is that with a simple assembler there is a one-to-one correspondence between the mnemonic codes and the machine code that the assembler generated. All we are doing is using symbols to represent the numeric codes that make the computer operate. This was a huge advance in productivity, but not so much in terms of theory or sophistication.



The lookup table converts mnemonics to machine code.
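The whole scheme can be sketched in a few lines of Python. Everything here is invented for illustration - the mnemonics, the opcode values and the single-operand format don't belong to any real machine - but the one-to-one table lookup is exactly what an early assembler did:

```python
# Toy assembler: each mnemonic maps to exactly one (invented) opcode byte.
OPCODES = {"MOV": 0x10, "ADD": 0x20, "SUB": 0x21, "JMP": 0x30}

def assemble(lines):
    """Translate mnemonic lines into numeric machine code, one-to-one."""
    machine_code = []
    for line in lines:
        mnemonic, operand = line.split()
        # One symbolic instruction in, one numeric instruction out.
        machine_code.append((OPCODES[mnemonic], int(operand)))
    return machine_code

program = ["MOV 100", "ADD 101", "MOV 102"]
print(assemble(program))  # [(16, 100), (32, 101), (16, 102)]
```

Notice that the assembler contributes nothing creative - it simply mechanizes the paper lookup table.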

Moving to macro-assemblers

Programmers were very happy in the transition from machine code to assembler because there was a clear one-to-one correspondence between instructions in machine code and assembler.

That is, when a programmer wrote a line of assembler it was very easy to see how this corresponded to the machine code that would be generated from it.

This allowed programmers to go on hand-crafting assembly code that made the best use of the machine's architecture. Even today, if you want maximum performance from a section of code it is possible to switch to assembler, although this is becoming increasingly rare. Modern compilers have become good at optimizing the machine code they generate, and the skills of the original assembly language programmers are now rare.

The step from machine code to assembler seemed a very small and innocent one, but in fact it was enough to open the floodgates.

The very fact that there was a translation step, i.e. from assembly language to machine code, allowed the programmers who designed the assembly languages to add features.

The first major development was the “macro-assembler”.

Programmers found that they were often writing the same block of instructions over and over. For example, to add two numbers together they might write something like:

MOV AL,number1
ADD AL,number2
MOV number3,AL

which means move the contents of memory location number1 into the AL register, then add to AL the contents of number2 and store the result in number3.

In case you have forgotten, in a machine registers are like memory locations on which you can perform arithmetic and logical operations. There are generally so few of them that they are given names like A, B and so on. In this case AL is the low part of the A register.

Each time this block of code is used, the only things that change are the locations number1, number2 and number3. To avoid having to write it out each time, a macro assembler allowed the programmer to define a macro. For example, the ADD3 macro would be defined something like:

ADD3 MACRO a,b,c
MOV AL,a
ADD AL,b
MOV c,AL
ENDM

Then whenever you wrote something like:

ADD3 X,Y,Z
in the assembly language program, the assembler would expand this to be:

MOV AL,X
ADD AL,Y
MOV Z,AL
Notice that this isn’t anything particularly clever. The assembler simply keeps a table of macro definitions in memory somewhere and when it sees ADD3 X,Y,Z in the program it looks up ADD3 in the table and substitutes the definition of ADD3 into the program as if the programmer had bothered to write it all out in full.
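The substitution really is that mechanical, and it fits in a few lines of Python. This is a sketch with everything invented for the example - a table mapping each macro name to its formal parameters and body template, and an expand function that splices in the actual parameters:

```python
# Macro table: name -> (formal parameters, body template lines).
MACROS = {
    "ADD3": (["a", "b", "c"],
             ["MOV AL,{a}", "ADD AL,{b}", "MOV {c},AL"]),
}

def expand(line):
    """If the line invokes a known macro, return its expanded body;
    otherwise return the line unchanged (as a one-line list)."""
    name, _, args = line.partition(" ")
    if name not in MACROS:
        return [line]
    formals, body = MACROS[name]
    # Bind each formal parameter to the corresponding actual argument.
    binding = dict(zip(formals, args.split(",")))
    return [template.format(**binding) for template in body]

for out in expand("ADD3 X,Y,Z"):
    print(out)
# MOV AL,X
# ADD AL,Y
# MOV Z,AL
```

A real macro processor handles nesting, local labels and so on, but the heart of it is just this table lookup and textual substitution.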

A macro is simply a shorthand form that programmers can invent as they go along.

Today all assemblers are macro assemblers and there is no need to make a distinction.

Of course the macro even made it into higher-level programming languages in the form of code snippets or even full macros complete with parameters.

Notice that creating macros is a way of extending the basic machine code language that the basic assembler supports. Given a good standard macro library, the programmer can start to think in a language that is "bigger" than the underlying machine code. This is the start of the abstraction of languages, and of software in general, away from the low-level machine hardware.

Also notice that a macro instruction breaks the close connection between the symbolic assembler, i.e. the mnemonics, and the underlying machine code. Now a single macro-assembler instruction can correspond to multiple machine instructions. We are starting to think differently about how programs are put together.

For a period in computer science the macro processor was an important theoretical concept and practical tool but it was just a small step on the road to a full high-level language.





Last Updated ( Thursday, 01 April 2021 )