Compilers, Interpreters, VMs and JIT
Written by Mike James   
Thursday, 09 May 2024

The Interpreter

The idea of using a run time library to implement multiplication and similar operations doesn’t seem like a deep philosophical principle – but it is!

What we have done is to notice that the high level language makes use of a facility that the machine doesn’t actually have.

The run time subroutine can be thought of as providing the missing facility by simulating it in software. If at a later date the machine suddenly gets a piece of hardware that does multiplication, the compiler can be changed to make use of it by issuing a single multiply instruction instead of a call to the multiply subroutine. Alternatively the run time library can be changed to make use of the new hardware – this is very flexible.
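To make this concrete, here is a rough sketch in C of the kind of multiply routine a run time library might supply on a processor with no multiply instruction – a classic shift-and-add loop. The routine name is invented for illustration and this is not any particular library's code:

/* Sketch of a run time multiply routine for a machine with no
   hardware multiply instruction – classic shift-and-add.      */
unsigned int rt_multiply(unsigned int a, unsigned int b)
{
    unsigned int result = 0;
    while (b != 0) {
        if (b & 1)         /* is the lowest bit of b set?    */
            result += a;   /* add the current shifted value  */
        a <<= 1;           /* a doubles on each pass         */
        b >>= 1;           /* move on to the next bit of b   */
    }
    return result;
}

Wherever the source says A*B the compiler simply plants a call to a routine like this; if multiply hardware turns up later, only the code generator or this one routine has to change.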

Now think a little harder about this idea and you quickly see that it can be extended to greater things.

The multiplication routine is there to make up for the lack of a multiply command in the real hardware.

Why not simulate all of the hardware you need and make the run time library a software machine that runs the high-level language in question?

After all, A*B is an instruction in, say, Fortran or Basic and the run time can simply do the multiplication as soon as it encounters the instruction. The same is true for all of the instructions written in the high-level language. Each instruction results in a subroutine being called to do whatever it is the instruction is about.

Now we don't have an obvious translation step from the high level language to machine code. In fact no machine code is generated. All that happens is that the "run time" package now reads the high level language and obeys its instructions by calling appropriate subroutines. 

That is, the run time library has become a computer, implemented in software, which has the high-level language as its machine code.

This is an amazing idea and another name for the simulation is an interpreter.

In the early days languages such as Basic were implemented using interpreters and no compile step was required. The Basic interpreter read each of the Basic instructions and did what it was told to do, usually by calling predefined routines.

One of the earliest to have an impact was Palo Alto Tiny Basic – a miracle of economy by today's standards. It was a complete integer-only Basic interpreter that used just 2K of an 8-bit processor's memory. It worked by using a big selection statement to call the appropriate subroutine to handle whatever keyword came next – LET, GOTO and so on. The actual Basic was never converted to machine code; it simply activated the subroutines in the run time library, now more correctly called an interpreter.
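As a sketch of the general shape – not the real Tiny Basic source, and with the scanner and keyword handlers left as hypothetical routines supplied elsewhere – the heart of such an interpreter is just a loop around a big selection statement, something like this in C:

/* Sketch of a keyword-dispatch interpreter loop.
   next_keyword() and the handlers are hypothetical helpers. */
enum { KW_LET, KW_PRINT, KW_GOTO, KW_IF, KW_END };

int  next_keyword(void);                 /* read the next statement's keyword */
void do_let(void), do_print(void), do_goto(void), do_if(void);
void report_error(const char *msg);

void interpret(void)
{
    for (;;) {
        switch (next_keyword()) {
            case KW_LET:   do_let();   break;   /* assignment            */
            case KW_PRINT: do_print(); break;   /* output                */
            case KW_GOTO:  do_goto();  break;   /* jump to another line  */
            case KW_IF:    do_if();    break;   /* conditional           */
            case KW_END:   return;              /* stop the program      */
            default:       report_error("syntax error");
        }
    }
}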

The Virtual Machine

However, notice that there is another way to look at the interpreter code. It is a software implementation of a machine that runs the high-level language as its "machine code". This is what we generally refer to as a Virtual Machine or VM.

In the case of Tiny Basic the interpreter can be viewed as a simulation of a machine that has Tiny Basic as its machine code or assembly language.

This idea can be generalized and you can design VMs that have a lower-level language than Tiny Basic as their machine code. Such a VM can be used as a target for a range of languages with the help of a compiler that translates the high-level language into the VM's machine code. And at this point you might think that the world has gone mad – a compiler that compiles to a made-up machine code which is then run on a VM in the style of an interpreter! Surely this is crazy?!
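To make the idea concrete, here is a toy stack-based VM in C – the opcodes and the little program are invented purely for illustration. A compiler for a high-level language would translate something like PRINT 2*(3+4) into a sequence of these byte codes, which the VM then executes in exactly the style of an interpreter:

/* A toy stack-based VM – the "machine code" is a made-up set of byte codes. */
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

void run(const int *code)
{
    int stack[64], sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];       break;
            case OP_ADD:   sp--; stack[sp-1] += stack[sp]; break;
            case OP_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[--sp]);    break;
            case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* what a compiler might emit for PRINT 2*(3+4) */
    int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_PUSH, 4,
                      OP_ADD, OP_MUL, OP_PRINT, OP_HALT };
    run(program);
    return 0;
}

A stack machine is used in this sketch because it keeps the compiler's code generation simple, which is one reason so many real VMs are stack based.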

 


Battle Lines

Now we have two ways of implementing a high level language – we can compile it to machine code or we can run it on an interpreter for which it IS the machine code.

Traditionally the argument for and against the two approaches goes something like this:

  • A compiler produces “tight efficient code”.
    This is supposed to mean that because it generates machine code everything happens as fast as possible. Of course this is nonsense because the machine code could make a lot of use of run time subroutines and so start to slide towards the interpreter approach.
  • A compiler produces “small stand-alone code”.
    Clearly if it uses a run time library then it isn't stand-alone unless the library is included in the compiled code, in which case it isn't going to be small!
  • Conversely an interpreter is said to be slow and wasteful of memory space. 
    In fact an interpreter doesn’t have to be slow and a high-level language version of a program can be a lot smaller than a fully compiled machine code version.

So what is the truth?

The fact is that, in the past, implementations that described themselves as compilers have generally been faster than ones that were called interpreters, but there has always been a considerable overlap between the two.

Over time the two approaches have tended to become even more blurred and they have borrowed ideas from one another.

For example, the first generation of interpreters usually had excellent debugging facilities. Because the machines they ran on were implemented in software it was an easy task to provide additional facilities that would tell the programmer what was going on.

Interpreters invented ideas such as tracing, i.e. following the execution of the program line by line, dynamic inspection of variable contents and so on. Yes, this is where Basic's Tron and Troff originated, and from there all breakpoint and trace technology.
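Because the interpreter already sits between the program and the hardware, a Tron/Troff style trace is little more than an extra flag tested in the main loop. As a sketch in C – with an invented stub standing in for the real dispatch code, not any particular interpreter's implementation:

/* Sketch: a Tron/Troff style trace flag in an interpreter's main loop. */
#include <stdio.h>

static int trace_on = 0;              /* toggled by the Tron and Troff keywords */

static void dispatch(const char *line)
{
    (void)line;                       /* a real interpreter would act on the line here */
}

static void execute_line(int line_number, const char *line_text)
{
    if (trace_on)                     /* tracing? show the line before running it */
        printf("[%d] %s\n", line_number, line_text);
    dispatch(line_text);              /* then interpret it as usual */
}

int main(void)
{
    trace_on = 1;                     /* as if Tron had just been executed */
    execute_line(10, "PRINT 2*(3+4)");
    return 0;
}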

As time went on it became clear that these facilities could be built into a compiler as well by augmenting the run time environment to include them. Many compilers will produce a debug version of the code while you are still testing things and a production version when you have finished.

It is true, however, that in the past interpreted languages were more sophisticated and complex than compiled languages. The reason was simply that writing an interpreter seemed to be an easier thing to do than writing a compiler and so the implementation method chosen tended to limit, or expand, the language according to what was perceived as difficult or easy.

It is strange to think that the reason compilers have seemed to be harder to write might well be the way that they are taught as part of formal grammar. Translating one language into another involves the use of a grammar and parsing techniques to work out the structure, and this can be made very mathematical and so off-putting to many. An interpreter, on the other hand, is a machine, a soft machine, that has a particular high-level language as its machine code and this seems so much more like engineering.

If you make a language static and strongly typed then it seems to be easier to implement using a compiler approach. On the other hand, if you use an interpreted approach then it's natural to allow the language to be dynamic and to allow self-modification.

Today we are in a period where static languages such as Java and C# are giving ground to dynamic languages such as Ruby and even JavaScript.

These differences are not new and in many ways they represent the re-emergence of the compiler v interpreter approach to language design and implementation.


