Compilers, Interpreters, VMs and JIT
Written by Mike James   
Thursday, 21 December 2017

Battle lines

Now we have two ways of implementing a high-level language – we can compile it to machine code or we can run it on an interpreter for which it IS the machine code.

Traditionally the argument for and against the two approaches goes something like this:

  • A compiler produces “tight efficient code”.
    This is supposed to mean that, because it generates machine code, everything happens as fast as possible. Of course this is nonsense, because the machine code could make heavy use of run time subroutines and so start to slide towards the interpreter approach.
  • A compiler produces “small stand-alone code”.
    Clearly, if it uses a run time library then it isn’t stand-alone unless the library is included in the compiled code, in which case it isn’t going to be small!
  • Conversely, an interpreter is said to be slow and wasteful of memory space.
    In fact an interpreter doesn’t have to be slow, and a high-level language version of a program can be a lot smaller than a fully compiled machine code version.

So what is the truth?

The fact is that, in the past, implementations that described themselves as compilers have generally been faster than ones that were called interpreters, but there has always been a considerable overlap between the two.

Over time the two approaches have tended to become even more blurred and they have borrowed ideas from one another.

For example, the first generation of interpreters usually had excellent debugging facilities. Because the machine the program ran on was implemented in software, it was an easy task to provide additional facilities that would tell the programmer what was going on.

Interpreters invented ideas such as tracing, i.e. following the execution of the program line by line, and the dynamic inspection of variable contents. Yes, this is where Basic's TRON and TROFF commands originated, and from here all breakpoint and trace technology followed.
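
To see how little extra machinery this takes, here is a minimal sketch in Python of a toy line-numbered interpreter with Tron/Troff-style tracing. The mini-language and all of the names are invented for illustration – the point is that the interpreter already holds the line number and the variables, so tracing costs just one extra print:

# A toy line-numbered interpreter showing why tracing is cheap to add.
# The mini-language and all names here are invented for illustration.

program = {
    10: ("LET", "A", 2),
    20: ("LET", "B", 3),
    30: ("ADD", "C", "A", "B"),   # C = A + B
    40: ("TRON",),                # switch tracing on, Basic-style
    50: ("ADD", "C", "C", "C"),
    60: ("TROFF",),               # and off again
}

def run(program):
    variables = {}
    trace = False
    for line, stmt in sorted(program.items()):
        if trace:
            # The interpreter already has the line number and the
            # variables to hand, so reporting them costs one print.
            print(f"[{line}] {stmt} vars={variables}")
        op = stmt[0]
        if op == "LET":
            variables[stmt[1]] = stmt[2]
        elif op == "ADD":
            variables[stmt[1]] = variables[stmt[2]] + variables[stmt[3]]
        elif op == "TRON":
            trace = True
        elif op == "TROFF":
            trace = False
    return variables

print(run(program))   # {'A': 2, 'B': 3, 'C': 10}

Breakpoints and variable watches are just more code in the same loop, which is why interpreters got these facilities first.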

As time went on it became clear that these facilities could be built into a compiler as well by augmenting the run time environment to include them. Many compilers will produce a debug version of the code while you are still testing things and a production version when you have finished.

It is true, however, that in the past interpreted languages were more sophisticated and complex than compiled languages. The reason was simply that writing an interpreter seemed to be an easier thing to do than writing a compiler, and so the implementation method chosen tended to limit, or expand, the language according to what was perceived as difficult or easy.

It is strange to think that the reason compilers have seemed harder to write might well be the way that they are taught as part of formal grammar. Translating one language into another involves the use of a grammar and parsing techniques to work out the structure, and this can be made very mathematical and so off-putting to many. An interpreter, on the other hand, is a machine, a soft machine, that has a particular high-level language as its machine code, and this seems much more like engineering.

If you make a language static and strongly typed then it seems to be easier to implement using a compiler approach. On the other hand, if you use an interpreted approach then it's natural to allow the language to be dynamic and to allow self-modification.

Today we are in a period where static languages such as Java and C# are giving ground to dynamic languages such as Ruby and even JavaScript.

These differences are not new and in many ways they represent the re-emergence of the compiler v interpreter approach to language design and implementation.

Virtual Machines And Intermediate Languages

There is one last development of the interpreter idea that is worth going into in more detail because it is important today.

An alternative to implementing a machine that runs the high-level language as its machine code is to compile the high-level language to a lower-level language and then run this using an interpreter or VM. This is the madness referred to at the end of the first page.

That is, instead of writing an interpreter to run Java, we first compile it to a simpler language called byte code. Notice that we do not compile it to machine code – byte code is still fairly high level compared to machine code. To actually run the Java we use an interpreter, or virtual machine, for byte code.

This might seem like a very strange idea in that you now have the worst of all possible worlds.

You have to use a compiler to translate the program from one language to another and then you have to use an interpreter to run it.

What could possibly be good about this idea?

The answer is a great deal.

The first advantage is that a compiler from a high-level language to an intermediate-level language is easier to write than a full compiler to machine code, and can be very efficient.

The second is that an interpreter for an intermediate-level language is easier to write than one for the high-level language, and can also be very efficient.

Looking at things another way we get the best, not the worst, of both approaches!
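
To make the two-stage idea concrete, here is a minimal sketch in Python. The expression language, the byte code and the stack-based VM are all invented for illustration – real Java byte code is considerably richer, but the shape of the scheme is the same:

# A minimal sketch of the two-stage idea: compile a tiny expression
# language to an invented byte code, then run it on a small stack VM.

def compile_expr(node):
    """Compile a nested-tuple expression tree to a flat byte code list."""
    if isinstance(node, (int, float)):
        return [("PUSH", node)]
    op, left, right = node                 # e.g. ("+", 2, ("*", 3, 4))
    return compile_expr(left) + compile_expr(right) + [(op,)]

def run_vm(code):
    """A stack-based virtual machine for the byte code above."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:                              # arithmetic: pop two, push one
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b}[instr[0]])
    return stack.pop()

code = compile_expr(("+", 2, ("*", 3, 4)))
print(code)          # [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('*',), ('+',)]
print(run_vm(code))  # 14

Notice that run_vm never sees the source language at all – it only understands the tiny byte code.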

In addition there is one huge advantage which you might not notice at first. If the interpreter for the intermediate-level language is simple enough then it can be easily implemented on any hardware and this makes programs compiled to the intermediate-level code easily portable between different types of hardware.

If you are really clever then you can even write the compiler in the intermediate-level language, making it portable as well!

In this mode the interpreter is generally called a Virtual Machine or VM. 

That is, we generally call a VM that works directly with a high-level language an interpreter. Hence Basic was generally executed by an interpreter. However, if the VM runs an intermediate code produced by a compiler, we generally call it a VM. Thus Java is executed by a VM and not an interpreter.

This is all the difference amounts to. 

The intermediate language is also generally called Pseudo Code, or P-Code for short. P-Code compilers and VMs were very popular in the time before the IBM PC came on the scene (UCSD Pascal and Fortran being the best known). Then they more or less vanished, only to return in a big way with Java, but renamed "byte code".

 


 

Java’s main claim to fame is that it is the ultimate portable language.

Java VMs exist for most hardware platforms and, up to a point, you really can compile a Java program and expect it to run on any machine that has a VM. Not only this, but the Java compiler and all of the Java system are themselves compiled to byte code, so once you have a VM running on new hardware you also have the entire Java system – clever!

.NET languages such as C# and Visual Basic also use an intermediate language and VM approach, but due to Microsoft's proprietary approach to computing neither is quite as portable as Java, although with the open sourcing of .NET this is changing very rapidly. You can now find good implementations of the CLR and the entire .NET system on Linux and other operating systems.

This idea is so good that you can expect most language development in the future to be centred on the VM idea. One thing is sure - the future is virtual.

JIT and Not so JIT

The story so far is easy enough to understand. At one end of the spectrum of language implementations we have the pure compiler, which generates nothing but machine code and uses no run time library or package.

At the other end we have the interpreter, which generates no machine code and is all run time package in the form of a complete VM for the language.

Of course, in the real world these really are two ends of the spectrum and real compilers use different amounts of run time library, so slowly sliding towards the interpreter end of the spectrum. But what about interpreters? Do they have another way of sliding towards the compiler end of the spectrum?

An interpreter can generate some machine code to get a job done quicker if this is important. A modern VM will use all sorts of techniques to make it faster and more efficient. For example, for each instruction in the intermediate language the VM could ask which is going to be quicker: to call a routine or to generate machine code to get the job done. This approach is often called Just-In-Time or JIT compilation. It is usually explained as the VM compiling the intermediate language just before it is run, but this isn't really a good way to think of it.

The VM does compile the intermediate language, but mostly what it produces is just lots of calls to routines that constitute a runtime package. So the JIT is a sort of mixture of interpreting the code and compiling the code according to what makes best use of the real machine. 
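
Here is a sketch of that mixture, reusing the invented byte code from the earlier example. A real JIT generates machine code; in this Python illustration closures that call run time routines stand in for the generated code, but the principle – decode once, then just execute – is the same:

# A sketch of the JIT flavour described above: instead of decoding each
# byte code instruction every time round the loop, "compile" it once
# into a direct call to a run time routine.

def jit_compile(code):
    """Turn byte code into a list of closures - mostly calls to routines."""
    compiled = []
    for instr in code:
        if instr[0] == "PUSH":
            value = instr[1]
            compiled.append(lambda stack, v=value: stack.append(v))
        elif instr[0] == "+":
            def add(stack):                # a run time routine
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            compiled.append(add)
        elif instr[0] == "*":
            def mul(stack):
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            compiled.append(mul)
    return compiled

def run_jitted(compiled):
    stack = []
    for op in compiled:    # no decoding left - just execute
        op(stack)
    return stack.pop()

code = [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("*",), ("+",)]
print(run_jitted(jit_compile(code)))   # 14

The decoding cost is paid once, and a real JIT can go further and replace the hot routines with genuine machine code – exactly the sliding between interpreter and compiler described above.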

This is still a source of lots of arguments in the programming world - is it a compiler, an interpreter, a JIT or what?

In practice there is a lot of overlap, but it is still true that languages at the compiler end of the spectrum run faster than languages at the interpreter or VM end of the spectrum. However, the gap isn't as wide as you might think and a lot depends on how well the compiler and VM are implemented. When it comes to efficiency and performance of implementing a language the devil is in the detail rather than the bigger choices.

 


 
