Who Is the User?
The other big realization that Cocke's work brought to the attention of machine designers was the idea that the user of a machine instruction set wasn't the programmer but the compiler.
Back in the early days assembly language programmers hand-crafted code. When they needed to perform a multiplication on a machine that only performed additions in hardware, it was a lot of extra work to either write or borrow a multiplication routine. Of course programmers wanted complex instructions because they made programming in machine code easier.
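On such a machine a multiply had to be built out of the adds and shifts the hardware did provide. Here is a minimal sketch of the classic shift-and-add routine – written in Python for readability, though in practice it would have been hand-written assembler:

```python
def multiply(a, b):
    """Multiply two non-negative integers using only addition and
    bit shifts - the kind of routine a programmer had to write (or
    borrow) on a machine with no hardware multiply instruction."""
    result = 0
    while b > 0:
        if b & 1:        # if the lowest bit of b is set...
            result += a  # ...add the current shifted value of a
        a += a           # double a (equivalent to a shift left)
        b >>= 1          # halve b (a shift right)
    return result

print(multiply(7, 6))  # 42
```

A complex instruction set hides all of this behind a single MUL instruction, which is exactly why hand-coders liked CISC.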
Programmers prefer CISC.
Later, of course, programmers moved away from machine code to high level languages, and their involvement with the instruction set of a computer became less direct. At most they might admire a machine's design almost in the abstract, but when it came to generating machine code it was the compiler that was all important. The fact is that compiler writers simply didn't make use of the complex instructions on offer. They generally used a small set of instructions that were enough to get the job done, which simplified the design of the code generation portion of the compiler.
As the technology advanced, the prime consumer of machine instruction sets, and hence the determining factor in what made a good design, was the compiler – and compilers used a small, simple subset of the instructions that might be available.
Compilers prefer RISC.
If RISC is such a good idea why do we all use CISC x86 processors?
For a time the Mac used a RISC processor – the PowerPC – but even here CISC has triumphed, with Apple now using Intel processors. Sun also offered SPARC-based RISC machines, mainly to scientists and engineers, but eventually lost the struggle and was taken over by Oracle, mainly for its software assets.
So overall we still use CISC.
The reason is mainly due to the need to be backward compatible with the early microprocessors and the way that processors evolve by accumulating features. Even RISC processors have a tendency to grow ever more CISC-like with each new version!
However RISC is far from dead. What has happened is that RISC design principles have been incorporated into the core of processors such as the Pentium line that leads up to today's multicore chips without changing their overall nature.
This was a technique first used by Intel’s competitors to produce chips that outperformed the early x86 family processors. NexGen was the first and was so successful that it was bought by AMD, but it didn't take long for Intel to use the same ideas to make faster Pentiums.
The way that this works is that the processor core executes a set of reduced internal instructions called ROPs (RISC operations), more often known today as micro-ops. The standard complex instructions of the Pentium instruction set are first broken down into a sequence of ROPs and then obeyed.
Believe it or not, this strange two-stage execution system is actually faster! The programmer doesn't have to know anything about the deep architectural changes in the chip because it translates the old instruction set using yet more hardware.
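A toy illustration of the decoding step may help. Everything here – the instruction names, the three-field micro-op format, the register names – is invented for illustration; real decoders work on binary encodings, not tuples:

```python
# Hypothetical decoder sketch: a "complex" instruction that operates
# directly on memory is split into load / compute / store micro-ops.
def decode(instruction):
    op, dst, src = instruction
    if op == "ADD" and dst.startswith("["):  # memory destination: CISC-style form
        addr = dst.strip("[]")
        return [("LOAD", "tmp", addr),   # fetch the operand from memory
                ("ADD", "tmp", src),     # do the arithmetic in a register
                ("STORE", addr, "tmp")]  # write the result back
    return [instruction]                 # simple forms are already micro-ops

print(decode(("ADD", "[0x1000]", "eax")))  # three micro-ops
print(decode(("ADD", "ebx", "eax")))       # one micro-op
```

The RISC-style core then only ever sees the short, regular micro-ops, which are easy to execute quickly.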
This is very similar to the use of a Just In Time (JIT) compiler for a high level language that converts complex instructions into a set of simpler machine code or intermediate code instructions just before they are executed.
There is also a move towards Very Long Instruction Word (VLIW) computers, which looks like a rejection of the RISC principle – but it isn't really.
Once you have a reduced instruction set there isn’t much scope for making it faster. One way is to increase the size of each instruction so that once again more gets done per clock cycle. This may look like a return to CISC but again there are only a few, highly optimised, VLIW instructions.
The big split in design technologies, RISC v CISC, is just part of the story. It gets even more interesting when you start looking at what can be done to tweak the basic design of a processor.
If software can be multi-tasking so can hardware.
The early processor designs carried out part of a single instruction at every clock cycle.
First the instruction had to be fetched from memory, then it had to be decoded, then perhaps data had to be fetched, the operation was then carried out and so on. The exact steps vary according to the processor but there are always a number of steps involved in completing an instruction.
A non-pipeline processor has to do everything before moving on to the next instruction.
Modern processors can speed things up by overlapping the execution of commands. In other words, by starting a new instruction before the current one is completed, the number of instructions per clock cycle can be increased. This idea is called “pipelining” and you can think of it as bringing the ideas of a production line to executing instructions.
A pipeline processor on the other hand can be working on more than one instruction at a time – just like a production line.
The x86 Core architecture, for example, has up to 14 stages of pipelining.
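A back-of-the-envelope cycle count shows why the production-line idea pays off. Assuming one pipeline stage per clock cycle and no stalls:

```python
def cycles_sequential(n_instructions, stages):
    # a non-pipelined processor completes every stage of one
    # instruction before starting the next
    return n_instructions * stages

def cycles_pipelined(n_instructions, stages):
    # a pipelined processor starts a new instruction each cycle once
    # the pipeline is full: fill time plus one cycle per instruction after
    return stages + (n_instructions - 1)

# 100 instructions through a 14-stage pipeline, as in the Core design:
print(cycles_sequential(100, 14))  # 1400 cycles
print(cycles_pipelined(100, 14))   # 113 cycles
```

Once the pipeline is full the machine retires close to one instruction per clock, however many stages each instruction actually passes through.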
Longer pipelines have to be better?!
The big problem with pipelining is what happens if the instruction that you’ve just completed makes the partially completed instructions still in the pipeline invalid.
In this case the pipeline has to be restarted and this costs clock cycles. This can be so expensive that most benchmarks showed the Pentium 4, with its 20-stage pipeline, to be slower than a Pentium III with a 10-stage pipeline at the same clock speed.
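A rough model makes the Pentium 4's problem clear. Assume each restart throws away the partially completed work and costs roughly one full pipeline refill – the instruction count and number of restarts below are illustrative assumptions, not measured figures:

```python
def cycles_with_flushes(n_instructions, stages, flushes):
    # ideal pipelined time: fill the pipeline, then one instruction per cycle
    ideal = stages + (n_instructions - 1)
    # each flush restarts the pipeline, costing about one cycle per stage
    return ideal + flushes * stages

# 100,000 instructions with 1,000 pipeline restarts:
print(cycles_with_flushes(100_000, 10, 1_000))  # 110009 cycles
print(cycles_with_flushes(100_000, 20, 1_000))  # 120019 cycles
```

At the same clock speed the 20-stage design comes out roughly 9% slower in this toy model – the longer pipeline pays double for every restart.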
One way of avoiding the need to restart a pipeline is to always make sure that it is filled with the right instructions.
When a processor reaches a branch instruction it has the choice of one of two possible sets of instructions that it could follow. Branch prediction aims to guess which set is the right set! It sounds like fortune telling, but it works because picking the wrong branch is no worse than not guessing at all, while picking the right branch pays off.
Basically it is a question of choosing the branch most likely to be taken, and this is a matter of statistics.