Processor perfection
If you want to start a really good argument among a group of hardware enthusiasts, just mention that such-and-such a processor is better than some other!
When it comes to processor architecture we don’t even have a clear agreement on what sort of design philosophies should be followed to produce a good one. What we do have is a collection of techniques and approaches that all promise to deliver processor perfection.
In the early days of computer design the big problem was simply that any sort of processor took so much electronics to build that the main concern was keeping things simple. Even when integrated circuits became possible it was still difficult to achieve the sort of component density needed to implement a complete processor on one chip. As a result the early microprocessors were very primitive even by the standards of the day.
The very first processor design philosophy was just the simple idea that more is better. Designers attempted to make a processor do more at each step and tried to make each step take less and less time.
Most processors are “synchronous”, that is, they use a clock to time when instructions occur. At its simplest, a synchronous processor carries out one instruction (or at least a fixed number of instructions) per clock cycle. Hence you can get more processor power simply by increasing the clock frequency.
Initially processors operated at 1MHz (i.e. 1 million pulses per second) but over the years clock rates have shot up to 1GHz (i.e. 1 thousand million pulses per second) or more. What this means is that, if nothing else had changed, the processors we use today would be 1000 times faster than the ones we used back in the 1980s.
It is still true that increasing the clock rate is one of the most effective ways of speeding things up, but it’s not the most interesting, and recently the effort to increase clock rate has run out of steam. The faster the clock, the more electrical power is wasted and the harder it is to make the electronics work reliably.
CISC and RISC
How can you make a processor faster without increasing the clock speed?
The most obvious way is to increase the amount done per clock pulse. This is so obvious that it’s what has happened to processors without anyone really working out that this is the best thing to do! You could say that
performance = clock speed x instructions per clock pulse
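As a rough illustration, the relation can be written as a one-line function. This is only a sketch of the simple model above: the function name `performance` and the figure of 4 instructions per clock are assumptions chosen for the example, not measurements of any real processor.

```python
def performance(clock_hz, instructions_per_clock):
    """Instructions per second under the simple model in the text."""
    return clock_hz * instructions_per_clock

# A 1MHz processor doing one instruction per clock pulse.
early = performance(1_000_000, 1.0)

# The same design at 1GHz: only the clock has changed.
faster_clock = performance(1_000_000_000, 1.0)

# A 1GHz design that also does more per pulse (4 is an illustrative guess).
faster_and_wider = performance(1_000_000_000, 4.0)

print(faster_clock / early)      # 1000.0
print(faster_and_wider / early)  # 4000.0
```

The two factors multiply, which is why doing more per clock pulse is worth pursuing even when the clock rate cannot be raised any further.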
Over time processors supported more and more instructions that did more and more.
For example, early processors only supported addition and subtraction instructions and to multiply and divide you had to write a small program that implemented general arithmetic using nothing but addition and subtraction. Of course today’s processors have special numerical hardware built into them that can do high precision arithmetic in a single operation.
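To give a feel for what such a small program looked like, here is a minimal sketch in Python rather than in any real processor's assembly language. It assumes non-negative whole numbers and uses repeated addition and subtraction, the slowest but simplest method; real routines usually used faster shift-based methods. The names `multiply` and `divide` are made up for the example.

```python
def multiply(a, b):
    """Multiply two non-negative integers using only addition and subtraction."""
    result = 0
    while b > 0:
        result = result + a   # add a to the running total...
        b = b - 1             # ...b times in all
    return result

def divide(a, b):
    """Return (quotient, remainder) for a >= 0 and b > 0 by repeated subtraction."""
    if a < 0 or b <= 0:
        raise ValueError("this sketch needs a >= 0 and b > 0")
    quotient = 0
    while a >= b:
        a = a - b             # take away one more b...
        quotient = quotient + 1   # ...and count how many times we managed it
    return quotient, a

print(multiply(6, 7))   # 42
print(divide(47, 5))    # (9, 2)
```

Notice that the number of steps grows with the size of the operands, which is exactly why a single hardware multiply instruction looks so attractive.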
How could such an approach be wrong?
Surely it must be better to have a powerful multiply instruction than have to create one from feeble addition instructions? Well to a great extent the obviousness of this argument is based on a misunderstanding of what computers do.
Back in the early 1970s John Cocke at the IBM research labs analysed exactly what instructions a processor used most often. He discovered that of the, say, 200 instructions a processor might support, only 10 or so were used at all often. In fact these 10 instructions accounted for over two thirds of the processor’s time.
Later this became enshrined in the 80/20 rule – 80% of the work is done by 20% of the instructions.
What this implied was that by making these core instructions work as fast as possible the processor would be greatly speeded up. In fact, given that only 10 or so simple instructions were really used, why bother worrying about the rest?
This gave rise to the idea of a RISC, or Reduced Instruction Set Computer, which implemented a very small set of instructions as efficiently as possible.
Compared to a CISC (Complex Instruction Set Computer), a RISC machine might seem a silly idea. After all, each of its simple instructions did very little. But they could be executed at a very fast rate, and this is almost the basic principle of a computer, which does very little, very very quickly!
ROPs?
If RISC is such a good idea why do we all use CISC x86 processors? For a time the Mac used a RISC processor – the PowerPC – but even here CISC has triumphed with Apple now using Intel processors.
The reason is mainly the need to be backward compatible with the early microprocessors and the way that processors evolve by accumulating features. Even RISC processors have a tendency to grow ever more CISC-like with each new version!
However RISC is far from dead. What has happened is that RISC design principles have been incorporated into the core of processors such as the Pentium line without changing their overall nature.
This was a technique first used by Intel’s competitors to produce chips that outperformed the early x86 family processors. NexGen was the first and was so successful that it was bought by AMD, but it didn't take long for Intel to use the same ideas to make faster Pentiums.
The way that this works is that the processor core executes a set of reduced instructions called ROPs. The standard complex instructions of the Pentium instruction set are first broken down into a sequence of ROPs and then obeyed.
Believe it or not this strange two-stage execution system is actually faster! The programmer doesn’t have to know anything about the deep architectural changes in the chip because it translates the old instruction set using yet more hardware. This is very similar to the use of a Just In Time (JIT) compiler for a high level language that converts complex instructions into a set of simpler machine code or intermediate code instructions just before they are executed.
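The two-stage idea can be sketched as a toy model. Everything here is hypothetical: the instruction names (`ADD_MEM`, `LOAD`, `STORE`), the `tmp` register and the tuple format are invented for illustration and are not the real x86 encoding or any real chip's internal ROP format.

```python
def decode(instruction):
    """Stage one: translate a toy complex instruction into simple ROP-like steps."""
    op, dest, src = instruction
    if op == "ADD_MEM":
        # "Add a register to a memory location" becomes load, add, store.
        return [("LOAD", "tmp", dest), ("ADD", "tmp", src), ("STORE", dest, "tmp")]
    if op == "ADD":
        # A register-to-register add is already simple enough.
        return [("ADD", dest, src)]
    raise ValueError("unknown instruction: " + op)

def execute(rops, regs, memory):
    """Stage two: the core only understands the three simple operations."""
    for op, a, b in rops:
        if op == "LOAD":
            regs[a] = memory[b]
        elif op == "ADD":
            regs[a] = regs[a] + regs[b]
        elif op == "STORE":
            memory[a] = regs[b]
        else:
            raise ValueError("unknown ROP: " + op)

def run(program, regs, memory):
    """The programmer only ever sees the complex instructions."""
    for instruction in program:
        execute(decode(instruction), regs, memory)

regs = {"r1": 5, "tmp": 0}
memory = {100: 10}
run([("ADD_MEM", 100, "r1")], regs, memory)
print(memory[100])   # 15
```

The program is still written entirely in the old instruction set, and only `decode` knows that the core works differently underneath.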
There is also a move towards Very Long Instruction Word computers which looks like a rejection of the RISC principle – but it isn’t really. Once you have a reduced instruction set there isn’t much scope for making it faster. One way is to increase the size of each instruction so that once again more gets done per clock cycle. This may look like a return to CISC but again there are only a few, highly optimised, VLIW instructions.
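In the same toy style, a very long instruction word can be pictured as a bundle of several simple operations that are issued together. This is a rough sketch under stated assumptions: the bundle format, the operation names and the register names are made up, and the operations in a bundle are assumed to be independent of one another, which is a common arrangement for VLIW designs.

```python
def execute_bundle(bundle, regs):
    """Carry out every operation in one long instruction word in a single step."""
    # Each operation reads the register values as they were before the step,
    # so the operations in a bundle cannot see each other's results.
    before = dict(regs)
    for op, dest, a, b in bundle:
        if op == "ADD":
            regs[dest] = before[a] + before[b]
        elif op == "SUB":
            regs[dest] = before[a] - before[b]
        else:
            raise ValueError("unknown operation: " + op)
    return regs

regs = {"r1": 1, "r2": 2, "r3": 10, "r4": 4, "r5": 0, "r6": 0}

# One long instruction word holding two simple operations.
bundle = [("ADD", "r5", "r1", "r2"), ("SUB", "r6", "r3", "r4")]
execute_bundle(bundle, regs)

print(regs["r5"], regs["r6"])   # 3 6
```

Each operation is still as simple as a RISC instruction. The extra work per clock cycle comes from packing several of them side by side.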