|How Fast Does This Code Run?|
|Written by Mike James|
|Tuesday, 14 January 2020|
MIT researchers have trained a neural network to tell you how fast any code you present it with will run. Sounds fun, but why do we need it?
The 4004 chip - you really knew what it was doing!
Back in the day it was possible to look at the output of a compiler, or some hand-crafted assembler, and have a reasonable idea how fast the code would run. You could do the job simply by counting clock cycles for each instruction in the particular version you were using. Then things got more complicated.
In an effort to speed things up, processors adopted instruction reordering, branch prediction, caches, speculative execution, execution barriers, multiple cores and hyperthreading. Not only does that make it difficult to work out how many clock cycles any particular chunk of code will take to execute, the count isn't even a fixed and unchanging number: it varies according to whether branches are taken, whether data is in the cache, and so on.
So what to do about it?
The only reasonable solution is to try the code out, gather statistically valid data over a large number of runs, and hope. This doesn't make optimizing code particularly easy, as each cycle through the procedure takes quite a long time. Now a team from MIT has trained a neural network to estimate the time that "basic blocks" of code, i.e. straight-line sequences of instructions with no branches in or out, take to execute on different architectures. The size of the sample data is impressive - 300,000 blocks taken from a range of different types of application. This is now available as BHive, an open source dataset.
What is surprising is that the resulting program, Ithemal, managed to predict running times on the latest Intel processors more accurately than hand-crafted models created by Intel - and you might suppose Intel knows its own processors best. Typically Ithemal's error rate is 10%, while the hand-crafted models have an error rate of 20%.
The program might also have applications in creating optimizations for "black box" processors, i.e. devices for which the exact design isn't known. The neural network doesn't have any idea what the structure of the processor is; instead it uses the code blocks to find out what the processor runs fast and what it runs slowly. It is an entirely empirical approach. It is also immune to errors in the documentation, as it learns from real implementations.
Of course, being a black box in its own right, the neural network gives no clue as to why some code is faster than other code. There are no useful insights generated by the tool. The researchers see finding interpretations of the neural network's output as an important next step. Perhaps it will be possible to put the knowledge into some sort of human-understandable rules - such as always placing the most probable outcome on the fall-through, non-branch path.
You can see that this is going to be useful for compiler designers as a way of working out how to optimize the generated code. I'm wondering if it captures any of the structure of the processor and whether this could be used to validate or improve the hardware. AI is certainly changing what software is all about.
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks, by Charith Mendis, Alex Renda, Saman Amarasinghe and Michael Carbin

Compiler Auto-Vectorization with Imitation Learning, by Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe and Michael Carbin
|Last Updated ( Tuesday, 14 January 2020 )|