Microsoft's Catapult Fabric Turns Software Into Hardware
Written by Mike James
Friday, 20 June 2014
Microsoft is trying a radical approach to speeding up its Bing servers by using hardware that can be reconfigured to implement the ranking algorithms. It could be the way forward to better performance in a range of situations.
Until recently we have been able to rely on the increasing power of the average server to keep our code running faster. Most of this increased power has come from increases in clock speed, but clock speeds have been stuck at around 3GHz for quite some time now. It seems that in the future we will have to look to more hardware rather than faster hardware, and this means we have to find ways of parallelizing our code.
An interesting alternative is to find ways to convert code into custom hardware. This can be done using Field Programmable Gate Arrays (FPGAs), essentially chips that can be set up to implement a logical function or algorithm by reconfiguring the way their modules are connected. Take an FPGA and configure it to implement your algorithm and it will run several times faster than a program on a general purpose computer.
The problem is that one FPGA doesn't do a great deal, so fitting out a server farm with one FPGA per server isn't much help on its own. Using multiple FPGAs per server would be too expensive and fairly wasteful of resources.
This is where project Catapult, which is looking at creating a "Reconfigurable Fabric", comes into the picture. A team at Microsoft Research has figured out a way to build servers augmented by FPGAs that can do the work in half the time.
The current system has 48 servers, each with a small FPGA board containing a medium-sized FPGA plus some DRAM. The FPGAs are wired to each other to create a 6x8 grid that wraps around at the edges. This allows groups of FPGAs to be allocated to implement a processing pipeline.
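To make the wrap-around wiring concrete, here is a minimal Python sketch of a 6x8 grid whose edges wrap. The (row, column) addressing, the function name torus_neighbors and the choice of four links per FPGA are assumptions made for illustration; the article does not describe the actual wiring or addressing scheme.

```python
ROWS, COLS = 6, 8  # grid dimensions from the article: 48 FPGAs in total

def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four neighbours of (row, col) on a grid that wraps at the edges.

    Hypothetical helper: assumes each FPGA links to the one above, below,
    left and right of it, with modular arithmetic providing the wrap-around.
    """
    return [
        ((row - 1) % rows, col),  # up (wraps from row 0 to the last row)
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left (wraps from column 0 to the last column)
        (row, (col + 1) % cols),  # right
    ]

if __name__ == "__main__":
    # A corner FPGA still has four neighbours because the edges wrap.
    print(torus_neighbors(0, 0))
```

The point of the wrap-around is that no FPGA sits on an edge: every position in the grid looks the same, which makes it easier to carve out a group of adjacent FPGAs anywhere in the grid.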
A Catapult 1U half-width server
When a server wants to rank a document it converts it into a form suitable for the FPGAs and sends it to its local FPGA, which then routes it over the internal FPGA network to the start of an eight-FPGA ranking pipeline. When the pipeline has finished, the result is routed back to the requesting server's FPGA.
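A rough way to see how far a request might travel from a server's local FPGA to the head of a pipeline is to count hops on a 6x8 grid that wraps at the edges. The sketch below assumes messages take a shortest path and that positions are (row, column) pairs; the article does not describe the real routing scheme, and torus_hops is a hypothetical helper.

```python
def torus_hops(src, dst, rows=6, cols=8):
    """Minimum number of hops between two positions on a wrap-around grid.

    Assumption: a message may travel in either direction along each axis,
    so the distance per axis is the shorter of the direct and wrapped routes.
    """
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)

if __name__ == "__main__":
    # Opposite corners are close together once the edges wrap.
    print(torus_hops((0, 0), (5, 7)))
    # Worst case over the whole grid.
    positions = [(r, c) for r in range(6) for c in range(8)]
    print(max(torus_hops(a, b) for a in positions for b in positions))
```

Under these assumptions no two FPGAs are more than seven hops apart, compared with twelve hops between opposite corners of a grid without wrap-around.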
The FPGA board - one per server connected in a grid.
The system is fully reconfigurable and designed to be fault tolerant by remapping FPGAs should any fail. The system was tested with 34 clusters of 48 servers each, 1,632 servers in total, complete with FPGA networks. The experiment showed that you could almost double (95% improvement) the overall speed of the ranking, i.e. the novel architecture is roughly equivalent to doubling the server clock speed. The FPGAs were actually 40 times faster at the ranking task than the CPUs, but there was still some work to be done by the CPU.
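The gap between a 40-times-faster FPGA and a near-doubling overall is what Amdahl's law predicts when only part of the work is accelerated. The sketch below is a back-of-the-envelope model, not the researchers' own analysis: it assumes the 95% improvement can be treated as a 1.95x overall speedup and that the 40x figure applies uniformly to the offloaded portion. The function names are hypothetical.

```python
def overall_speedup(offloaded_fraction, accelerator_speedup):
    """Amdahl's law: overall speedup when a fraction of the work is accelerated."""
    remaining = 1.0 - offloaded_fraction              # work still done on the CPU
    accelerated = offloaded_fraction / accelerator_speedup
    return 1.0 / (remaining + accelerated)

def offloaded_fraction_for(target_speedup, accelerator_speedup):
    """Invert Amdahl's law to find the fraction of work that must be offloaded."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / accelerator_speedup)

if __name__ == "__main__":
    print(34 * 48)  # size of the test deployment: 1632 servers
    p = offloaded_fraction_for(1.95, 40)
    print(f"fraction offloaded under this model: {p:.3f}")
    print(f"check: overall speedup = {overall_speedup(p, 40):.2f}")
```

Under these assumptions, roughly half of the original CPU time would have been moved onto the FPGAs. However fast the FPGAs are, the work left on the CPU limits the overall gain to about 2x.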
You can find out some of the details from the following Microsoft video:
You might ask why not just build some custom hardware?
The answer is that FPGAs are cheaper and, being reconfigurable, can be used to implement tweaks or even completely different algorithms. This is about as close to software implemented in hardware as it gets.
It may only be used for Bing document ranking at the moment, but it is very obvious that there are other applications such as neural networks, computer vision and big data that could benefit from this approach.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services by Andrew Putnam, Adrian M. Caulfield, Eric S. Chung, Derek Chiou, Kypros Constantinides, John Demme, Hadi Esmaeilzadeh, Jeremy Fowers, Gopi Prashanth Gopal, Jan Gray, Michael Haselman, Scott Hauck, Stephen Heil, Amir Hormati, Joo-Young Kim, Sitaram Lanka, James Larus, Eric Peterson, Simon Pope, Aaron Smith, Jason Thong, Phillip Yi Xiao, Doug Burger.