The next time your implementation of an algorithm runs a little slow, why not use some custom hardware to compute it in parallel? That's what Cooper Bills did in a student project for the game of Life.
The game of Life (that's Conway's Life, not the messy biological stuff) is fascinating because of the complexity that arises from just a few simple rules. One of the problems in seeing just how it behaves is that most Life simulations are slow. Now we have a dedicated computer, built using an FPGA, that can produce a new generation at full frame rate and full VGA resolution, i.e. 60Hz at 640x480.
In case you haven't done the arithmetic, that involves over 18 million cell updates per second (640 x 480 x 60 = 18,432,000). The solution is a massively parallel update using custom hardware implemented on a Field Programmable Gate Array (FPGA). The grid is split into columns eight cells wide, and the hardware computes each row in turn just before the VGA scan requires the data, i.e. the output is created on the fly. The entire computation completes well before the scan needs the data, so in principle it could work faster.
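To make the arithmetic and the row-at-a-time idea concrete, here is a minimal software sketch in Python. It is not the FPGA design: it runs sequentially, the function names are made up for illustration, and it assumes that cells beyond the edge of the grid count as dead, which the project description doesn't specify. What it does show is that each new row depends only on the row above, the row itself and the row below, which is what makes it possible to generate rows just ahead of the scan.

```python
# Illustrative sketch only: sequential Python, not the parallel FPGA hardware.
# Assumption: cells outside the grid are treated as dead.

WIDTH, HEIGHT, REFRESH_HZ = 640, 480, 60


def updates_per_second(width=WIDTH, height=HEIGHT, refresh_hz=REFRESH_HZ):
    """Cell updates needed to produce one generation per displayed frame."""
    return width * height * refresh_hz


def next_row(above, current, below):
    """Compute the next state of one row from the three rows it depends on."""
    width = len(current)
    new_row = []
    for x in range(width):
        # Sum the 3x3 block centred on the cell, skipping out-of-range columns.
        total = 0
        for row in (above, current, below):
            for dx in (-1, 0, 1):
                nx = x + dx
                if 0 <= nx < width:
                    total += row[nx]
        neighbours = total - current[x]  # exclude the cell itself
        # Conway's rules: birth on 3 neighbours, survival on 2 or 3.
        alive = neighbours == 3 or (current[x] == 1 and neighbours == 2)
        new_row.append(1 if alive else 0)
    return new_row


def next_generation(grid):
    """Produce a whole generation one row at a time, top to bottom."""
    height = len(grid)
    dead = [0] * len(grid[0])
    return [
        next_row(
            grid[y - 1] if y > 0 else dead,
            grid[y],
            grid[y + 1] if y < height - 1 else dead,
        )
        for y in range(height)
    ]
```

Calling `updates_per_second()` with the defaults gives 18,432,000. Because `next_row` only ever reads the old generation, the eight-cell-wide column groups described above could each be evaluated independently, which is the kind of work that parallel hardware handles well.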
The result is very fast animations of Life in action as you can see in the video: