Yahoo! Gets to the 2 Quadrillionth bit of Pi - it's zero
Written by Mike James   
Sunday, 19 September 2010

The 2,000,000,000,000,000th bit of Pi is zero, and we know this because of the application of an amazing formula and a remarkable algorithm. The achievement is as much about distributed and cloud computing as it is about relating the circumference of a circle to its diameter.

This isn't your standard "I computed Pi to more digits than you did" sort of result. What makes it different is that it used Hadoop, which turns it into a demonstration of low-cost distributed computing.

The achievement is also a bit strange in that, unlike the sort of attack on Pi that lists every digit, this one went straight to the 2,000,000,000,000,000th bit of Pi, plus a few bits either side of it. That position is more than double the previous record. It took 23 days on 1,000 of Yahoo's computers, and it is estimated that the calculation would have taken 500 years on a single machine.

To understand the implication of this feat of calculation we need to look at the form of the calculation.

The formula used in the computation is an infinite series that, with a little rearrangement, yields the nth binary digit of Pi directly, without having to compute any of the digits that come before it. It was generally thought that this was impossible.
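
For reference, here is the Bailey–Borwein–Plouffe formula from the Further Reading list, written as a combination of four simpler series S_j, together with the standard rearrangement that isolates the digits after position n. Here {x} denotes the fractional part of x. This shows the general BBP idea; the article does not say exactly which BBP-type formula the Yahoo run used.

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6}\right)
    = 4S_1 - 2S_4 - S_5 - S_6,
\qquad S_j = \sum_{k=0}^{\infty} \frac{1}{16^k (8k+j)}

\{16^n S_j\} = \left\{ \sum_{k=0}^{n} \frac{16^{n-k} \bmod (8k+j)}{8k+j} + \sum_{k=n+1}^{\infty} \frac{16^{n-k}}{8k+j} \right\}

\{16^n \pi\} = \left\{ 4\{16^n S_1\} - 2\{16^n S_4\} - \{16^n S_5\} - \{16^n S_6\} \right\}
```

The hexadecimal digits of {16^n π} are the hex digits of Pi that follow position n, and each hex digit is exactly four bits.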

Extracting a digit this way comes down to a finite but very large sum, plus a few rapidly shrinking tail terms. It never needs to store millions of digits, and it can be computed with standard data types. In other words, it removes the storage problem and makes it much easier to divide the computation between different machines.
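
To make the idea concrete, here is a minimal Python sketch of BBP-style digit extraction, assuming ordinary double-precision floats. The function names are hypothetical and this is not the code Yahoo ran. Modular exponentiation (Python's three-argument pow) keeps every number small, which is why no long string of earlier digits ever has to be stored.

```python
def _series_frac(j: int, n: int) -> float:
    """Fractional part of 16**n * S_j, where S_j = sum_k 1 / (16**k * (8k + j))."""
    total = 0.0
    # Head: terms k = 0..n. Only 16**(n-k) mod (8k + j) matters for the
    # fractional part, so numbers never grow beyond the denominator.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Tail: terms k > n shrink by a factor of 16 each; stop once negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digits(n: int, count: int = 8) -> str:
    """Hex digits of pi after the point, skipping the first n (n = 0 starts at the first)."""
    x = (4 * _series_frac(1, n) - 2 * _series_frac(4, n)
         - _series_frac(5, n) - _series_frac(6, n)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)
        digits.append("0123456789abcdef"[d])
        x -= d
    return "".join(digits)


def pi_bit(position: int) -> int:
    """Binary digit of pi at `position` after the point (1-based)."""
    hex_index, offset = divmod(position - 1, 4)
    digit = int(pi_hex_digits(hex_index, 1), 16)
    # Each hex digit holds four bits, most significant first.
    return (digit >> (3 - offset)) & 1
```

For example, pi_hex_digits(0) returns "243f6a88" (Pi is 3.243F6A88... in hexadecimal) and pi_bit(3) returns 1. Because this sketch relies on doubles, it is only trustworthy for modest positions; a position anywhere near 2 quadrillion would need far more careful arithmetic.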

This makes it ideal for Hadoop's implementation of the MapReduce algorithm. MapReduce was first developed by Google, and Hadoop is now a commonly used open source implementation of it for distributed computing. All it needs is an array of fairly standard machines linked together by comparatively slow network connections, which makes MapReduce suitable for use with cloud-based resources.

What MapReduce does is split a calculation up and portion it out to the various machines in the map stage. It keeps track of what is happening, restarts failed computations, and eventually collects the results together in the reduce stage of the operation.
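
The digit-extraction sum splits naturally along these lines. The sketch below is plain Python, not the Hadoop API, and the function names are hypothetical; the article does not describe how the Yahoo job actually partitioned its work. Each "map" task sums one contiguous range of head terms, and because only fractional parts matter, the "reduce" step simply adds the partial results modulo 1. Here the built-in map runs the tasks one after another; on a cluster each task would go to a different machine.

```python
from functools import reduce


def split_range(start: int, stop: int, parts: int) -> list[tuple[int, int]]:
    """Divide [start, stop) into `parts` contiguous chunks (the 'split' step)."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def map_task(args: tuple[int, int, int, int]) -> float:
    """Map: fractional part of head terms k in [lo, hi) of 16**n * S_j."""
    j, n, lo, hi = args
    total = 0.0
    for k in range(lo, hi):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    return total


def reduce_task(a: float, b: float) -> float:
    """Reduce: partial sums only matter modulo 1, so combine them mod 1."""
    return (a + b) % 1.0


def series_frac_distributed(j: int, n: int, workers: int) -> float:
    chunks = split_range(0, n + 1, workers)
    partials = map(map_task, [(j, n, lo, hi) for lo, hi in chunks])
    head = reduce(reduce_task, partials, 0.0)
    # The tail (k > n) is tiny and cheap, so it is summed directly here.
    tail, k = 0.0, n + 1
    while (term := 16.0 ** (n - k) / (8 * k + j)) >= 1e-17:
        tail += term
        k += 1
    return (head + tail) % 1.0


def pi_hex_digits_distributed(n: int, count: int = 6, workers: int = 4) -> str:
    """Hex digits of pi after skipping the first n, with the head sums split up."""
    x = (4 * series_frac_distributed(1, n, workers)
         - 2 * series_frac_distributed(4, n, workers)
         - series_frac_distributed(5, n, workers)
         - series_frac_distributed(6, n, workers)) % 1.0
    out = []
    for _ in range(count):
        x *= 16
        d = int(x)
        out.append("0123456789abcdef"[d])
        x -= d
    return "".join(out)
```

The number of workers changes only how the head sum is divided, not the answer: pi_hex_digits_distributed(10, workers=3) still gives "a308d3". A failed map task can simply be rerun on its chunk, because no chunk depends on any other.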

The single-bit Pi calculation is as much a test of the Hadoop methodology as anything else, and yes, it shows that, for some calculations at least, it works very well. In principle, the whole computation could have been done on computers separated by large distances and owned by lots of different people.

The advent of Hadoop-style computations could well extend what we can reasonably compute - in effect, it brings crowd-sourcing to computation.

Further Reading

5 Trillion Digits of Pi - New world record

Bailey–Borwein–Plouffe formula

Hadoop

Hadoop: The Definitive Guide

Pro Hadoop

Last Updated ( Monday, 14 March 2011 )