The idea that computers play a key role in mathematical proof is well accepted today by all but the purist of mathematicians, but the latest example of proof by computer enters a new realm - a proof 200 terabytes in size.

The Pythagorean Triples problem is, or was, an open problem in Ramsey theory:

Can the set of natural numbers, 1,2,3 ... be divided into two parts such that neither part contains a Pythagorean Triple, i.e. three numbers which form the sides of a right angle triangle and where:

a^{2}+b^{2}=c^{2}.

You can also pose the problem as trying to paint the Pythagorean triples with two colors so that no triple is all the same color.

There is also a related problem that asks if the triples can be divided into k sets or painted using n colors. The problem with just two sets is the most basic and the one to solve first.

This is one of those wonderful math problems that is easy to state but very, very difficult to solve. You can start easily enough and when you hit the first triple - the well known 3,4,5 triangle triple you can simply put the 3 into one set and the 4 and 5 in the other. You can carry on like this but 5 is also in another triple 5,12,13 so now you can't put both 12 and 13 in the same set as 5, but 12 and 13 are part of another two triples 12,35,37 and 13,84,85. You can see that as you progress the constraints on how numbers are allocated become more and more difficult.

The question is can it be done?

A prize has been waiting for the answer for twenty years and now we know that the answer is no, but the proof is remarkable in itself.

In principle all you have to do is find a counter example, but as you can see from the small example this become increasingly difficult as you add more and more numbers. Until recently the best result was that the set {1,2,3,,, 7664} can be partitioned - so if you are going to find a counter example you have to look beyond 7664 - 7825 say.

As the paper announcing the result points out you can check that n=7824 has a partition that satisfies the condition in a few seconds - simply exhibit the partition and check that there are no triples in either part. On the other hand, proving that n=7825 has no such partition involves looking at each possible partition.

The Stampede Super Computer at the Texas Advanced Computing Center - 270 Terabytes (TB) RAM, 14 Petabytes (PB) disk storage and 522080 processing cores.

Time for a computer search. Using a SAT solver it is possible to consider exploring all the possibilities, but it was estimated that it would take 35,000 CPU hours. To tackle this task a super computer, the Stampede cluster at the Texas Advanced Computing Center, which has 800 cores, was used to find a solution in approximately two days.

A SAT solver usually just outputs a yes or no answer to "can this formula be satisfied", but recently work has been done to make them provide formal proofs of the result. The final "proof" was 200 terabytes in size which could be compressed to 68 gigabytes.

Clearly this is not a human readable proof.

So now we know that the Pythagorean Triple problem with k=2 is solvable for n<7825 but impossible for larger n.

Is there a shorter human readable, and perhaps understandable, proof?

Notice that there is a huge asymmetry proof that there is a partition that satisfies the condition and proof that there is not. To prove that there is all you have to do is give the partition and let some one check that there is no Pythagorean triple in either set. To prove that there is no such partition you have to consider all partitions and show that they all fail.

If there was a shorter mathematical proof then there would have to be a logical structure that encapsulated the apparent complexity of the enumeration of the non-satisfying partitions. This is the same as there being a simple formula for pi say but no such formula for the majority of irrational numbers - and it is essentially the Kolmogorov measure of information.

It is not necessarily the case that a proof can be made significantly shorter than the complete enumeration of what it proves.

The suggestion is that this sort of approach can be used to solve other combinatoric problems, so perhaps mathematicians need to get used to the idea that "proof" can mean terabytes of data.

Ever wanted to browse Github code like an IDE? Tired of cloning repos just to examine their code structure? If the answer to either of these questions is yes then you will welcome Insight.io.