A Mathematical Proof Too Long To Check - The Erdos Discrepancy Conjecture
Written by Mike James
Tuesday, 18 February 2014

We have surely got over the shock of computers being involved in mathematical proofs? It seems not, but in this case the proof occupies a 13GByte file - bigger than the whole of Wikipedia, so perhaps we have crossed a line.

The theorem that has been proved is in connection with a long running conjecture of Paul Erdos. Paul Erdos 1913-1996

Discrepancy theory is about how possible it is to distribute something evenly. It occurs in lots of different forms and even has a connection with cryptography.

Its simplest form is the one that Erdos presented back in 1930. Given a sequence of numbers composed of -1 or 1, then consider any finite sequence of length n spaced d elements apart and form the sum of the elements.

Erdos' discrepancy conjecture says that you can make the sum as large as you like.

For example, given the sequence with  the fairly obvious pattern:

1, -1,  1,1,  -1,-1,  1,1,1  -1,-1,-1,   1,1,1,1,  -1,-1,-1,-1,   and so on

then starting at the third element, a 1, and with a spacing d=2 and length 6, the discrepancy is 1 -1 +1+1-1+1+1 i.e. 3.

Mathematically the conjecture is that given a sequence xn then for any integer C>0 there exists a subsequence xd, x2d, x3d,... xkd for some choice of positive integers k and d Obviously the discrepancy of a subsequence measures the degree to which the elements 1 and -1 are unevenly distributed and the conjecture can be interpreted as saying no matter how hard you try to create a sequence that is evenly distributed you are doomed to failure as there will always be subsequences that have uneven distributions of 1 and -1 so as to give you any discrepancy you care to name.

You can also give the original infinite series a discrepancy score as the maximum discrepancy of a subsequence it contains. In this case the conjecture says that every infinite sequence has an infinite discrepancy.

This simple conjecture has turned out to be very difficult to prove.

In 1993 it was proved that an infinite series cannot have a discrepancy of 1. This was done by showing that a series of length 12 always has a subsequence with discrepancy greater than 1. As this is true of any series of length 12 it has to be true of any infinite series which contain any number of subsequences of length 12.

Thus the theorem is proven for C=1.

You can see that an exhaustive enumeration of all possible length 12 sequences, i.e 212  (=4096), and computing the discrepancy of each one by enumerating every subsequence is a big job, but possible - in fact the job was completed manually!

The problem gained a new burst of life as part of the Polymath project, a collective math Wiki experimenting with the idea of doing "crowd sourced" math. A program was written that looked for sequences with discrepancy 2 and found one of length 1124

The recent progress, which pushes C up to 2 was made possible by a clever idea of using a SAT solver. A SAT solver is a program that when given a logical expression will attempt to find a set of true or false assignments to the expression's variables that make it true, i.e. it finds a set of values that SATisfy the equation.

The SAT problem was one of the first to be proved NP-Complete  but this is another story. The reason that using a SAT prover is a good idea is that over time they have been highly optimized and capable of solving very large expressions.

The technique used is to convert the sequence into a logical expression that is satisfiable is there is a sequence of length L of discrepancy C. So all that had to be done was to create a formula for each L and C=2 and let the SAT solver try to find an assignment that made it true.  Things went well up to length 1160, which was proved to have discrepancy 2, but at length 1161 the SAT returned the result that there was no assignment. So given we already know that the discrepancy has to be greater than 1 and it isn't 2 it has to be at least 3.

Hence we now have that the conjecture is true for C=2.

The next step was to try to find a sequence for which the discrepancy isn't 3 and so push the bound to 3. Unfortunately the search reached length 13,000 with discrepancy 3 and this is where the SAT prover ran out of steam - well time to be exact.

The problem is that the unsatisfiability certificate, the proof that a sequence of length 1161 has no subsequence with discrepancy 2, is 13GBytes and by comparison Wikipedia is currently just short of 10GBytes compressed.  This is clearly too long for a human to check or understand.

As the authors of the paper write:

"[it]...is probably one of longest proofs of a non-trivial mathematical result ever produced. Its gigantic size is comparable, for example, with the size of the whole Wikipedia, so one may have doubts about to which degree this can be accepted as a proof of a mathematical statement."

Does this matter?

Probably not - as long as other programs can check the result and the program itself has to be considered part of the proof.

Where next?

The computation threw up some hints as to how it might be possible to tackle the program on a more human scale. Meanwhile the the program is still running on the case C=3. A SAT Attack on the Erdos Discrepancy Conjecture by Boris Konev & Alexei Lisitsa

#### Related Articles

Six Degrees Of Separation Is New

Finding Solutions To Diophantine Equations By Smell

48th Mersenne Prime Computed

What's a Sample of Size One Worth? 