The Chaos Within Sudoku - A Richter Scale
Written by Mike James
Sunday, 05 August 2012

Sudoku is a fun problem, but how hard is any particular puzzle? Now we have the answer based on measuring the chaos inherent in a grid. This gives a Richter scale for Sudoku.

A pair of computer scientists from the  Babes-Bolyai University (Romania) and the University of Notre Dame (USA) have made some remarkable connections between Sudoku, the classic k-SAT problem, and the even more classic non-linear continuous dynamics.

But before we go into the detail let's look at what this means for Sudoku enthusiasts. Maria Ercsey-Ravasz and Zoltan Toroczkai have devised a scale that provides an accurate determination of a Sudoku puzzle's hardness. So when you encounter a puzzle labelled hard and you find it easy all you need to do is to compute its η, a co-efficient that measures the hardness of the problem.

An easy puzzle should fall in the range 0 < η< 1, medium ones in 1 < η< 2, hard ones in 2 <  η<  3 and for ultra-hard puzzles,  η> 3 with the hardest puzzle, the notorious Platinum Blond being top of the scale with η = 3.6. We will have to wait to see if newspapers and websites start to use this measure of difficulty.

To return to the theory behind this breakthrough, as k-SAT is an NP-Complete problem, any NP problem can be converted into a specific k-SAT problem -  this is what NP complete means.

Just in case you don't know, the k-SAT problem it is very easy to describe. You have a logical expression in n Boolean variables and each clause involves only k of them at a time. All you have to do is find a set of assignments to the variables that makes the expression true. For example, a 2-SAT problem on 3 variables x,y,z is something like:

(x OR y) AND (z OR x)

and you have to find values of x,y and z to make the expression true. This k-SAT is easy but if k>2 and as N gets bigger the problem gets much harder.

The k-SAT (k>2) problem is NP. That is, to find a solution takes longer than polynomial time, but to check a solution takes polynomial time. In other words solutions are hard to find but easy to check.

So the conversion of Sudoku to k-SAT isn't a great surprise, but it is a difficult task. For example, an easy puzzle with 34 clues has 126 variables and 717 clauses in the Boolean expression. The conversion from a given Sudoku to its k-SAT form is automatic but complex.

The next step is perhaps the most remarkable.

You can set up an "energy" function that indicates how far away from a solution a given configuration is. The energy function is zero when the configuration is a solution. The subtle part is that the energy function is a continuous function of solution variables that range between -1 and 1. This converts the problem from a discrete search to a continuous one.

In classical dynamics, the motion of a body is governed by an energy function and the system evolves to find the lowest energy - basically things roll downhill. Any k-SAT problem can be solved using the same "gradient descent" approach on its energy function. Put simply, you start the system off from a random configuration and move it in the direction of decreasing energy. When the system has reached an energy of zero you have the solution.

This approach makes a bridge between discrete search algorithms and classical continuous dynamics. You can see how difficult the search problem is by how fast the system moves down the energy surface to a minimum. Simple Sudoku problems slide all the way down to a solution without much trouble. More complex problems mean that the energy surface is also more complex and the route taken to minimize the energy is convoluted and takes longer.

You can see that this is the case by comparing how long it takes the solution variables to settle down in two cases - an easy and a hard Sudoku. Click to enlarge

The behavior of the the dynamics of the system can be used to define the difficulty of the problem. The longer the system takes to settle down and the more chaotic the behavior is on the way to a solution, then the harder the problem. This makes it possible to define a Sudoku Richter-like scale, η, measuring the hardness of the problem, with easy puzzles falling in the range 0 < η< 1, medium ones in 1 < η< 2, hard ones in 2 <  η<  3 and for ultra-hard puzzles  η> 3.

The ratings seem to fit with the example puzzles analyzed from the Sudoku of the Day website with easy at 0.8, medium 1.4, hard 1.78 and "absurd" at 1.8.

To find puzzles at the top end of the scale you have to look for examples that have been proclaimed "the hardest". For example, the Guardian 2010 hardest puzzle only scored 2,8 on the Sudoku Richter scale. Other "hardest" puzzles also turn out to be not so hard - USA Today's 2006 hardest puzzle scored only 2.17.

So what are the hardest of all puzzles?

The five most hard puzzles are so notorious they have names  - Platinum Blond, Golden Nugget, Red Dwarf, coly013 and tarx0134. They have hardness coefficients in the range 3 to 3.6 with Platinum Blond being the hardest at 3.6.

Where do we go next?

Being able to measure the complexity of a Sudoku puzzle may seem like a small thing but the connection between discrete search and continuous non-linear dynamics is intriguing - it deserves more attention. More cartoon fun at xkcd a webcomic of romance,sarcasm, math, and language

The Chaos Within Sudoku (pdf)

#### Related Articles

No 16-Clue Sudoku!

NP-Complete - Why So Hard?

Computational Complexity 