Joe Celko has posed another puzzle that requires you to think like a programmer. This one is all about Post tag machines, which have absolutely nothing to do with mail of any type but a lot to do with strings and computability.
“I am about to go Postal,”
said Bugsy Cottman with a smile to Melvin Frammis, head developer at International Storm Door and Software.
“If you need some vacation time, just ask it for it.” replied Melvin.
“I have to help my kid with a Math Fair project. We were going to do a Turing Machine simulation, but then I found out about Post Productions.”
“I guess it would be hard to find an infinitely long magnetic tape.” said Melvin. “Okay, who is this Emil Post?”
“Shame on you! Emil Post is a name that computer science majors should at least have seen while reading about the mathematical foundations of computer science."
Emil Post was a logician who did work with automate, and you usually see mention of him when you read about Turing machines. That is how I found him.
Emil Leon Post 1897-1954
Post production systems are simple things, which are made up of a alphabet of letters from which we make words. The set of rules which change one word to another work by looking at the front the word, removing a certain number of letters, and concatenating more letters to the end of the word.
"This is an ideal job for a computer,” said Bugsy as he picked up a scratch pad.
Here is one simple alphabet and rules:
valid words in the language are made up of only the letters 'a' and 'b'.
if a word begins with 'a', then concatenate 'b' to the end of the result, and delete the first two letters.
if a word begins with 'b', then concatenate 'aba' to the end of the word, and delete the first two letters.
if the word is two or fewer letters in length, then stop doing the productions; otherwise continue perpetually.
Notice that the final rule means that sometimes the process halts and sometimes it doesn't. Whether the program halts or not depends on the starting word.
“Okay, I can already see that words can be broadly categorized as those which halt and those which loop forever"' ,said Melvin.
What Melvin had noticed is that the words which halt belong to one of seven sub-categories: 'a', 'b', 'ab', 'ba', 'aa', 'bb', and the empty word. Where the names of the categories are the final words before the algorithm halts.
The empty word is often written as ε (a lower-case epsilon).
To get you in the swing try the following simple exercise:
Find a word for each of the seven sub-categories i.e a word that halts ending in one of the seven possible ending strings.
Hint: think about what would have to happen just before a word resolved itself to one letter or to the empty word.
The words which loop forever do not fall so neatly into sub-categories. One way obvious way to classify them is by the number of cycles in them.
For example, the word 'baba' gives the following productions:
baaba -> baaba + aba abaaba -> abaaba + b aabab -> aabab + b babb -> babb + aba bbaba -> bbaba + aba abaaba == the second entry = cycle of four!
Melvin looked at the worksheet, did some scratch paper work then announced that the detection of cycles is the hard part.
The obvious way to do this is to build a stack of previous words. The stack can be inspected for a duplicate of the current word, and thus detect a cycle.
With a little thought, you can improve the stack searching by starting at the top and going down, since the cycle will always start at the top of the stack, but might not always go all the way to the bottom.
The advantage of the stack is that it also lets you display the length of the cycle.
Bugsy looked at Mel's diagram, then volunteered,
“I can improve your program's speed by replacing the simple stack mechanism with another data structure which will find a cycle faster. You just have to find the cycle; you do not have to tell me the length of it.”
Melvin got into the spirit, “What if you store your cycle words in a file? This would allow you to load them into the program before you run and make detection of cycles much faster. Is that reasonable?”
“I thought of that, but I was assuming that cycles would be short. I wrote a program to generate a few random words, test them and finally capture the cycle words and their cycle length.”
Melvin smiled and said, “I will tell now that you can always find a cycle greater than any integer (n)”
“Well, darn, if cycles can be of any length whatsoever, then we can never store all of them in a file.” said Bugsy, looking a little sad.
“Do not despair. If you look at the cycle words in your file, you will notice that many of them are built from repetitions of shorter cycle words".
To explain Melvin's comment with an example, the eight-cycle word 'abaabaabaaba' is made up of two occurrences of the four-cycle word 'abaaba' which we derived earlier. We could continue to add as many occurrences as we wish and so produce a cycle of arbitrarily large length.
As mentioned at the beginning, this was a for a Math Fair, so we want questions which have to be answered by proofs, but your program can provide clues and counter-examples. You can use superscripts to save writing and counting, so “aabbb” would be “a²b³” which would give us the power to express more abstract rules.
How about these two questions to start:
Find a finite set of minimal words which generate cycles
If 'x' and 'y' are cycle words, then is 'xy' also a cycle word?
Does a compound cycle word have to have an even number of repetitions?
For the mathematically inclined, these are also called tag productions and there is a good intro at
However it is worth noting that this is an example of a 2-tag system which can be shown to be Turing Complete, i.e. it can compute anything that a Turing machine can and this is generally taken to be anything that can be computed.
If you would like to send in a solutions to this puzzle email firstname.lastname@example.org. Or use the comments below to discuss how to approach it.
Joe Celko is best known as the database expert who writes books on SQL, data and databases. But before that, he was an honest developer obsessed with finding good algorithms and clean code.