Programmer's Guide To Theory - Finite State Machines

Written by Mike James

Monday, 23 August 2021

Article Index
Programmer's Guide To Theory - Finite State Machines
Finite Grammars
Regular Expressions

Page 2 of 3

Finite Grammars

The practical uses of finite state machines is reason enough to be interested in them. Every programmer should know about finite state machines and shouldn't be afraid of implementing them as solutions to problems. However, the second good reason is perhaps more important - but it does depend on your outlook. Finite state machines are important because they allow us to explore the theory of computation. They help us discover what resources are needed to compute particular types of problem. In particular, finite state machines are deeply connected with the idea of grammars and languages that follow rules.

If you define two of the machine’s states as special – a starting and a finishing state – then you can ask what sequence of symbols will move it from the starting to the finishing state. Any sequence that does this is said to be ‘accepted’ by the machine. Equally you can think of the finite state machine as generating the sequence by outputting the symbols as it moves from state to state. That is, a list of state changes obeyed in order, from the start to the finish state, generates a particular string of symbols. Any string that can be generated in this way will also be accepted by the machine.

The point is that the simplicity, or complexity, of a sequence of symbols is somehow connected to the simplicity or complexity of the finite state machine that accepts it.

So we now have a way to study sequences of symbols and ask meaningful questions. As a simple example, consider the finite state machine given below with state 1 as start and state 3 as finish – what sequences does it accept? Assuming that A and B are the only two symbols available, it is clear from the diagram that any sequence like BABAA is accepted by it.

A finite machine accepts a set of sequences

In general, the machine will accept all sequences that can be described by the computational grammar, see Chapter 5.

1)  <S1>   →  B<S3>|A#
2)  <S3>   →  A<S1>|B<S3>

A computational grammar is a set of rules that specify how you can change the symbols that you are working with. The | symbol means select one of the versions of the rule. You can use <null> to specify the starting state and # to specify the final state; in this case we have used <S1> as the starting state. You can have many hours of happy fun trying to prove that this grammar parses the same sequences as the finite state machine accepts.

To see that it is it does, just try generating a sequence:

Start with <S1> and apply rule 1 to get B<S3>. You could have selected A# instead and that would be the end of the sequence.
Use rule 2 to get BA<S1>. You have replaced <S3> by A<S1>
Use rule 1 to get BAB<S3>
Use rule 2 to get BABB<S3>
Use rule 2 to get BABBA<S1>

You can carry on using rule 1 and 2 alternately until you get bored and decide to use the A# alternative of rule 1 giving something like BABBAA#.

Grammar and Machines

Now you can start to see why we have moved on from machines to consider grammar. The structure of the machine corresponds to sequences that obey a given grammar. The question is which grammars correspond to finite state machines? More to the point, can you build a finite state machine that will accept any family of sequences? In other words, is a finite state machine sufficiently powerful to generate or accept sequences generated by any grammar?

The answer is fairly easy to discover by experimenting with a few sequences. It doesn’t take long to realize that a finite state machine cannot recognize a palindromic sequence. That is, if you want to recognize sequences like AAABAAA, where the number of As on either side of the B has to be the same, then you can’t use a finite state machine to do the job.

If you want to think about this example for a while you can learn a lot. For example, it indicates that a finite state machine cannot count without limit. You may also be puzzled by the earlier statement that you can build a finite state machine that can "remember" any past history. Doesn't this statement contradict the fact it cannot recognize a palindromic sequence?

Not really. You can build a finite state machine that accepts AAABAAA and palindromes up to this size, but it won't recognize AAAABAAAA as a similar sequence because it is bigger than the size you designed the machine to work with. Any finite state machine that you build will have a limit on the number of repeats it can recognize and so you can always find a palindromic sequence that it can't recognize.

The point here isn't that you can't build a finite state machine that can recognize palindromes, but that you can't do it so that it recognizes all palindromes of any size. If you think back to Chapter 3, you might notice that this is another argument about bounded and unbounded sequences. If you want to recognize palindromes of any size you need a Turing machine, or equivalent, and an unbounded storage facility.

A finite state machine can count, but only up to fixed maximum.

For some this doesn't quite capture the human idea of a palindrome. In our heads we have an algorithm that doesn't put any limits on the size - it is a Turing algorithm rather than a finite state algorithm. The definition of a palindrome doesn't include a size limit but for a practical machine that accepts palindromes there has to be a limit but we tend to ignore this in our thinking. More on this idea later.

There are lots of other examples of sequences that cannot be recognized by a finite state machine but can be parsed or specified by a suitable grammar. What this means is that finite state machines correspond to a restricted type of grammar. The type of sequence that can be recognized by a finite state machine is called a ‘regular sequence’ - and, yes, this is connected to the regular expressions available in so many programming languages.

A grammar that can parse or generate a regular sequence is called a ‘regular’ or ‘type 3 grammar’ and its rules have the form:

<non-terminal1> →  symbol <non-terminal2 >

<non-terminal1> → symbol

<< Prev - Next >>

Last Updated ( Monday, 23 August 2021 )