Most computer languages have a regular expression facility that allows you to specify a string of characters you are looking for. There is usually an OR symbol, often |, so A|B matches A or B. Then there is a quantifier usually * which means zero or more. So A* means A, AA, AAA and so on or the null string. There are also symbols which match whole sets of characters. For example, \d specifies any digit, \w specifies any character other than white space or punctuation and \s specifies white space or punctuation. There are usually more symbols so you can match complicated patterns but these three enable you to match a lot of patterns easily. For example, a file name ending in .txt is specified as:

\w*.txt

A file name ending in .txt or .bak as:

\w*.txt|\w*.bak

A name starting with A and ending with Z as:

A\w*Z

and so on.

As already mentioned, regular expressions usually allow many more types of specifiers, far too many to list here. The key point is that a regular expression is another way to specifying a string that a finite state machine can recognize, i.e. it is a regular grammar and this means there are limits on what you can do with it. In particular, you generally can’t use it to parse a real programming language and there is no point in trying.

It is also worth knowing that the "*" operator is often called the Kleene star operator after the logician Stephen Kleene. It is used generally in programming and usually means "zero or more". For example z* means zero or more z characters.

Not in this extract but included in chapter

Other Grammars

Turing Machines

Turing Thinking

Summary

The finite state machine is simple and less powerful than other machines with unbounded storage such as the Turing machine.

Finite state machines accept sequences generated by a regular grammar.

Regular expressions found in most programming languages are a practical implementation of a regular grammar.

The pushdown machine is slightly more powerful than a finite state machine and it accepts sequences generated by a context-free grammar.

A full Turing machine can accept sequences generated by a phrase structured grammar.

The different types of machine and grammars form a hierarchy of increasingly complex languages.

Although a Turing machine is more powerful than a finite state machine, in practice all machines are finite state machines.

A Turing machine with a tape limited to n cells is equivalent to a finite state machine with s^{n}x where s is the number of symbols it uses and x is the number of states in its controller.

Even though Turing machines are an abstraction and do not exist, we tend to create algorithms as if they did, making no assumptions about resource limits. If we do think of resource limits they are afterthoughts bolted on to rescue our inherently unbound algorithms.

Algorithms that are unbounded are the pure thought of computer science.

A Programmers Guide To Theory

Now available as a paperback and ebook from Amazon.

Contents

What Is Computer Science? Part I What Is Computable?

Google now offers a free 2-hour course introducing Google AI Studio and the Gemini API on Udacity. In addition, until early August, enrolling on a Google Professional Certificate on Coursera [ ... ]

Are you sick of the math associated with cryptography? You don't have to be any more. Applied Cryptography from the University of Tartu shows cryptography without the math! At last, a hands-o [ ... ]