Page 1 of 4 Computational grammar is a subject that is sometimes viewed as a form of torture by computer science students, but understanding something about it really does help ....
A Programmers Guide To Theory First Draft
There is a more recent version of this:
Now available as a paperback and ebook from Amazon.
Contents
 What Is Computable?
 Finite State Machines
 What is a Turing Machine?
 Computational Complexity
 Noncomputable numbers
 The Transfinite
 Axiom Of Choice
 Lambda Calculus
 Grammar and Torture
 Reverse Polish Notation  RPN
 Introduction to Boolean Logic
 Confronting The Unprovable  Gödel And All That
 The Programmer's Guide to Fractals
 The Programmer's Guide to Chaos*
 Prime Numbers And Primality Testing
 Cellular Automata  The How and Why
 Information Theory
 Coding Theory
 Kolmogorov Complexity
*To be revised
In The Heart Of A Compiler we saw how assembly languages evolved into high level languages and the problems caused by the apparently simple “arithmetic expression”.
What we haven’t looked at so far is the impact this practical problem had on theoretical computer science.
There is an argument that the arithmetic expression is the whole reason that grammar and parsing methods were invented but once you have the theory of how to do it you might as well use it for the entire structure of a language.
Computational Grammar
To deal with the whole problem of describing and compiling artificial languages we had to invent a new subject – computational grammar.
This is a subject that has over the years been used as a form of torture for thousands of computer science students but it deserves better!
Backus Naur Form  BNF
In the early days of high level languages the only really difficult part was the translation of arithmetic expressions to machine code and much of the theory of grammar was invented to deal with just this problem.
At the core of this theory, and much used in the definition of programming languages, is BNF, or Backus Normal Form. Because Donald Knuth pointed out that it really isn't a normal form in the sense of providing a unique normalized grammar it is often known as BackusNaur Form after John Backus the inventor of Fortran and Peter Naur one of the people involved in creating Algol 60.
Not only isn't it a normal form, there isn't even a single standard notation for BNF and everyone feels free to make up their own variation on the basic theme. However, it is always easy enough to understand.
For example, using "arrow notation" you might write:
<additive expression> > <variable> + <variable>
You can read this as saying that an additive expression is formed by taking a variable, a plus sign and another variable.
This rule only fits expressions like A+B, and not A+B+C, but you get the general idea.
Quantities in angle brackets like
<additive expression>
are called “nonterminal” symbols because they are defined by a BNF rule and don’t appear in the final language.
The plus sign on the other hand is a “terminal” symbol because it isn't further defined by a BNF rule.
You might notice that there is a slight cheat going on in that the nonterminal <variable> was replaced by a terminal in the examples but without a rule allowing this to happen.
Well in proper BNF you have to have a rule that defines every nonterminal in terms of terminal symbols but in the real world this becomes so tedious that we often don’t bother and rely on common sense.
In this case the missing rule is something like:
<variable> > ABCD etc .. Z
where the vertical bar is read as “OR”. Notice that this defines a <variable> as just a single character  you need a slightly more complicated rule for a full multicharacter variable name.
If you want to define a variable that was allowed to have more than a oneletter name you might use:
1 <variable> > <variable> <letter>  <letter> 2 <letter> > ABCD etc .. Z
This is the BNF equivalent of a repeating operation. It says that a variable is either a letter or a variable followed by a letter.
For example, to use this definition you start with:
<variable>
and use rule one to replace it by:
<variable><letter>
then use rule two to replace <letter> by A to give:
<variable>A
Next we use the first rule to replace <variable> to get:
<variable><letter>A
and so on building up the variable name one character at a time.
Boring isn’t it?
But it is an ideal way to tell a computer what the rules of a language are.
Q) Just to check that you are following – why does rule one include the alternative “”?
A) The reason is that this is the version of the rule we use to stop a variable name growing forever because once it is selected the next step it to pick a letter and then we have no nonterminals left in the result.
Notice also that the BNF definition of a variable name cannot restrict its length. Rules like “fewer than 8 characters” have to be imposed as notes that are supplied outside of the BNF grammar.
