So what have we got so far?
A grammar is a set of rules which defines the types of things that you can legitimately write in a given language.
As such, it is very useful because it allows the designers of computer languages to define a language unambiguously.
If you look at the definition of almost any modern language – C, C++, C#, Java, Python, Ruby and so on - you will find that it is defined using some form of BNF.
For example, if you look up the grammar of C++ you will discover that an expression is defined as:
expression:
	assignment-expression
	expression , assignment-expression

Which in our BNF notation would be:

<expression> ::= <assignment-expression> | <expression> , <assignment-expression>
As already mentioned, sometimes the BNF has to be supplemented by the occasional note restricting what is legal – for example no BNF grammar can restrict the length of a terminal string to a given number of characters.
This use of BNF in defining a language is so obviously useful that most students of computer science and practicing programmers immediately take it to heart as a good method.
Things really only seem to begin to go wrong when the torture of parsing methods is introduced. Most people simply get lost in the number and complexity of the methods introduced and can't see why they should bother with any of it.
Let’s find out the true story while avoiding the torture.
You can think of the BNF rules that define a language as a way of generating examples of the language or of testing to see if a statement is a valid example of the language.
Generating examples is very easy – rather like playing dominoes with production rules.
You pick a rule, select one of the alternatives separated by "OR", and then try to find a rule whose left-hand side matches any non-terminal on the right. You keep replacing non-terminals in this way until you have a symbol string with no non-terminal items – this is a legal example of the language. It has to be, because you generated it using nothing but the rules that define the language.
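This dominoes game can be sketched in a few lines of Python. The grammar below is a toy one invented for illustration (the rule names and structure are not from any real language definition), and a depth cap is added so a random derivation is guaranteed to terminate:

```python
import random

# A toy grammar as a dict: keys are non-terminals, each value is a list
# of alternatives (the "OR" parts of a rule), each alternative a list of
# symbols.  Anything not a key is a terminal.
GRAMMAR = {
    "<expr>": [["<term>"], ["<expr>", "+", "<term>"]],
    "<term>": [["<factor>"], ["<term>", "*", "<factor>"]],
    "<factor>": [["1"], ["2"], ["3"], ["(", "<expr>", ")"]],
}

def generate(symbol="<expr>", depth=0):
    """Replace non-terminals at random until only terminals remain."""
    if symbol not in GRAMMAR:            # a terminal: emit it as-is
        return symbol
    alternatives = GRAMMAR[symbol]
    # Beyond a modest depth, always take the first (non-recursive)
    # alternative so the derivation is certain to terminate.
    alt = alternatives[0] if depth > 8 else random.choice(alternatives)
    return "".join(generate(s, depth + 1) for s in alt)

print(generate())   # e.g. "1+2*3" -- every string produced is legal
```

Every string this prints is a legal expression of the toy language, for exactly the reason given above: it was built using nothing but the rules.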
Working the other way, i.e. from an example of the language to a set of production rules that produce it, turns out to be more difficult. Finding the productions is called “parsing” the expression and which parsing algorithms work well can depend on the nature of the grammar.
That is, particularly simple grammars can be parsed using particularly simple and efficient methods. At this point the typical course in computer science spends a great deal of time explaining the different parsing algorithms available and a great many students simply cannot see the wood for the trees – syntax trees that is!
While the study of parsing methods is important, you only understand why you are doing it when you discover what parsing can do for you.
Obviously if you can construct a legal parse of an expression then the expression is grammatical and this is worth knowing.
For example, any reasonable grammar of arithmetic expressions would soon throw out /3+*5 as nonsense because there is no set of productions that creates it.
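You can see this in action with a small recursive-descent recognizer. The grammar and names below are illustrative, not taken from any real compiler; the recognizer simply tries to find a derivation and fails for /3+*5:

```python
# Recognizer for the toy grammar:
#   <expr>   ::= <term>   { ("+"|"-") <term> }
#   <term>   ::= <factor> { ("*"|"/") <factor> }
#   <factor> ::= digit | "(" <expr> ")"

def is_valid(text):
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def factor():
        nonlocal pos
        if peek() is not None and peek().isdigit():
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if expr() and peek() == ")":
                pos += 1
                return True
        return False

    def term():
        nonlocal pos
        if not factor():
            return False
        while peek() in ("*", "/"):     # repeated multiplication/division
            pos += 1
            if not factor():
                return False
        return True

    def expr():
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):     # repeated addition/subtraction
            pos += 1
            if not term():
                return False
        return True

    # Valid only if a derivation exists AND it consumes the whole string.
    return expr() and pos == len(text)

print(is_valid("3+4*5"))   # True  -- a derivation exists
print(is_valid("/3+*5"))   # False -- no set of productions creates it
```

Notice that the recognizer is little more than the grammar transcribed into code, one function per non-terminal, which is exactly why simple grammars admit simple parsing methods.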
This is valuable, but there is more. If you have parsed an expression, the way that it is built up shows you what it means.
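For instance, under the usual grammar 2+3*4 parses with 3*4 grouped together as a sub-tree, and evaluating that tree bottom-up gives 14 rather than 20. A minimal sketch, with the parse tree hand-built as nested tuples for brevity:

```python
# A parse tree as nested tuples: (operator, left-subtree, right-subtree).
# Because "*" sits lower in the grammar than "+", the string 2+3*4
# parses as ("+", 2, ("*", 3, 4)) -- not as ("*", ("+", 2, 3), 4).

def evaluate(tree):
    if isinstance(tree, int):          # a leaf is just a number
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b

tree = ("+", 2, ("*", 3, 4))           # the parse of 2+3*4
print(evaluate(tree))                  # 14 -- the structure is the meaning
```

The same symbols grouped the other way, ("*", ("+", 2, 3), 4), evaluate to 20: it is the shape of the parse, i.e. the syntax, that settles which meaning you get.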
This is a function of grammar that is usually ignored, because grammar is supposed to be concerned only with syntax, which in turn is supposed to have nothing to do with meaning or semantics.
In English for example you can have syntax without meaning -
“The green dream sleeps furiously.”
is a perfectly good English sentence from the point of view of grammar but, ignoring poetry for the moment, it doesn’t have any meaning. It is pure syntax without any semantics.
This dissociation of syntax and semantics is often confused with the idea that syntax conveys no meaning at all, and this is wrong. For example, even in our nonsense sentence you can identify that something, "the dream", has the property "green", that it is doing something, i.e. "sleeping", and that it is doing it in a certain way, i.e. "furiously".
This is a lot of information, brought to you courtesy of the syntax, and while the whole thing may not make sense, the syntax is still doing its best to help you get at the meaning.