Page 2 of 3
Relation -> Relational Database
Of course just calling a database a table or relation isn’t enough to change the way programmers work.
What Codd did next is to work out a theory of how traditional database operations could be implemented using an algebra of sets rather than programs. The first thing is that becomes apparent is that, as a relation is a set, you can’t have duplicates. If you have a set A that contains an item then adding the item again doesn't make any difference - if it is already in the set then it is all ready in the set, there are no duplicates, This is already different from the way a programmer thinks of a database where you can quite easily have the same information - a user name say stored multiple times..
However the changes had to go both ways. The idea of a pure relation had to be modified to make it more suitable to describe a database. For example, to make relations more like real databases you need to add the idea of key fields, i.e. a column in the table that identifies each record uniquely - that is a given key value belongs to one and only one record.
According to Codd a relational database consists of more than one relation or table and our familiar notion of “relating” tables comes from the idea that one table has a key field that is a field in another table - this is known as a “foreign key”.
For example, Name might be a field in an invoice database and a key field in an address database. Records in each table are loosely speaking “connected” by the foreign key.
As well as modifying definitions it was also necessary to work out what operations are allowed on database tables. Although relations don't have special operations they are just sets and so database operations tend to follow the way set operations work.
As an example of a database operation, consider the union of two tables. A union of two tables is just the table that results if you write out the rows of the second table after the rows of the first. Notice that you can form the union between two tables with different fields. Similarly the intersection between two tables is just the table that consists of the rows that they have in common.
It is clear that these definitions are modelled on the same operation for sets but they are not quite the same.
You can see that you could carry on like this and define operations that combine one table with another to give you a third.
It is this way of thinking that changes the “For first record To last Do xxx” way of processing a database. Now we are thinking the non-procedural A+B=C and this is what the relational model and algebra is all about.
This description would be incomplete without mentioning the operation that really makes real world database manipulations possible - the join.
If you have two database tables A and B to form the join A*B you first create the Cartesian product A X B. The Cartesian product of database tables is just the table which has the columns of A and B and lines that are created by taking every possible combination of a line from A and a line from B. For example, if A is:
and B is:
then the Cartesian product of the two tables A X B is:
x x’ y z
tom tom 29 france
tom codd 29 spain
fred tom 30 france
fred codd 30 spain
codd tom 32 france
codd codd 32 spain
You should be able to see how this works but notice that the field x from table B has been renamed x’ to make it distinct.
The second step in forming the join is to remove all of the lines that do not have identical values in the common fields, i.e. x and x’. So A*B is
x x’ y z
tom tom 29 france
codd codd 32 spain
If you also take out the duplicate x’ column then you get what has come to be called an “equi-join” if you don't take out the duplicates you get a “natural-join” or just a “join”.
By using joins it is possible to break a database down into a number of smaller tables which can be put back together. Exactly how you break a database down is a question of which “normal form” you opt to use and here we start getting into the depths of database design. But put simply normal forms are mostly about removing the redundancies in a database to try to push the representation closer to that of a pure set and a set based algebra.
Needless to say Codd’s approach was seen as very attractive - although not at first by his employer IBM. In 1982 IBM finally caught on and announced SEQUEL and their new database, DB2, both based on Codd’s relational theories. Codd had his own database language called alpha but the IBM team developed their own which wasn't really a relational language but SEQUEL became popular and eventually turned into SQL when the Oracle database was released. SQL contains lots of features that go beyond what Codd considered to be the pure theory of relational databases - but as it is so popular it has become what we all think of as the relational database language. So much so that Microsoft even named its database engine - SQL Server.
The whole subject became so heated and confused that in 1985 Codd published his (in)famous 12 rules which were the principles that a relational database should obey. Interestingly Codd’s rules have become a stick that the “database thought police” use to beat the innocent programmers rather than a guiding light - for this reason you will find them reproduced on the next page.
His book, The Relational Model for Data Base Management" covers the practical aspects of the design of relational databases and defines the twelve rules and the systems that need to be followed in order to be described as truly relational with the motivation behind these rules in over 500 pages.
Codd attempted to remove the “procedural” approach from database and many think that this isn’t possible using a theory based on relations. Even more radical, some go so far as to think that it isn’t desirable and the mathematician’s phobia of procedure, i.e. dynamic processes, shouldn’t be foisted onto the programmer. But notice that this isn't the motivation of the many No-SQL databases that are appearing to be gaining support. This is more about the practical difficulties of building databases that are distributed across servers and which are available to many users at the same time. These are not issues that Codd, Codd's rules or SQL ever considered.
In 1981 Codd was awarded the Turing Award and in 1982 the ACM chose his 1970 paper as one of the 25 most important contributions to the industry. Whatever the true and long term value of the relational model, Codd never gave up the 12-rule approach and defined 12 rules for On-line Analytical Processing (OLAP)! He retired from IBM in 1984 and set up two companies to provide consultancy to the database world.