Codd and His Twelve Database Rules
Written by Mike James   
Thursday, 09 March 2023
Article Index
Codd and His Twelve Database Rules
From Relation To Relational Database
The Twelve Rules of Codd

Theories of how we should organize databases are thin on the ground. The one exception is the work of E.F. Codd, the originator of the commandment-like “Codd’s Rules”. This approach to database has been codified into SQL - Structured Query Language -  and so into most of the databases on the planet, despite what the NoSQL movement might want you to think. So what are Codd's Rules and what is a relational database?

Database is the strangest of computer applications.

  • It is vital - every computer manages some sort of database,
  • it is difficult - real world database quickly exceeds any reasonable complexity

and

  • it is lucrative - more programmers are occupied with database work than any other.

Strangely, though, in the early days theories of how we should organize databases were very thin on the ground. For such an important subject it was been relatively neglected. It is almost as if the subject was so obvious that it didn't need a theory as such so most people simply got on with it and ignored the need for organizing principles.

The one exception is perhaps the work of E.F. Codd.  Immortalized to a generation or two of database creators, he is revered as the originator of the commandment-like “Codd’s Rules”. 

EFCodd

Edgar Frank Codd (1923-2003) 

Codd was a mathematician and this is an important fact that explains much about his work. The theory of relational databases is one that has its roots in a very abstract mathematical idea.

He was born in the UK and studied math at Oxford before moving to the USA in 1949 to do some work at the University of Tennessee. Before very long he was seduced by the new subject of computing and was working for IBM as a mathematician and programmer.

The first machine that he worked on was the SSEC - one of the early valve machines. He moved on to help with the logic design of the IBM 701, one of the first modern computers, and then the 702. At this point (1953) he took a four-year leave and went to study for his PhD in Canada, at the University of Michigan.

Codd’s work at Michigan was on a relatively new area, Cellular Automata. A cellular automaton is a collection of relatively simple computing units, all of which run the same program and interact with their neighbors on a rectangular grid. If you know Conway's game of Life then you know at least one cellular automata.

This was highly theoretical stuff at the time but it was connected with areas of AI and Artificial Life that are now very important. His thesis was on self reproduction of cellular automata.

Later the work was published as a book, “Cellular Automata”, in 1968. Many theoretical computer scientists know this book but often never suspect that the Codd who wrote it is the same one who did the eminently practical database work!

Database Relationships

However Codd’s main work began in 1969 when he became interested in database software. Being a mathematician he saw database in a very different way to the average programmer. A programmer tends to see the physical way that data is coded and stored. He sees records composed of fields of different types and the whole lot is stored as a file with some type of structure.

The operations that are performed on a database are seen as a general programming problem. To work with a database you write a program that reads the file and works with it in any way that the user/customer considers necessary. If you need to access two records from different parts of the file at that same time it is up to the programmer to get inside and dig around the file in anyway they like.

This is a recipe for complexity to get out of hand! You can well imagine that the database file could be come altered in ways that are incorrect - even to the extent of it not being a valid database file any longer!

What Codd did was to see that the simplest example of a database was nothing more than a table of values. The rows of the table are records and the columns fields and the table itself can be implemented as a file. More recently we have data stored in spreadsheets as a simple table.

This change in viewpoint is, however, more than just a relabeling of the parts. A table, or set of multiple values in mathematical terms, is known as a “relationship”.

For example, suppose I ask you to tabulate the relationship x<y for the integers 1 to 5. What you end up with is a table of pairs - or in general “tuples”:

  x  y
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
and so on…

Any relationship, not just the nice neat regular ones like =, <>, < etc., can be represented by a set of tuples and this is exactly the way that Codd saw a database table and hence the name of the new approach “relational database”. 

Relational Not Related

It is amusing to note that over the years lots of proponents of Codd’s methods have misunderstood the use of the term “relational”, thinking that it comes from the way separate tables are “related” in some way.

If you know some math then the definition

“a relation is a subset of the Cartesian product of a number of sets”

will say it all.

If  you are not a mathematician then this definition is "dense" to say the least. However like many mathematical definitions it is easy to understand if you take it a step at a time.

The Cartesian product, AXB of two sets A and B say is just the set of all possible pairs like (a,b) where a is taken from set A and b is taken from set b. A relation is a simply a subset of this complete set of pairs or tuples.

So for example, if A={1,2,3} and B={1,2,3} then AXB is the set

{(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)}

Any subset of this set is a relation. For example:

{(1,1),(2,2),(3,3)}

is the relation a=b. We may not give names like less than or greater than to all of the subsets but they are no less relations to the mathematician. 

 

<ASIN:0201141922>

<ASIN:1491941170>

<ASIN:1425122906>

<ASIN:1484255399>



Last Updated ( Saturday, 11 March 2023 )