Codd and his Rules
Written by Mike James
Thursday, 05 October 2017
This description would be incomplete without mentioning the operation that really makes real-world database manipulations possible - the join.
If you have two database tables A and B, then to form the join A*B you first create the Cartesian product A X B. The Cartesian product of two database tables is just the table that has all the columns of A and of B, and whose lines are created by taking every possible combination of a line from A and a line from B. For example, if A is:
and B is:
then the Cartesian product of the two tables A X B is:
You should be able to see how this works but notice that the field x from table B has been renamed x’ to make it distinct.
The second step in forming the join is to remove all of the lines that do not have identical values in the common fields, i.e. x and x’. So A*B is:
If you leave the duplicate x’ column in place you get what has come to be called an “equi-join”; if you also take it out you get a “natural join”, often called just a “join”.
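The two steps can be sketched in a few lines of Python. The tables A and B below are hypothetical stand-ins, not the ones from the original example, and each line of a table is modelled as a dictionary. The sketch uses the standard terminology in which the equi-join keeps both x and x’ and the natural join drops the duplicate.

```python
from itertools import product

# Hypothetical example tables; each line is a dict of column name -> value.
A = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
B = [{"x": 1, "z": "p"}, {"x": 3, "z": "q"}]

def cartesian_product(a, b, common="x"):
    """Pair every line of a with every line of b, renaming b's common column to x'."""
    renamed = common + "'"
    rows = []
    for ra, rb in product(a, b):
        row = dict(ra)
        for col, val in rb.items():
            row[renamed if col == common else col] = val
        rows.append(row)
    return rows

def equi_join(a, b, common="x"):
    """Keep only the product lines whose x and x' values are identical."""
    renamed = common + "'"
    return [r for r in cartesian_product(a, b, common) if r[common] == r[renamed]]

def natural_join(a, b, common="x"):
    """Equi-join with the duplicated x' column removed."""
    renamed = common + "'"
    return [{c: v for c, v in r.items() if c != renamed}
            for r in equi_join(a, b, common)]

print(len(cartesian_product(A, B)))  # 2 lines x 2 lines = 4 combinations
print(equi_join(A, B))
print(natural_join(A, B))
```

Real database engines do not build the full product first; this is only the definition turned directly into code.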
By using joins it is possible to break a database down into a number of smaller tables which can be put back together. Exactly how you break a database down is a question of which “normal form” you opt to use, and here we start getting into the depths of database design. Put simply, normal forms are mostly about removing the redundancies in a database to push the representation closer to that of a pure set and a set-based algebra.
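As a rough illustration of breaking a table down and putting it back together, the sketch below splits a hypothetical orders table, in which each customer's city is repeated on every order, into two smaller tables and then rebuilds the original with a join. The table contents and the helper names `project` and `natural_join` are assumptions made for the example.

```python
# Hypothetical table with redundancy: the city is repeated for every order.
orders_flat = [
    {"order_id": 1, "customer": "Ann", "city": "Leeds"},
    {"order_id": 2, "customer": "Ann", "city": "Leeds"},
    {"order_id": 3, "customer": "Bob", "city": "York"},
]

def project(rows, columns):
    """Keep only the named columns and drop duplicate lines (set semantics)."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out

def natural_join(a, b, common):
    """Combine lines of a and b that agree on the common column."""
    return [{**ra, **rb} for ra in a for rb in b if ra[common] == rb[common]]

# Decompose: each customer's city is now stored exactly once.
orders = project(orders_flat, ["order_id", "customer"])
customers = project(orders_flat, ["customer", "city"])

# Recompose: the join gives back the original table.
rebuilt = natural_join(orders, customers, "customer")
print(rebuilt == orders_flat)
```

The round trip only works here because each customer has a single city; that kind of dependency is what the normal forms are concerned with.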
Needless to say, Codd’s approach was seen as very attractive - although not at first by his employer, IBM. In 1982 IBM finally caught on and announced SEQUEL and its new database, DB2, both based on Codd’s relational theories. Codd had his own database language, called Alpha, but the IBM team developed its own, which wasn't really a relational language. Even so, SEQUEL became popular and eventually turned into SQL when the Oracle database was released. SQL contains lots of features that go beyond what Codd considered to be the pure theory of relational databases, but as it is so popular it has become what we all think of as the relational database language - so much so that Microsoft even named its database engine SQL Server.
The whole subject became so heated and confused that in 1985 Codd published his (in)famous 12 rules, the principles that a relational database should obey. Interestingly, Codd’s rules have become a stick that the “database thought police” use to beat innocent programmers rather than a guiding light - for this reason you will find them reproduced below.
His book, “The Relational Model for Database Management”, covers the practical aspects of the design of relational databases. In over 500 pages it defines the twelve rules that a system must follow in order to be described as truly relational and explains the motivation behind them.
Codd attempted to remove the “procedural” approach from databases, and many think that this isn’t possible using a theory based on relations. Even more radically, some think that it isn’t desirable and that the mathematician’s phobia of procedure, i.e. dynamic processes, shouldn’t be foisted onto the programmer. But notice that this isn't the motivation behind the many NoSQL databases that appear to be gaining support. That is more about the practical difficulties of building databases that are distributed across servers and available to many users at the same time. These are not issues that Codd, Codd's rules or SQL ever considered.
In 1981 Codd was awarded the Turing Award and in 1982 the ACM chose his 1970 paper as one of the 25 most important contributions to the industry. Whatever the true and long-term value of the relational model, Codd never gave up the 12-rule approach and went on to define 12 rules for On-line Analytical Processing (OLAP)! He retired from IBM in 1984 and set up two companies to provide consultancy to the database world.
The 12 rules of Codd
Of the 12 rules only the first 6 have to be satisfied for a database to be called “relational” but there is a rule 0 which has to be obeyed - perhaps they should be called Codd’s 13 rules?
Rule 0: Relational Database Management
For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities.
Rule 1: Representation of information
All information in a relational database is represented explicitly at the logical level and in exactly one way - by values in tables.
Rule 2: Guaranteed logical accessibility
Each and every datum in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
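A minimal sketch of what this means in practice, using Python's built-in `sqlite3` module. The `employee` table and the helper `fetch_datum` are hypothetical; the point is only that table name, primary key value and column name together are enough to reach any single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Bob", "IT")])

def fetch_datum(conn, table, key_column, key_value, column):
    """Return one value addressed by table name, primary key value and column name."""
    # Identifiers cannot be bound as parameters, so they are interpolated here;
    # only do this with trusted names.
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    row = conn.execute(sql, (key_value,)).fetchone()
    return None if row is None else row[0]

dept = fetch_datum(conn, "employee", "id", 2, "dept")
print(dept)
```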
Rule 3: Systematic representation of missing information
Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational database management systems for representing missing information and inapplicable information in a systematic way, independent of the data type.
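The distinction can be seen in any SQL engine. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `reading` table: one line holds an empty string and a zero, the other holds nulls in both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, note TEXT, value INTEGER)")
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)",
                 [(1, "", 0),        # empty string and zero: known values
                  (2, None, None)])  # Python None is stored as SQL NULL

empty_notes = conn.execute("SELECT id FROM reading WHERE note = ''").fetchall()
zero_values = conn.execute("SELECT id FROM reading WHERE value = 0").fetchall()
missing = conn.execute(
    "SELECT id FROM reading WHERE note IS NULL AND value IS NULL").fetchall()

# Comparing NULL with NULL is itself unknown, so it comes back as NULL (None).
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(empty_notes, zero_values, missing, null_eq)
```

The same `IS NULL` test works for the text column and the integer column, which is the "independent of the data type" part of the rule.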
Rule 4: Dynamic online catalog
The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
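SQLite is one example of a system that exposes its catalog as a table: the schema lives in `sqlite_master` and can be queried with an ordinary SELECT. A small sketch, with hypothetical `employee` and `dept` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, title TEXT)")

# The catalog is interrogated with the same language as the regular data.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(tables)
```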
Rule 5: Comprehensive data sub-language
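A relational system may support several languages and various modes of terminal use (for example, the “fill in the blanks” mode). There must be, however, at least one language whose statements are expressible, per some well defined syntax, as character strings, and that is comprehensive in supporting all of the following items: data definition, view definition, data manipulation (interactive and by program), integrity constraints, authorization, and transaction boundaries (begin, commit and rollback).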
Rule 6: Updatable views
All views that are theoretically updatable are also updatable by the system.
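Few products satisfy this rule in full. As one illustration, SQLite documents its views as read-only, so even a simple single-table view that is clearly updatable in theory is rejected. The sketch below, using Python's `sqlite3` module and a hypothetical `sales_staff` view, shows the attempt failing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 'Sales')")
conn.execute(
    "CREATE VIEW sales_staff AS SELECT id, name FROM employee WHERE dept = 'Sales'")

try:
    # In theory this maps cleanly onto one line of the base table.
    conn.execute("UPDATE sales_staff SET name = 'Anne' WHERE id = 1")
    view_updated = True
except sqlite3.OperationalError as exc:
    view_updated = False
    print("update rejected:", exc)

name = conn.execute("SELECT name FROM employee WHERE id = 1").fetchone()[0]
print(view_updated, name)
```

Other engines accept updates on some views, so the outcome depends on the product being tested.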
Rule 7: High level insert, update and delete
The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update and the deletion of data.
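In SQL terms this is the difference between looping over records in the application and issuing one statement that acts on a whole set of lines. A minimal sketch with Python's `sqlite3` module and a hypothetical `employee` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Sales", 100), (2, "Sales", 200), (3, "IT", 300)])

# One statement updates every line in the set; no record-at-a-time loop.
cur = conn.execute("UPDATE employee SET salary = salary + 10 WHERE dept = 'Sales'")
updated = cur.rowcount

salaries = conn.execute("SELECT salary FROM employee ORDER BY id").fetchall()
print(updated, salaries)
```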
Rule 8: Physical data independence
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
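Adding an index is a typical change of access method. In the sketch below, which assumes a hypothetical `employee` table in SQLite, the application's query text and its result are the same before and after the index is created; only the way the engine finds the lines may change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Bob", "IT"), (3, "Cy", "Sales")])

QUERY = "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY name"

before = conn.execute(QUERY).fetchall()
# Change the access method; the application's query is untouched.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
after = conn.execute(QUERY).fetchall()

print(before == after)
```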
Rule 9: Logical data independence
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
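One common way to approximate this is to hide a restructured base table behind a view with the old name. The sketch below is an assumption-laden illustration in SQLite: the hypothetical `employee` table is split into `person` and `assignment`, and a view called `employee` keeps the application's read query working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", "Sales"), (2, "Bob", "IT")])

APP_QUERY = "SELECT name, dept FROM employee ORDER BY id"
before = conn.execute(APP_QUERY).fetchall()

# Information-preserving change: split the base table into two.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE assignment (person_id INTEGER PRIMARY KEY, dept TEXT)")
conn.execute("INSERT INTO person SELECT id, name FROM employee")
conn.execute("INSERT INTO assignment SELECT id, dept FROM employee")
conn.execute("DROP TABLE employee")

# A view with the old name joins the pieces back together.
conn.execute("""
    CREATE VIEW employee AS
    SELECT p.id AS id, p.name AS name, a.dept AS dept
    FROM person AS p JOIN assignment AS a ON a.person_id = p.id
""")

after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```

This covers retrieval only; whether updates through the view still work is the subject of Rule 6.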
Rule 10: Integrity independence
Integrity constraints specific to a particular database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
Rule 11: Distribution independence
Whether or not a system supports database distribution, it must have a data sublanguage that can support distributed databases without impairing application programs or terminal activities.
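A CHECK constraint is a simple instance of this idea: the rule is declared once in the data definition, the engine enforces it, and no application has to repeat the test. The sketch below uses SQLite through Python's `sqlite3` module with a hypothetical `account` table, and reads the stored definition back from SQLite's catalog table `sqlite_master`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO account VALUES (1, 50)")

try:
    # The application does no checking of its own; the database refuses.
    conn.execute("UPDATE account SET balance = -1 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# The constraint is part of the table definition held in the catalog.
catalog_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'account'").fetchone()[0]
print(rejected, balance, "CHECK" in catalog_sql)
```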
Rule 12: The nonsubversion rule
If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple-records-at-a-time).