SQL and Relational Theory

Author: C.J. Date
Publisher: O'Reilly, 2009
Pages: 428
ISBN: 978-0596523060
Audience: SQL Developers
Rating: 3
Reviewer: Joe Celko

Subtitled “How to Write Accurate SQL Code”, this book is part of a “Theory in Practice” series.

Here one database legend writes about another one. This is more than a book review, it's part history, part theory, and a thoroughly interesting read (Ed)

Chris Date has produced a collection of books on RDBMS and SQL from various publishers over several decades. His “An Introduction to Database Systems” (ISBN: 978-0321197849, Addison-Wesley, 2004) is now in its eighth edition and it was the standard college textbook for years. His claim to fame in the RDBMS world is that he worked with and then partnered with Ted Codd to create a consulting company (Codd & Date) for many years.

Going back to the early days of RDBMS, when there was not so much internet, we had these things made out of paper called magazines. In particular, there were newsstand computer magazines devoted to the “new” exciting topic of databases. They were DBMS from M&T Publishing and DATABASE PROGRAMMING & DESIGN from Miller-Freeman. The publishing industry is volatile and there are buyouts and cancellations. Thanks to something happening to a parent company in Germany, both magazines wound up belonging to Miller-Freeman.

 

978-0596523060

 

So for a few years, Chris Date and I each had columns in a different magazine from the same publisher! Chris would write a piece on topic X and I would respond the next month with an anti-X piece. If you are really old or are a radio nostalgia buff, you will remember the Jack Benny & Fred Allen mock feud. They sniped at each other back and forth on their respective radio shows and boosted the audience for both shows.

We did the same thing; people had to buy both magazines to get the full story. Without that incentive, would you have bought two separate magazines on the same topic otherwise? Perhaps the best part of the series was “Dueling Medians”; each of us would offer a solution for finding the Median in SQL and the other would reply with another approach, other people joined in. I gave the various solutions in my SQL for Smarties.

Chris Date collected his columns and some other material into a series of books for Addison-Wesley in 1986, 1990, 1992 and 1994, then a collection from Apress in 2006 of articles written after the magazine columns were gone.

To this day, I still get asked if I hate Chris Date. Of course not! I buy and read every one of his books. But we do disagree on technical issues. The super short version is that I am the great defender of SQL and data standards; Chris is the defender of a Tutorial D and his school of Relational Theory. I am more hands-on and Chris is more theory.

 

Banner

 

The bad news is that large amounts of the discussion in this book is how SQL does not subscribe to the Date Relational Model and much of the code is in Tutorial D. If you are not familiar with Tutorial D, it a relational programming language that is directly based on the relational calculus. The reader has to learn enough Tutorial D to read the comparisons between SQL and Tutorial D. Date uses his famous Suppliers and Parts database for the examples. He does not spend a lot of time on the DDL and moves to DML. But 80-95% of the work in SQL is done in the DDL, not the DML. And his examples are done with very simple code at the SQL-92 level. Let me be more specific:

The Parts table P (I will get to the DDL for it shortly) gives the weight of a part in pounds and we want it in grams. The Tutorial D version is:

EXTEND P ADD (weight *454.0 AS gm_wt)

The SQL is:

SELECT P.*, (weight *454.0) AS gm_wt FROM P;

Now let's go to the next step. Write a query to give us all parts with a weight greater than 7000.0 gm. The Tutorial D version Date gives is:

((EXTEND P ADD
 (weight *454.0 AS gm_wt))
WHERE gm_wt > 7000.0)
{pno, gm_wt}

The SQL he gives is:

SELECT pno, (weight *454.0) AS gm_wt
 FROM P
 WHERE (weight *454.0 > 7000.0);

His point is that you have to re-use the computation in the WHERE clause. But that is not the case; it can be done with a derived table, or CTE if you want to avoid using a VIEW:

SELECT X.*
  FROM (SELECT pno, (weight *454.0)
        FROM P) AS X (pno, gm_wt)
 WHERE gm_wt > 7000.0;

This is a direct translation of the first query into SQL. The inner SELECT is a derived table that mimics the function of the EXTEND in Tutorial D. Going further, an SQL programmer would probably say to himself, “I am going to need to do this conversion in a lot of places” and he then does the computation in the DDL with a VIEW or with a computed column:

CREATE VIEW Metric_Parts
AS
SELECT pno, pname, color, city, (weight *454.0) AS gm_wt
 FROM P;

Alternatively, the computed column will act like a VIEW; the syntax is just a little different:

CREATE TABLE Metric_Parts
(pno CHAR(2) NOT NULL PRIMARY KEY,
pname VARCHAR(10) NOT NULL,
color VARCHAR(10) NOT NULL,
city VARCHAR(10) NOT NULL,
weight DECIMAL (5,2) NOT NULL,
COMPUTE (weight *454.0) AS gm_wt);

In fairness, Date also has complaints about Tutorial D because it lacks a Relational Division and introduces his DIVIDEBY operator from his previous books. Unfortunately, Relational Division come in many flavors – Codd's original division, Todd's division, with and without remainders and probably others.

Some statements are incorrect. In Chapter 8 on constraints, on page 100, he states that “Transition constraints aren't current supported in either Tutorial D or SQL (other than procedurally).” His discussion uses a transition from “never married” to “married” and you cannot go back to a status of “never married” again.

I published the DDL code for state transition constraint in an article entitled “Constraint Yourself” in Simple Talk. I used (born, married, divorced, dead) as my status values.

In a footnote, Date says:

“The semantics of WITH LOCAL CHECK OPTION are far too baroque to be spelled out in detail here. In any case, it's hard to see why anyone would ever want such semantics; indeed, it's hard to resist the suspicion that this alternative was included in the standard for no other reason than to allow certain flawed implementations, extent at the time, to be abler to claim conformance.”

ANSI tries to get all of the membership to agree on a common abstract model of how SQL works. We then create features on that model. By now, you should know that the clauses of a basic SELECT.. FROM.. WHERE.. GROUP BY.. HAVING.. statement start with the from clause and end with the select clause. We had to work out those rules in the committee because not all products did it that way. At one point, the GROUP BY implementations either put the NULLs in one group or put each NULL in its own group. Sybase did the ALL() and ANY() predicates wrong. Oracle had to add the CHAR2(n) because they got the CHAR(n) wrong; Microsoft added DATETIME(2) to implement the ANSI Standard TIMESTAMP.

Did you notice that *= is long gone from products? The truth is that when the standards change, the vendors change their products, not the other way around. As the standards have progressed, we have fewer and fewer “implementation defined” features.

All that said, yes, the WITH [LOCAL | CASCADE] CHECK OPTION is baroque when you nest VIEWs inside each other. But it can be very powerful and enforce complex relationships that would otherwise have to be done with triggers or worse. It how you can express multi-table joins in SQL when you do not have a CREATE ASSERTION statement.

Chapter 11 is “Using Logic to Formulate Expressions” which gives a quick introduction to two valued predicate logic and quantifiers. The explanations are done with Tutorial D, then translated into SQL. That is confusing. Tutorial D is like classic (i.e. NULL free) predicate logic but it is still another language to learn. But SQL does have NULLs and we need to consider them from the start. This is one reason that minimal Netiquette on SQL Forums requires that you post DDL even for the simplest SQL problems.

For example, Date gives one of his classic tables:

CREATE TABLE Parts
(pno CHAR(2) NOT NULL PRIMARY KEY,
pname VARCHAR(10) NOT NULL,
color VARCHAR(10) NOT NULL,
weight DECIMAL (5,2),
city VARCHAR(10) NOT NULL);

Please notice that the weight is NULL-able, but the first sample data is like this:

INSERT INTO Parts
VALUES
('P1', 'nut', 'red', 12.0, 'London'),
('P2', 'bolt', 'green', 17.0, 'Paris'),
('P3', 'screw', 'blue', 17.0, 'Oslo'),
('P4', 'screw', 'red', 14.0, 'London'),
('P5', 'cam', 'blue', 12.0, 'Paris'),
('P6', 'cog', 'red', 19.0, 'London');

Given the problem “find the parts that have a weight that is different from the weight of any part in Paris”:

SELECT P1.pname
 FROM Parts AS P1
 WHERE NOT EXISTS
      (SELECT *
               FROM Parts AS P2
              WHERE P2.city = 'Paris'
                     AND P2.weight = P1.weight);

produces the same results, but only if there are no NULL weights in the table.

SELECT P1.pname
  FROM Parts AS P1
  WHERE P1.weight
       NOT IN (SELECT P2.weight
                    FROM Parts AS P2
                         WHERE P2.city = 'Paris');

The EXISTS() predicate is always TRUE or FALSE, but the IN() predicate is a shorthand for a chain of OR-ed predicates. The IN() predicate can return UNKNOWN if there are NULLs. We would need get rid of the NULLs:

SELECT P1.pname
  FROM Parts AS P1
  WHERE P1.weight IS NOT NULL
   AND P1.weight
     NOT IN (SELECT P2.weight
                  FROM Parts AS P2
                          WHERE P2.weight IS NOT NULL
                     AND P2.city = 'Paris');

And we now the “IS [NOT] DISTINCT FROM” comparison operator that will treat NULLs as if they are equal:

SELECT P1.pname
  FROM Parts AS P1
  WHERE NOT EXISTS
       (SELECT *
              FROM Parts AS P2
       WHERE P2.city = 'Paris'
       AND P2.weight IS NOT DISTINCT FROM P1.weight);

 

In conclusion, I do not feel the book lived up to its title. Someone trying to improve his SQL or find a systematic approach to constructing a query has to first predicate logic and Tutorial D. Date's dislike of SQL shows up everywhere in the book; he was looking for ways to make SQL look bad. Much of his SQL code is dated and fails to use newer features.

While you expect some repetition, the material was a re-arrangement of his older material without adding anything new. If you have not read Date's other books, then might not be a problem. 

I feel that a better approach would have been to show dangerous or shoddy SQL, demonstrate the problems, explain the math, relational algebra and logic that was ignored and then solve the problems with better SQL. The “Paris-weights” example I showed would be the start of such a detailed analysis. The reader would not be confused by Tutorial D code and would have come away with a better understanding for the limits and power of SQL. 

Banner


Bad Data Handbook

Editor: Q. Ethan McCallum
Publisher: O'Reilly
Pages: 264
ISBN: 978-1449321888
Audience: Data scientists
Rating: 5
Reviewer: Mike James

Data doesn't have to be "big" to cause a problem. This book is a collection of essays on bad data - what makes it bad and how to deal with it. 



Pro Windows Phone 7 Development

Author: Rob Cameron
Publisher: Apress
Pages: 464
ISBN: 978-1430232193
Aimed at: "Pro" WP7 developers
Rating: 3
Pros: Some interesting material covered
Cons: Poor organisation, overlong examples, level varies
Reviewed by: Ian Elliot

 

What do you expect a book called "Pro" to tell you about?


More Reviews

<ASIN:0321197844>

<ASIN:0123820227>

Last Updated ( Monday, 14 January 2013 )
 
 

   
RSS feed of book reviews only
I Programmer Book Reviews
RSS feed of all content
I Programmer Book Reviews
Copyright © 2014 i-programmer.info. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.