Author: Alex Kuznetsov
Publisher: Red Gate
Aimed at : Experienced developers who want to look at database programming from another angle
Pros: Practical guide with unique examples, no boring theory
Cons: SQL Server specific
Reviewed by Nikos Vaggalis
A book that aims to introduce the defensive programming philosophy to database developers. Is it for you?
This book is like treasure discovered by accident. It’s published by Redgate, which I knew as a software company because of .NET Reflector; but I was not aware of its book publishing business.
I stumbled upon it while researching on how MVCC (Multiversion concurrency control) can break existing code infrastructure. Since I develop using the Ingres RDBMS and the upcoming version 10 will include MVCC, I needed to research where the new transaction isolation option could break existing code. Sooner or later I would be migrating from the traditional ANSI isolation level model to this one, and I wanted to prepare myself.
In that respect, I was spot on; the book not only provides solid examples where MVCC can go wrong, but also includes tips, tricks, potential gotchas and practices that the developer should be aware of when changing or upgrading or using code outside the context designed for.
Such scenarios include :
- Changes in database objects, such as tables, constraints, columns, and stored procedures
- Changes to concurrency and isolation levels
- Upgrades to new versions of the DBMS
- Changes in requirements
- Problems causing loss of data integrity
But most importantly it aims to introduce the defensive programming philosophy and get the database developer into the according mindset. In other words it attempts to show how to think proactively and defensively, illustrating the process through numerous hands-on examples
While this is the main aim of the book, it also carries great educational value on various related to the database development subjects, such as how to efficiently organize your logic and code by using the numerous facilities this beast of a software, the RDBMS, provides to your own advantage. Like using stored procedures and triggers for both better data integrity and common logic encapsulation while enforcing business rules using constraints or looking at the phenomenon of lost updates.
While it is solely targeted towards SQL Server and the code examples are in T-SQL, the principles explored are more or less universal and apply to all rdbms.
The book has ten chapters and comes in print and in pdf format versions. Eight chapters are available in the free ebook version, while the final two chapters (and the most crucial in my opinion) are included in the paid version only.
The author breaks the defensive mindset down into the following steps :
- Define and understand your assumptions (ch1)
- Test as many use cases as possible (ch3)
- Lay out your code in short, fully testable, and fully tested modules (ch5)
- Reuse your code whenever feasible (ch5,8)
All these form the framework that the book is based on, while its application is demonstrated through plenty of practical examples
Chapter 1 explains why code that escapes the test environment into the real world might not behave as expected and goes through the process of identifying and reducing a number of common vulnerabilities by targeting three subject areas that the author believes are the commonly problematic ones :
- Unreliable search patterns
- Reliance on specific SQL Server environment settings. The DBMS behavior can be altered by changing settings at user-session level which can lead to unforeseen behavior. This is demonstrated by examining unwanted behaviour when using SET ROWCOUNT, or altering the SET LANGUAGE parameter
- Mistakes and ambiguity during data modifications. When updating more rows than intended and how to avoid ambiguous updates
No concurrency issues or issues arising from changing the underlying database schema are tackled at this point.
Chapter 2 looks into vulnerabilities that occur due to a basic misunderstanding of how the SQL Server engine, or the SQL language, work. For example the WHERE clause conditions will always be evaluated in the same order, or that data will be returned in some "natural order" as many people assume that in the absence of an ORDER BY clause the data will be returned "in the order of the primary key" or "in the order it was inserted." None of these assumptions is correct.
In Chapter 3: Surviving Changes to Database Objects several examples are given on how changing database objects can cause unpredictable behaviour in code that accesses them and discusses how to proactively protect against them. For example, Surviving Changes to the Definition of a Primary or Unique Key or Surviving Changes to the Signature of a Stored Procedure.
Then, Chapter 4 provides all the information I was looking for that prompted my research. In particular it explains how adopting the new MVCC, or the so-called Snapshot Isolation, can break existing code
Upgrading to a new version is always considered beneficial and always favored since it also introduces new features that increase productivity but it can also affect existing infrastructure negatively. Worst of all, most of the time the consequences of upgrading or adapting a new facility are unforeseen and can therefore introduce hard to spot errors and bugs.
Early adopted by Oracle, MVCC is hailed as the silver bullet of concurrency (readers do not block writers, writers do not block readers, provides read consistency without locking, no lock escalation). Although there are numerous examples that promote MVCC, examples that show the opposite where things can go terribly wrong are scarce to non-existent
Fortunately, this chapter clearly demonstrates the good and ugly of MVCC with crisp examples.
Chapters 5 through 7 give a twist to well-known and common practices employed by developers (use triggers to enforce rules or organize your logic in stored procedures) but also integrates them as part of the defensive programming techniques.
So, Chapter 5: Reusing T-SQL Code talks about organizing your common application logic into a single reusable code unit, be it a constraint, stored procedure, trigger, user-defined function (UDF), or index; and discusses the advantages of this approach such as reusability, robustness and fewer bugs.
Chapter 6: Common Problems with Data Integrity and Chapter 7: Advanced Use of Contraints show why data integrity logic in the application layer can be easily bypassed and why business rules enforcement and data integrity checking must be transferred to the data layer through the use of triggers and constraints. On the other hand they go through several common problems with triggers including cases where they simply do not fire and why constraints, although more complex and involved than triggers, are the option to be preferred, i.e. constraints are immune to isolation level changes from ANSI to MVCC.
Chapter 8 asks whether error handling with TSQL should be done on the server side or with a modern programming language like C# on the client side?
The final two chapters, which in my opinion, form the heart of the book are only included in the paid version.
Chapter 9: Surviving Concurrent Queries provides a quick but illustrative 101 introduction to transaction isolation levels and then goes on to show how things can be complicated under concurrent operations. It takes a comparative approach and focuses on how incorrect result sets can be generated, going beyond the contrived scenarios and examples found in most books and resources.
Chapter 10: Surviving Concurrent Modifications completes the concurrency pitfalls, as well as the book, by looking into problems arising from concurrent modifications to data; lost updates, resource contention errors, primary key and unique constraint violations. The pessimistic and optimistic models of protecting against lost updates are explored: rowversion, snapshot isolation levels and pessimistic locking hints; and the rest of the chapter is too SQL server centric tackling T-SQL patterns that fail high concurrency tests.
Summing up, this is a book that will not bore you since it is hands on and filled with practical examples and immediate takeaways. It is oriented towards problem-solving: what could go wrong and how to fix it. The chapters can be read independently - just read what concerns you at the moment and come back later for the rest.
The need to think proactively and defensively is of crucial importance, especially when dealing with the most mission-critical part of a business operation, the dbms.
If it wasn’t already tough enough to master all aspects of one of the largest pieces of software, on the way you should also consider adopting the defensive programming philosophy. While this book, or any book, cannot cover all possible scenarios that can go wrong, the clean cut and unique examples it provides, pave the way for conquering that philosophy. An indispensable guide.