Author: Joe Celko
Publisher: Morgan Kaufmann, 2013
Aimed at: SQL professionals
Reviewed by: Kay Ewbank
Joe Celko is one of the best known writers about databases, and a go-to source for anyone wanting to know about SQL. In this book he’s turned his attention to NoSQL to explain where it fits and what it offers.
This is NoSQL as you’d want it explained, by someone who knows what SQL can do, the type of problems faced in the real database world, and how and where NoSQL is better and worse.
The book is written in Joe Celko’s usual anecdotal, easy-to-read style, as though you were having a conversation with a more knowledgeable colleague over a beer, and he covers the material clearly and in context. In fact, this book would more fairly be titled ‘an overview and guide to the mess that is the database market at the moment’, as Celko looks at a wide range of ideas of how to store and retrieve data.
Celko starts off with a look at NoSQL and transaction processing. As he points out, a queue of jobs being read into a mainframe computer is, even now, how the bulk of commercial data processing is done.
Columnar databases are next on the list, with an explanation of how they grew out of the file databases that existed before. Graph databases and the graph theory they’re based on are nicely covered before Celko moves on to the MapReduce model and Hadoop – what to many people is the heart of NoSQL.
Streaming databases, the sort of thing used for commodity trading and stocks and shares, are the topic of the next chapter, along with complex events. The problem a streaming database answers is that of handling massive amounts of data quickly – data flow rather than data retrieval is the key concept. The explanation of complex events using a model of a holdup at a coffee shop is typical of Celko’s style: understandable, and a clear illustration of the concept. Key-value stores or associative arrays such as Berkeley DB get a chapter, as do geographic databases – GIS systems.
Having covered the ‘basics’ of NoSQL, Celko then moves on to the idea of Big Data as first defined by Forrester Research in a whitepaper. There’s a level-headed consideration of the drawbacks of cloud computing, and of why it’s still useful, that avoids the hype and points out what the real advantages are. Next comes the rather specialist topic of biometrics, fingerprints and specialized databases, how they are being used and how they might theoretically be used.
Analytic databases – essentially BI systems – are explained next, with descriptions of OLAP, MOLAP, etc. Multivalued databases and the way Microsoft SQL Server and Oracle both deal with this challenge also get a chapter, before the final chapter goes back to hierarchical and network database systems such as IMS and IDMS.
I enjoyed this book, and thought it gave a cool, clear overview of the database market and technologies. There were some minor grammatical errors that should have been eliminated at the editing stage, but they didn't detract from the points being made. However, the fact it's quite a short book means it has to take a high-level view of the subject, whereas some readers might want it to drill down further into the technicalities.
If you read this book, you’ll understand what the different options offer, why they’re important (or not), and how things fit together. The relatively short nature of the book, and the range of topics, mean you’re not going to come out knowing Hadoop or SPARQL in detail, but you will understand a lot more clearly where they sit in the database murk.