Author: Dan Sullivan
Date: April 16, 2015
Audience: Techies learning about NoSQL
Reviewer: Kay Ewbank
Confused about the range of options on offer from NoSQL? Will this book help you?
NoSQL is a topic that covers a variety of technologies, and for which many claims are made. Translating those promises into working systems is quite a lot harder, as is working out just what is really on offer. In this lucid guide Dan Sullivan goes through the major types of database under the NoSQL umbrella, explaining the advantages and drawbacks of each in turn. There’s a reasonable number of explanatory code snippets throughout the book, and while the material is accessible to non-programmers, developers will find it useful too.
Sullivan begins the book with an analysis of the different types of databases, their advantages and limitations – flat files, relational databases, and NoSQL. The descriptions are clear, and could be useful if you don’t already know what the different options do. Next comes a chapter on the different varieties of NoSQL databases, which is more generally useful. Sullivan looks at distributed databases and the compromise between ACID (Atomic, Consistent, Isolated, Durable) and BASE (Basically Available, Soft state, Eventually consistent). There’s a nice description of the types of ‘eventually consistent’ choices that have been made, and the chapter closes with a look at four types of NoSQL database – key-value pairs, documents, column family, and graph databases.
The book then moves on to consider each of these four types of database in turn, starting with key-value databases. I quite liked Sullivan’s description of arrays being like key-value stores with training wheels, and he carries the analogy on quite cleverly, with associative arrays being the equivalent of taking off the training wheels, and caches being like adding gears to the bike. There are good descriptions of how to construct keys and use them to locate values, and of the essential features of key-value databases. A chapter on key-value database terminology sorts out the definitions, and this section ends with a chapter on designing for key-value databases, where aspects such as partitioning, handling structured values, and more detail on the limitations are all well covered.
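To give a flavour of what constructing keys and using them to locate values involves, here is a minimal Python sketch. It is not taken from the book: the entity:id:attribute naming convention, the make_key and lookup helpers, and the sample customer data are all assumptions made for illustration, and a plain dictionary stands in for a real key-value store.

```python
from typing import Optional


def make_key(entity: str, entity_id: int, attribute: str) -> str:
    """Build a key using a hypothetical entity:id:attribute convention."""
    return f"{entity}:{entity_id}:{attribute}"


# A plain dict (an associative array) stands in for the key-value store.
store: dict[str, str] = {}
store[make_key("customer", 1982, "name")] = "Ada"
store[make_key("customer", 1982, "city")] = "London"


def lookup(entity: str, entity_id: int, attribute: str) -> Optional[str]:
    """Locate a value by rebuilding its key; return None if the key is absent."""
    return store.get(make_key(entity, entity_id, attribute))


print(lookup("customer", 1982, "city"))
```

Because the key is the only way in, a lookup has to rebuild exactly the same string that was used when the value was stored, which is why a consistent naming convention matters.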
The next type of database to be discussed is the document database, starting with a detailed intro of what a document database is and how it is used to manage multiple documents. The terminology gets its own chapter, and this section ends with a look at how to design for document databases, taking into account how to manage joins, and planning for mutable documents. There’s an interesting discussion of ‘the Goldilocks zone of indexes’ and how to get just the right size for read-heavy and write-heavy applications.
Column family databases come next, starting with Google BigTable and including discussions of HBase and Cassandra. As with the other sections, there’s a chapter on the terminology of column family databases, and the section ends with a chapter on designing for column family databases. This latter chapter includes some useful material on the tools for working with big data, some extract, transform and load (ETL) tools, and some analysis options including statistics, machine learning, and general analysis tools.
Graph databases are the last group to get the treatment. I found this probably the most interesting section, with a really good discussion on querying graphs. This looks at Cypher for declarative querying, and Gremlin for querying by graph traversal. There’s also a useful section on tips for graph database design and traps to avoid.
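For readers unfamiliar with the traversal style of querying, the following Python sketch shows the general idea of walking a graph edge by edge. It is neither Gremlin nor Cypher and is not drawn from the book: the sample graph, the reachable_within function and its parameters are hypothetical, chosen only to illustrate what "query by traversal" means.

```python
from collections import deque

# Hypothetical graph as an adjacency list: vertex -> vertices it points to.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first traversal: vertices reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not follow edges beyond the hop limit
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                frontier.append((neighbour, hops + 1))
    return found


print(reachable_within(graph, "alice", 2))
```

A declarative query would instead describe the pattern to match and leave the database to decide how to walk the graph.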
The final part of the book looks at how to choose a database for your application, and how to use NoSQL and relational databases together. There are some useful bits, but I don’t think there’s enough detail to help you reach a final decision; it is more a help in putting together a shortlist.
I thought this book took an even-handed approach to the different types of NoSQL database, and gave clear explanations of why and when you might use them. If you read it starting from a position of confusion, by the end you should know which type of database you might choose. You’d still need to do more reading and research before putting data into a database, but you’d know a lot more than when you started.
See Kay Ewbank's 5-star review of another title in this series: Database Design for Mere Mortals and her reviews of:
Joe Celko’s Complete Guide to NoSQL
Making Sense of NoSQL