Author: Pramod J Sadalage & Martin Fowler
Publisher: Addison-Wesley
Audience: Developers who want an overview of NoSQL
Reviewed by: Kay Ewbank
There’s been a lot of news and enthusiasm about NoSQL databases recently, and some of the products have reached a maturity where they look tempting. The trouble is that the technology is still evolving so quickly that it’s hard to work out what is really core NoSQL, and what’s either a marketing team’s invention or just a local feature of a particular product. The authors of NoSQL Distilled say the aim of the book is to give you enough information to answer the question of whether NoSQL databases are worth serious consideration for your future projects, and that’s a pretty good summary of the book.
The first half of the book describes the concepts of NoSQL without getting bogged down in the trivialities of a particular product. The authors then go on to look at implementing systems using a variety of NoSQL products. Sadalage and Fowler start off with an overview of why NoSQL has hit the headlines so much, then go on to describe three of the four main data models for NoSQL: key-value, document, and column-family databases.
Chapter 3 is a discussion of aggregates and the difficulties that can arise when working with them (aggregates being collections of related data objects treated as units for data manipulation and consistency). So, for example, if you aggregate data on customer orders into one unit per order, that works well until you need to look at how many of a particular item have sold during the previous week. Graph databases are also introduced in this chapter.
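To make the trade-off concrete, here is a minimal Python sketch of the order example. It is not taken from the book; the `orders` store, its field names, and both helper functions are hypothetical, and a plain dictionary stands in for an aggregate-oriented database.

```python
from datetime import date

# Hypothetical store: each order is one aggregate, keyed by its order ID.
orders = {
    "order-1": {
        "customer": "alice",
        "date": date(2024, 1, 8),
        "lines": [{"item": "widget", "qty": 2}, {"item": "gadget", "qty": 1}],
    },
    "order-2": {
        "customer": "bob",
        "date": date(2024, 1, 10),
        "lines": [{"item": "widget", "qty": 3}],
    },
    "order-3": {
        "customer": "alice",
        "date": date(2023, 12, 20),
        "lines": [{"item": "widget", "qty": 5}],
    },
}


def get_order(order_id):
    """Fetching a whole order is a single lookup by key."""
    return orders[order_id]


def units_sold(item, since):
    """Counting sales of one item means opening every order aggregate."""
    total = 0
    for order in orders.values():
        if order["date"] >= since:
            for line in order["lines"]:
                if line["item"] == item:
                    total += line["qty"]
    return total
```

Reading one order touches a single aggregate, whereas `units_sold("widget", date(2024, 1, 5))` has to scan all of them, which is the kind of query the aggregate structure makes awkward.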
Chapter 4 looks at distribution models: single server, sharding, master-slave replicas, peer-to-peer replicas, and combining shards and replicas. Chapter 5 tackles consistency. This is one of the biggest challenges when you move a database to a clustered model, and it makes users of traditional databases nervous. When you’ve spent a lot of time ensuring strong consistency, NoSQL concepts such as ‘eventual consistency’ are worrying. Sadalage and Fowler explain the alternative thinking behind working with NoSQL, including the CAP theorem. CAP stands for Consistency, Availability and Partition tolerance, and in essence you can have any two of the three.
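As a rough illustration of why eventual consistency makes people nervous, the following Python sketch simulates a primary and one replica in a single process. The `Replica` and `Cluster` classes are hypothetical and are not drawn from the book or from any real product; replication is triggered by hand to make the stale window visible.

```python
class Replica:
    """One node's copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy primary/secondary pair with delayed replication (an assumption)."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes accepted by the primary but not yet copied

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        # A read served by the secondary may be stale.
        return self.secondary.data.get(key)

    def replicate(self):
        # Copy outstanding writes across; afterwards both nodes agree.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()
```

After `write("stock", 9)`, a read from the secondary still returns `None` until `replicate()` runs. The data does become consistent, but only eventually.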
Chapter 6 considers how NoSQL databases handle the problem of transactions. While they do support atomic updates within an aggregate, transactions that span aggregates are less well catered for. The answer, according to the authors, is version stamps, which let you detect whether data has changed between reading it and updating it. Chapter 7 looks at Map-Reduce and the way it can be used to organize processing both to make use of the machines in a cluster and to avoid unnecessary data transfer between machines.
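The Map-Reduce idea can be sketched in a few lines of Python. This is an illustrative single-process simulation, not code from the book: the two "nodes" are just lists, and every function name here is hypothetical. Each node maps and reduces its own orders locally, so only small per-item totals would need to cross the network.

```python
from collections import defaultdict

# Hypothetical order aggregates, already spread across two nodes.
node_a = [{"lines": [{"item": "widget", "qty": 2}, {"item": "gadget", "qty": 1}]}]
node_b = [
    {"lines": [{"item": "widget", "qty": 3}]},
    {"lines": [{"item": "gadget", "qty": 4}]},
]


def map_order(order):
    """Map step: emit one (item, quantity) pair per order line."""
    for line in order["lines"]:
        yield line["item"], line["qty"]


def reduce_quantities(pairs):
    """Reduce step: sum the quantities for each item."""
    totals = defaultdict(int)
    for item, qty in pairs:
        totals[item] += qty
    return dict(totals)


def run_on_node(orders):
    """Map and reduce locally, on the node that holds the data."""
    return reduce_quantities(pair for order in orders for pair in map_order(order))


def map_reduce(nodes):
    """Combine the per-node partial totals into the final result."""
    partials = [run_on_node(orders) for orders in nodes]
    return reduce_quantities(pair for partial in partials for pair in partial.items())
```

Calling `map_reduce([node_a, node_b])` gives the total quantity sold per item. Because the reduce step is a plain sum, it can be applied to the partial results as well as to the raw pairs.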
The final four chapters of the book each take a different type of database and an example product. Chapter 8 shows how Riak can be used for key-value databases; 9 uses MongoDB as a document database; 10 shows Cassandra being used for column-family databases; and 11 uses Neo4j for graph databases.
This isn’t an in-depth guide; if at the end of it you decide you do want to try a NoSQL database, you’ll still need to choose which one to use and read a more detailed book to find out more about it. The benefit of this book is that you will have a much clearer idea of why people are getting enthusiastic about NoSQL, and exactly what a NoSQL database might be good for.