Author: Eben Hewitt
Aimed at: Database programmers
Pros: Good discussion of wider database theory
Cons: Tends to be repetitive and forward referencing, not practical enough
Reviewed by: Alex Armstrong
This is a really interesting book - but how far will it get you with a real Cassandra project?
Cassandra is one of the new No-SQL databases that are currently all the fashion. The first part of this book asks whether No-SQL really is just a fashion. Chapter 1 is called "Introducing Cassandra", but it is really a discussion of database theory and of SQL versus other approaches. It explains the CAP theorem, which says a distributed database can guarantee at most two of consistency, availability and partition tolerance, and it shows how different database models fit into this classification in different ways. It's a good read but a little repetitive.
Chapter 2 is about installing Cassandra. It is clear but, given the speed at which things are moving, probably already out of date. Chapter 3 gets us on to the Cassandra data model. It is well explained but repetitive, and it seems to work its way around the subject rather than stating plainly what the data structure is. The most illuminating sections present the Cassandra data structure as its JSON equivalent, which is fine as long as you know what JSON is all about. At the end of this chapter you should have a rough idea of what sort of data you can store in Cassandra, but not exactly how to do it.
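For readers who, like me, wanted the structure stated outright, here is a rough sketch of the column-family model of that era expressed as nested Python dictionaries, in the same spirit as the book's JSON views: keyspace, then column family, then row key, then column name mapping to a value and a timestamp. This is plain Python, not Cassandra's client API; the "Users" column family, the sample rows and the insert and get_slice helpers are all hypothetical, and the simple "newer timestamp wins" rule is a simplification of how Cassandra reconciles writes.

```python
from typing import Dict, Tuple

# Hypothetical sketch: keyspace -> column family -> row key -> column name -> (value, timestamp)
Column = Tuple[str, int]          # (value, write timestamp)
Row = Dict[str, Column]           # columns within one row
ColumnFamily = Dict[str, Row]     # rows keyed by row key
Keyspace = Dict[str, ColumnFamily]

keyspace: Keyspace = {
    "Users": {                                   # column family (hypothetical)
        "alice": {                               # row key
            "email": ("alice@example.com", 1),
            "city": ("Paris", 1),
        },
        "bob": {                                 # rows need not share the same columns
            "email": ("bob@example.com", 2),
        },
    }
}


def insert(ks: Keyspace, cf: str, row_key: str, name: str, value: str, ts: int) -> None:
    """Write one column; only a strictly newer timestamp replaces an existing value."""
    row = ks.setdefault(cf, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts > current[1]:
        row[name] = (value, ts)


def get_slice(ks: Keyspace, cf: str, row_key: str) -> Dict[str, str]:
    """Return a row's column values ordered by column name (columns are kept sorted)."""
    row = ks.get(cf, {}).get(row_key, {})
    return {name: row[name][0] for name in sorted(row)}


insert(keyspace, "Users", "bob", "city", "Berlin", 3)
insert(keyspace, "Users", "bob", "city", "Rome", 1)   # older timestamp, so ignored
print(get_slice(keyspace, "Users", "bob"))
```

The point of the sketch is that a row is just a sorted map of columns that can differ from row to row, which is roughly what the book's JSON sections are getting at.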
Chapter 4 is a sample application, and it is little more than a listing many pages long with too little discussion. Any reader who turns to this chapter first is likely to be put off the rest of the book.
From this point we move deeper into Cassandra: architecture, configuration, reading and writing data, clients, monitoring, maintenance and tuning. The final chapter is on integration with Hadoop. There is also an appendix that is well worth reading as an overview of the No-SQL landscape.
Overall this is a good introduction to the general ideas of Cassandra but not such a good introduction to its details and specifics. Part of the failure is simply due to the software itself being unstable and changing quickly, so that the book has been left behind by the current version. More serious criticisms are that it often makes forward references and that it fails to explain in a simple way how things actually work: good on principles but light on practice.
Even after reading the entire book you might still be wondering what the Cassandra API is all about and how to use it. For example, the author repeatedly says that you can't perform a join in Cassandra and that instead you have to denormalize your data, designing the database to store the answers to the sort of queries you expect to be asked. This is very true, but if you can't translate this abstract statement into something more concrete you are going to be left wondering what it actually means.
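To make "store the answers to your queries" a little more concrete, here is a minimal sketch in plain Python rather than real Cassandra client code. It assumes a hypothetical users/orders example: where SQL would keep the two apart and JOIN them at query time, the query-first approach copies the user's name into an orders_by_user structure at write time, so that reading a user's orders is a single key lookup. The names users, orders_by_user, place_order and orders_for are all illustrative inventions, not anything from the book.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical "normalized" data: in SQL this would be a users table joined to orders.
users: Dict[str, str] = {"u1": "Alice", "u2": "Bob"}

# Query-first design: we know we will be asked "which orders did this user place,
# and what is the user's name?", so we store that answer keyed by user id.
orders_by_user: Dict[str, List[dict]] = defaultdict(list)


def place_order(user_id: str, order_id: str, item: str) -> None:
    """Write path: denormalize by copying the user's name into every order record."""
    orders_by_user[user_id].append(
        {"order_id": order_id, "item": item, "user_name": users[user_id]}
    )


def orders_for(user_id: str) -> List[dict]:
    """Read path: one key lookup and no join; unknown users get an empty list."""
    return orders_by_user.get(user_id, [])


place_order("u1", "o1", "book")
place_order("u1", "o2", "pen")
place_order("u2", "o3", "lamp")
print(orders_for("u1"))
```

The cost of this design is that the user's name now lives in many places, so a rename has to be written to every copy. That trade of extra write work and duplication for cheap reads is what "denormalize your data" means in practice.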
Not a good book if you want to know how to use Cassandra but an interesting read in parts if you are interested in the wider issues.