Authors: David Hows, Eelco Plugge, Peter Membrey, Tim Hawkins
Publisher: Apress, 2013
Aimed at: programmers who want to learn MongoDB
"A Complete Guide to Dealing with Big Data Using MongoDB". Does it live up to this claim?
This 2nd edition has been updated to cover the changes in MongoDB version 2.4, such as hashed indexes.
The authors start Part 1: MongoDB Basics with a look at the philosophy and ideas behind the creation of MongoDB, and how the design decisions affect the way the database works. JSON is also introduced in this chapter. While the authors are obviously fans of MongoDB and the NoSQL model, they do also mention its drawbacks so you don’t get a completely biased view.
Having told you why MongoDB is a good idea, the authors then move on to how to install it on Linux and Windows, along with the PHP driver and the Python driver.
A MongoDB database consists of collections of documents, with indexes to improve performance. The way all this works is the subject of Chapter 3, which also covers geospatial indexing. Working with data – querying, updating, adding and removing documents – is covered in the next chapter.
By Chapter 5 the authors are on to GridFS, and the way it can be used to store and retrieve large files. GridFS is the specification used by all the MongoDB drivers, and it overcomes the limit of 16MB per MongoDB document. The idea is that if you have large files that you want to store using MongoDB, they’re kept outside any single document, split into chunks that are each stored as a separate document, and accessed using GridFS.
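To make the chunking idea concrete, here is a minimal pure-Python sketch of how a file larger than 16MB can be broken into document-sized pieces and put back together. It is an illustration only, not the drivers' implementation: the function names `split_into_chunks` and `reassemble` are hypothetical, the chunk size is an assumed value, and no MongoDB server is involved. The field names `files_id`, `n`, `data`, `length` and `chunkSize` follow the GridFS specification.

```python
import math

# Assumed chunk size for this sketch; real drivers let you configure it.
CHUNK_SIZE = 255 * 1024


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return a metadata document plus one small document per chunk."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Rebuild the original bytes by joining chunks in order of `n`."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


# A 17MB payload: too large for one document, fine as many small ones.
payload = bytes(17 * 1024 * 1024)
files_doc, chunks = split_into_chunks("example-file", payload)

print(len(chunks), "chunks for", files_doc["length"], "bytes")
```

Each chunk document stays well under the 16MB limit, and the sequence number `n` is what allows the pieces to be read back in the right order.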
Part 2 of the book is dedicated to Developing with MongoDB. There are chapters on developing for MongoDB with PHP and Python, and a chapter on ‘advanced queries’ that looks at text search, the aggregation framework, and MapReduce. The chapter on using PHP is good at identifying where PHP’s way of working differs from what would be an ideal match with MongoDB, and how to get around this. The aggregation framework was introduced in MongoDB 2.2, and consists of a set of pipeline operators that you can chain together to form sequences of operations on your data. The first operator works on all the data; later ones in the pipeline work on the output of the earlier operators. The chapter looks at $group, $limit, $match, $sort, $unwind, $project, and $skip.
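The pipeline idea is easy to see in a small pure-Python analogy, where each stage takes a list of documents and hands its output to the next stage. This is a sketch of the concept only: the helper names (`match_stage`, `group_sum_stage`, `sort_stage`, `limit_stage`, `run_pipeline`) and the sample `orders` data are invented for illustration, and none of this is the MongoDB or driver API.

```python
from collections import defaultdict


def match_stage(docs, predicate):
    """Analogous to $match: keep only documents that pass the predicate."""
    return [doc for doc in docs if predicate(doc)]


def group_sum_stage(docs, key, field):
    """Analogous to $group with $sum: total `field` per value of `key`."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_stage(docs, field, descending=False):
    """Analogous to $sort: order documents by one field."""
    return sorted(docs, key=lambda doc: doc[field], reverse=descending)


def limit_stage(docs, n):
    """Analogous to $limit: pass on only the first n documents."""
    return docs[:n]


def run_pipeline(docs, stages):
    """Feed each stage the output of the one before it."""
    for stage in stages:
        docs = stage(docs)
    return docs


# Hypothetical sample data.
orders = [
    {"customer": "ann", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 50},
    {"customer": "ann", "status": "paid", "amount": 45},
    {"customer": "cat", "status": "open", "amount": 99},
    {"customer": "bob", "status": "paid", "amount": 10},
]

top_customers = run_pipeline(orders, [
    lambda docs: match_stage(docs, lambda d: d["status"] == "paid"),
    lambda docs: group_sum_stage(docs, "customer", "amount"),
    lambda docs: sort_stage(docs, "total", descending=True),
    lambda docs: limit_stage(docs, 2),
])

print(top_customers)
```

Only the first stage sees every order; the grouping stage sees just the paid ones, and the final two stages work on the per-customer totals.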
The third and final part of the book is titled Advanced MongoDB with Big Data. There’s a useful chapter on administering MongoDB, then in-depth chapters on optimization, replication, and sharding. The optimization chapter looks at how to evaluate query performance using the profiler and explain(), and how to combine the two to optimize a query. The section on how MongoDB selects which index to use was interesting, as was the section on using hint() to force the use of a specific index.
The chapter on replication shows how to manage the oplog, both how to set its size to meet the needs of synchronizing replicas and what the stats actually mean. The sharding chapter starts with a good explanation of why you need to shard, then goes on to analyze different sharding options, including the use of MongoDB’s balancer, the use of hashed shard keys, and tag sharding, which lets you specify which data should be located on a particular shard. I’d have liked longer chapters on both replication and sharding, but what is there is good.
The authors have done a good job on this book. The high-level explanations really make sense, and the technical material is clear. I’d still have liked more on the advanced topics, but it’s a good read.
MongoDB the Definitive Guide (O'Reilly)
MongoDB Applied Design Patterns (O'Reilly)
MongoDB in Action (Manning)