Author: Jason Venner
Publisher: Apress, 2009
Aimed at: Newcomers to parallel processing
Pros: Motivating and informative
Cons: Not an advanced treatment
Reviewed by: Mike James
Hadoop is an open source system that implements the MapReduce algorithm so that you can run a task in parallel on a cluster of fairly standard machines linked together by a basic network. This is exciting stuff. Even if you can only manage to lash together a few tens of redundant machines, you can still do something useful. As a result, programmers who have never really contemplated parallel programming before might well be encouraged to give it a try, and this book does a lot of encouraging.
Despite its "Pro" title, the book starts from the basics and builds up in a logical and steady fashion. The early chapters provide an overview and get you started by showing how to download and install Hadoop. Then we move on to the basic MapReduce application and variations on the theme. The later chapters deal with debugging, tuning, unit testing and some more advanced MapReduce architectures. The book closes with a simple real-world application and a look at some related projects, including Pig, HBase and Mahout. If you are new to MapReduce, and to Hadoop in particular, then this is the place to start.