Big Data For Chimps

Authors: Philip Kromer & Russell Jurney
Publisher: O'Reilly
Pages: 220
ISBN: 978-1491923948
Print: 1491923946
Kindle: B015X0WF36
Audience: Developers wanting to understand Hadoop and Big Data
Rating: 4.5
Reviewer: Kay Ewbank

 

 

Over the festive season IProgrammer asks its reviewers to recommend books that can be considered a good read and worth a second look in case you missed them. We start with one that Kay Ewbank found both amusing and a good introduction to big data.

The authors of Big Data for Chimps say that the book is definitely not another Definitive Guide to Hadoop; instead it's more like Hadoop: A Highly Opinionated Guide. They don't waste space on basic tutorials or on replicating core documentation, concentrating instead on showing how big data analysis works. What they do devote space to is a set of light-hearted metaphors that illustrate the ideas behind big data analysis.


The book opens with a chapter on Hadoop basics, which also introduces the notion that a chimpanzee and an elephant start a business together to manage data. This rather bizarre notion is used to give you a physical picture with which to understand how the Hadoop ecosystem can be used to manage data, and it does make it easier to remember which bit of software does what job.

Chapter 2 looks at MapReduce, and/or how Chimpanzee and Elephant save Christmas. It also introduces UFO-obsessed reindeer and the MapReduce Haiku. If nothing else, you stay awake and reading to find out what strange fantasy is going to arrive next!

After a short chapter introducing the data sample (baseball and the rules behind it), the authors move on to introducing Pig. Sadly, while the sample code is still about UFO analysis, there's less frivolity and more code.

 

 

The second half of the book covers tactics for analysis, starting with a chapter on map-only operations, using the baseball data set to show how you can find records that satisfy various conditions, transform records, and work with multiple tables.
 
Grouping operations are covered next, with samples showing how to group and aggregate, put records into bins, and work with subsets.
A chapter on joining tables does a good job of showing how to transfer the idea of a SQL Join into "Hadoop ecosystem" terms, so a join is a cogroup+flatten, and/or a MapReduce job with a secondary sort on the table name.
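
To make the "join is a cogroup+flatten" idea concrete, here is a minimal sketch in plain Python rather than Pig. It is not taken from the book: the function names, the field names (team_id, player, park) and the sample rows are all hypothetical, and it only models an inner join on a single key.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key):
    """Collect rows from both tables under their shared key.

    Returns {key_value: (left_rows, right_rows)}; either bag may be empty.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)
    return groups


def flatten(groups):
    """Emit the cross product of the two bags for every key.

    A key whose left or right bag is empty yields no rows,
    which is what makes the result an inner join.
    """
    return [
        {**left_row, **right_row}
        for left_rows, right_rows in groups.values()
        for left_row, right_row in product(left_rows, right_rows)
    ]


def join(left, right, key):
    """Inner join expressed as cogroup followed by flatten."""
    return flatten(cogroup(left, right, key))


# Hypothetical sample rows, invented purely for illustration.
players = [
    {"team_id": "T1", "player": "A"},
    {"team_id": "T2", "player": "B"},
    {"team_id": "T1", "player": "C"},
]
teams = [
    {"team_id": "T1", "park": "Park One"},
    {"team_id": "T3", "park": "Park Three"},
]

joined = join(players, teams, "team_id")
```

Here only the T1 rows survive, because T2 has no matching team and T3 has no matching players. On a cluster the cogroup step is the part that needs the shuffle, which is why the review's alternative description talks about a MapReduce job sorted on the table name.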
 

The next chapter covers ordering operations, looking at various ways of sorting records, numbering them and shuffling them. The last chapter of the book deals with duplicate and unique records, and includes a good section on set operations.

Overall, I liked this book. It does have the minor problem that the code to set up the Docker Hadoop image the authors have prepared assumes you'll be using boot2docker, which has since been folded into Docker Toolbox, but there are plenty of online tutorials showing how to set up Docker environments, so you should be able to work your way around this. The book is also quite short, and I'd have been happy to have had another hundred or two hundred pages to go into more detail.

It is good as far as it goes, though. The code is well written and clear to read; the examples are understandable and lively enough to keep you following what's going on; and by the end of the book you will have a good understanding of the basics of Hadoop and its circle of complementary software.

 



Related Reviews

Doing Data Science (Rated 5/5)

Big Data Analytics With Spark (Rated 5/5)

Big Data Made Easy

Field Guide To Hadoop

Hadoop Application Architectures

Hadoop: The Definitive Guide (4th ed)

See also:

Reading Your Way Into Big Data - A Programmer's Bookshelf article recommending the reading required to take you from novice to competent in areas relating to Big Data, Hadoop, and Spark.

 


Last Updated ( Friday, 23 December 2016 )
 
 

Copyright © 2017 i-programmer.info. All Rights Reserved.