Agile Data Science
Written by Kay Ewbank   

Author: Russell Jurney
Publisher: O'Reilly
Pages: 178
Audience: data developers
ISBN: 9781449326265
Print: 1449326269
Kindle: B00H85R904
Rating: 4
Reviewer: Kay Ewbank

 

This short book looks at creating an application using the technologies that fit around Hadoop.   

It was written with the aim of ensuring that developers working with Hadoop create successful applications. The author, Russell Jurney, says the book combines three goals: to provide a how-to guide for building analytic applications with Hadoop; to help teams collaborate on big projects in an agile manner; and to give structure to the practice of applying agile big data analytics. The book is written from a Linux and UNIX perspective: there are no examples for Windows users, but there are extensive examples for Linux. Code and supporting materials are available on GitHub.

The book is split into two parts. The first part introduces the data and toolset used in the second part, which consists of a set of tutorials that put together an analytical application.


 

Part I opens with a short chapter introducing the agile big data methodology. A second short chapter then introduces the data set used in the book (a collection of emails), and shows how to make a simple prediction based on probability.

The toolset used by the author consists of Avro, Python, Pig, MongoDB, ElasticSearch and Wonderdog, and these are introduced in the next chapter. This is possibly the most useful chapter in the book; a nice clear introduction to each of the tools in turn, why you need it, what it does, and how it fits with the other tools. The final chapter in Part I looks at how to scale the tools from the previous chapter ‘to petabyte scale’ using the cloud in the form of dotCloud, Amazon Web Services, and Google Analytics. 

 


 

Part II takes the form of tutorials that put together an application. The section starts with a chapter showing how to collect and display records through a web app on ElasticSearch. Next, Jurney shows how to create and display charts of the data. There’s a good chapter on extracting entities from the data and linking them to create interactive reports. The next chapter looks at making predictions – in this case predicting response rates to emails. The material here is good, but brief. It shows how to apply statistical methods and analytics that you already know, but you couldn’t learn the topic from what’s covered here. The book ends with a chapter showing how to extend your predictions app into a real-time classifier using naive Bayes.

Overall, I enjoyed reading this book. The material is well written and very comprehensible. There are extensive code samples, so the actual text is quite short. The ‘agile’ part of the title is a bit misleading; the subtitle – building data analytics applications with Hadoop – is much more accurate. However, this isn’t a book that teaches you about Hadoop. What it does do (and does well) is to take the components around Hadoop – Pig, MapReduce and Avro serialization – and show how they fit together and how to use them, alongside MongoDB and ElasticSearch.

In summary, the book has some drawbacks, but it’s a good introduction to putting together an application with Hadoop.  

 


Last Updated ( Monday, 26 January 2015 )
 
 

   