Agile Data Science 2.0

Author: Russell Jurney
Publisher: O'Reilly
Pages: 352
ISBN: 978-1491960110
Print: 1491960116
Kindle: B072MKL34K
Audience: Data scientists
Rating: 4.5
Reviewer: Kay Ewbank

This practical book works through the tools and techniques for modern data analysis of structured and unstructured data, showing how to create data analysis applications using Spark and Kafka.

The book opens with a chapter on the theory of agile data science, which the author treats as centered on web application development. He sets out a 'manifesto' for agile data science organized around iterative development, shipping intermediate output, prototyping experiments, integrating what the data forces on you, taking the data-value pyramid into consideration, and identifying the critical path. The next chapter looks at which agile tools are available and what their benefits are, introducing Spark, MongoDB, Kafka, PySpark Streaming, Spark MLlib, and Apache Airflow.

 

The data side is introduced next, covering both the sample data that will be used for the rest of the book and a discussion of SQL versus NoSQL.

Part two of the book is titled Climbing the Pyramid, by which Jurney means the data-value pyramid. At the bottom of the pyramid come records: the collection and display of individual records. The next level up is charts, or more specifically the process of cleaning, aggregating, visualizing and questioning the data. Reports are next highest, then predictions, and at the top level are actions based on what you've learned in getting that far.

Having explained the pyramid, Jurney starts at the bottom with how to collect and display records using a combination of MongoDB, Flask, pymongo, Jinja2, and Elasticsearch.

Visualizing data with charts and tables is next on the agenda, along with 'data enrichment' - bringing data in from another dataset to add more information. This is followed by a chapter on exploring data with reports using PySpark, Mongo, Flask, and SparkSQL.

The most useful portion of the book comes next, with three good chapters on predictions. The first covers predictive analytics using PySpark and scikit-learn, along with building a classifier with Spark MLlib.

Next is a good chapter on deploying predictive systems using scikit-learn, Spark ML, and MongoDB, with prediction requests sent to Kafka. The final chapter is on improving predictions and how to tune the Spark ML classifier.

Overall, I liked this book. Although I did have a feeling at times that the author was re-inventing the wheel of the standard database application, the tying together of the different components was well explained and carried out. The material is well organized, and there are plenty of good code samples and a useful appendix. The best thing is that the author stayed very practical throughout the book. If you want to learn how to use the tools described, this is a good introduction.




Related Reviews

Agile Data Science

Big Data Analytics

Data Science and Big Data Analytics

Doing Data Science

Fast Data Processing with Spark (2nd Ed) 

For more book recommendations see Reading Your Way Into Big Data in our Programmer's Bookshelf section.

Last Updated ( Saturday, 25 November 2017 )
 
 

   