Spark In Action, 2nd Edition

Authors: Petar Zecevic and Marko Bonaci
Publisher: Manning
Date: January 2017
Pages: 468
ISBN: 978-1617292606
Print: 1617292605
Audience: Java, Scala, or Python programmers
Rating: 4
Reviewer: Kay Ewbank

This book, intended to go beyond the basics and enable you to create useful applications with Spark, comes complete with sample code and a case study.

The Spark data processing environment is gaining ever more ground among data scientists wanting to analyze distributed data, and this book is designed to get you to a point where you can do real work using Spark.

The book starts with an introduction to Spark, after which the Spark fundamentals are introduced. In practical terms, this means setting up the spark-in-action VM, using the Spark shell, writing apps in Spark, and learning the basics of RDD (resilient distributed dataset) actions, transformations, and double RDD functions.

There's a chapter on writing Spark applications in Eclipse that looks at aspects such as loading JSON, aggregating data, and broadcast variables. The Spark API is then looked at in more detail. 

 


Part 2 of the book looks at other elements of the Spark family, with chapters on Spark SQL, ingesting data with Spark Streaming, and two chapters on Spark's machine learning libraries. The first of these chapters covers the basics of MLlib, linear algebra, and linear regression. The second covers Spark's updated ML library, logistic regression, decision trees, and K-means clustering. This part of the book ends with a chapter on GraphX and its use in graph processing.

A section on Spark Ops comes next, with chapters on running Spark, running a Spark standalone cluster, and running on YARN and Mesos.

The book ends with a section on bringing it all together. This consists of a case study chapter on creating a real-time dashboard, and a final chapter on deep learning on Spark with H2O.

This edition has been updated to cover Spark 2, and it addresses changes such as the move from MLlib to ML. There's a fair amount of sample code, all in Scala, though Java and Python equivalents are available on GitHub. One nice touch is a VM with Spark installed and working, which you can use to run the examples in the book. There are also PDF and Kindle editions that you can download when you buy the paper edition.

This isn't a book for Spark beginners; it's intended more to get you to the stage of creating real-world applications using Spark. It's not an easy read, but it is thorough, and will take you beyond the beginner or dabbler stage.

 

Related Reviews
Mastering Apache Spark  

Learning Spark

Spark is one of the topics covered in Reading Your Way Into Big Data, an article on Programmer's Bookshelf in which Ian Stirk provides a roadmap of the reading required to take you from novice to competent in areas relating to data science.




 

Last Updated ( Thursday, 30 March 2017 )
 
 

   
Copyright © 2017 i-programmer.info. All Rights Reserved.