Data Science and Big Data Analytics

Author: EMC Education Services
Publisher: Wiley, 2015
Pages: 432
ISBN: 9781118876138
Print: 111887613X
Kindle: B00RXHVQF6
Aimed at: Programmers who need to analyze data
Rating: 4
Reviewed by: Kay Ewbank

The subtitle "Discovering, Analyzing, Visualizing and Presenting Data" indicates that this is another title on a currently hot topic.

This book aims to teach you about big data analytics, though the techniques discussed are well-known statistical methods. There are individual chapters on the most commonly used techniques, each covering the key concepts of the technique, the principles behind it, R code using it, and sample exercises illustrating its use.

 

The book starts with a chapter introducing big data analytics, looking briefly at what big data is before turning to the ‘state of practice in analytics’. The authors then cover the lifecycle of data analytics – discovery, data preparation, model planning, and model building. The chapter ends with a sample case study showing the different stages in action, and the code and data samples can be downloaded so you can work through the exercises.

The programming language used throughout the book is R, and the next chapter looks at basic data analysis methods using R. The authors introduce the language, discuss exploratory data analysis, then look at hypothesis testing, difference of means, Wilcoxon Rank-Sum, and ANOVA.

 

 

From here onwards the chapters move on to different aspects of advanced analytical theory and methods, starting with a chapter on clustering with a good discussion of K-means. There are chapters on association rules, regression, classification, time series analysis and text analysis.

A chapter on technology and tools looks at MapReduce and Hadoop, though only at an introductory level, essentially saying what the different parts of the ecosystem are and what roles they play. A chapter on in-database analytics does a similar job with SQL. The book ends with a chapter putting the whole thing together.

I found the book to be quite formal in tone – it is essentially a textbook for the EMC Proven Professional Data Science Certification and is also used as the basis of the EMC MOOC Data Lakes for Big Data. However, the concepts are explained well, there are good examples, and the authors have picked a good middle route on the amount of technical detail on the maths behind the statistical methods – not skimming over it, but not getting bogged down in it either.

 


Java in 24 Hours (7e)

Author: Rogers Cadenhead
Publisher: Sams, 2014
Pages: 448
ISBN: 9780672337024
Print: 0672337029
Kindle: B00K6KG9CM
Audience: Java Beginners
Rating: 4
Reviewer: Mike James

Java in 24 hours? A tempting offer, but Java has grown into a complex language. This edition of this long-standing book cov [ ... ]



Murach's JavaScript, 2nd Ed

Authors: Joel Murach and Michael Urban
Publisher: Murach & Associates
Pages: 630
ISBN: 978-1890774851
Print: 1890774855
Audience: Novice programmers
Rating: 4
Reviewer: Ian Elliot

Another book on core JavaScript - does this one have anything extra to offer?


Last Updated ( Friday, 01 June 2018 )
 
 

   
Copyright © 2018 i-programmer.info. All Rights Reserved.