Big Data Analytics

Author: David Loshin
Publisher: Morgan Kaufmann
Pages: 120
ISBN: 978-0124173194
Aimed at: Managers who need to understand big data technology
Rating: 4
Reviewed by: Kay Ewbank


This short book aims to describe what big data is and what problems it poses, to show how to work out which big data technologies a business actually needs, and then to help you develop a strategic plan. Does it deliver?

As you can probably tell from that description, this isn’t a book aimed at developers. It doesn’t cover big data application development, MapReduce, Hadoop – nothing specific. Despite that, it’s a useful book if you have to talk to managers or business users and need to speak their language. It’s also a useful read if you haven’t been following the big data story and want to get a grip on the business drivers behind it. Each chapter ends with a set of ‘thought exercises’ that would make good questions to ask (or to try to answer) when you’re looking at a specific big data project.

David Loshin starts with a brief look at the things that have led to the rise of big data. Next, he looks at which business problems are suited to big data analytics. This would be a useful chapter for Dilbert to show to his pointy-headed boss, especially the table on ‘quantifying organizational readiness’, which can be used to score how ready your company is in terms of feasibility, reasonability, value, integrability, and sustainability. Each area is scored from 0 to 4. On the feasibility row, for example, the company scores 0 if evaluation of new technology is not officially sanctioned, and 4 if evaluation and testing are encouraged, there’s a clear decision process for adoption or rejection, and time is allocated to innovation.
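
As a rough illustration of how a scorecard like this could be tallied, here is a minimal sketch. The five area names and the 0 to 4 scale come from the table described above. Adding the scores into a single total, the function name `readiness_score`, and the example numbers are all assumptions made for illustration, not something taken from the book.

```python
# Hypothetical tally for a 'quantifying organizational readiness' scorecard.
# The five areas and the 0-4 scale follow the review's description;
# adding the scores into one total is an assumption, not the book's method.
AREAS = ("feasibility", "reasonability", "value", "integrability", "sustainability")
MAX_PER_AREA = 4


def readiness_score(scores):
    """Return (total, maximum) for a dict mapping each area to a 0-4 score."""
    missing = [area for area in AREAS if area not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for area in AREAS:
        value = scores[area]
        if not isinstance(value, int) or not 0 <= value <= MAX_PER_AREA:
            raise ValueError(f"{area} must be an integer from 0 to {MAX_PER_AREA}")
    total = sum(scores[area] for area in AREAS)
    return total, MAX_PER_AREA * len(AREAS)


if __name__ == "__main__":
    # Made-up scores for an imaginary company.
    example = {
        "feasibility": 4,
        "reasonability": 2,
        "value": 3,
        "integrability": 1,
        "sustainability": 2,
    }
    total, maximum = readiness_score(example)
    print(f"Readiness: {total} out of {maximum}")
```

A single total hides which area is dragging the score down, so in practice you would probably want to look at the individual rows as well.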

The next chapter looks at what it means to adopt big data technology, and who needs to be involved. Loshin then moves on to developing a strategy for integrating big data analytics. Much of what he advises is obvious – clarify the criteria for adopting big data technology, prepare the environment for the sheer volume of data, put proper levels of oversight and governance in place. However, the fact that it’s obvious doesn’t mean it’s not correct, and plenty of big data projects have failed because something obvious was missed.

Having introduced the topic of governance, Loshin devotes the whole of the next chapter to a more detailed look at the problems of working with data that has been created outside your control. There are some interesting insights into what constitutes good data, how you can measure data quality, and ways to make data reusable.

The rest of the book has more technical detail than the earlier chapters. A chapter introducing high-performance appliances for big data sets out typical ways big data analytics might be used, and in each case considers the implications for storage, appliances and data management. Loshin then goes on to look at the merits of hardware and software appliances, and gives a short analysis of row- versus column-oriented data layouts.

A chapter on big data tools and techniques introduces Hadoop, MapReduce, YARN, HBase, Hive, Pig and Mahout. The topics are covered fairly briefly and at a high level, but it’s a useful intro. MapReduce then gets a chapter to itself, with a simple example to demonstrate how it works. A chapter on NoSQL discusses key-value, document, tabular and object data stores, and where they fit into the bigger picture.

There’s a nice chapter on using graph analytics for big data that gives a clear and understandable introduction to what graph analytics is. It then goes on to discuss when you might use graph analytics and what the different algorithms are (community analysis, path analysis, clustering and so on). There’s a good description of the technical complexity of analyzing graphs, and the chapter closes with a look at what features you should look for in a graph analytics platform. The book closes with a short chapter on best practices.
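
If you haven’t met MapReduce before, the idea is easier to see in code than in prose. The sketch below is not the example from the book, and it isn’t Hadoop code either. It is a single-process word count in plain Python, with hypothetical function names (`map_phase`, `shuffle_phase`, `reduce_phase`), intended only to show how data flows through the map, shuffle and reduce steps.

```python
# Single-process sketch of the MapReduce pattern (word count).
# Not the book's example and not Hadoop code: the three phases run
# in ordinary Python purely to show the data flow.
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(word_count(docs))
```

In a real cluster the map calls would run in parallel on different machines and the shuffle would move data across the network, which is where most of the engineering effort goes.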

I found this book an interesting mix. There is a certain amount of management speak along the lines of use cases, organizational alignment, and instituting governance. However, each time I began muttering, Loshin got back to a practical and worthwhile point. The business advice made sense, and if you do have a pointy-headed boss, this book would be a good one to try to keep him pointing in the right direction. The four-star rating I’ve given the book is for someone who needs to understand the technology and how it might be useful. For a programmer, the book is much less directly useful. It would make a good intro, and it would be good for giving you the right phrases to use when talking to your business users, but otherwise it’s too high level.


Last Updated (Friday, 19 September 2014)