Big Data Analytics
Author: David Loshin
As you can probably tell from that description, this isn’t a book aimed at developers. It doesn’t go into big data application development, MapReduce or Hadoop in any practical detail. Despite that, it’s a useful book if you have to talk to managers or business users and need to speak their language. It’s also a useful read if you haven’t been following the big data story and want to get a grip on the business drivers behind it. Each chapter ends with a set of ‘thought exercises’ that would make good questions to ask (or to try to answer) when you’re looking at a specific big data project.
David Loshin starts with a brief look at the factors that have led to the rise of big data. Next, he looks at which business problems are suited to big data analytics. This would be a useful chapter for Dilbert to show to his pointy-headed boss, especially a table on ‘quantifying organizational readiness’ that scores how ready your company is in five areas: feasibility, reasonability, value, integrability, and sustainability. Each area is scored from 0 to 4. On the feasibility row, for example, a company scores 0 if evaluation of new technology is not officially sanctioned, and 4 if evaluation and testing are encouraged, there’s a clear decision process for adoption or rejection, and time is allocated to innovation.
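To make the idea of the readiness table concrete, here is a minimal sketch of how you might tally such a score. It assumes the five areas are simply summed to a total out of 20; that aggregation, the function names and the sample scores are all hypothetical and not taken from the book.

```python
# Hypothetical sketch: tally a readiness score over the five areas in
# Loshin's table. Summing the rows is an assumption, not the book's method.
AREAS = ("feasibility", "reasonability", "value", "integrability", "sustainability")
MAX_PER_AREA = 4  # each area is scored from 0 to 4


def readiness_score(scores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum possible) for a dict of per-area scores."""
    total = 0
    for area in AREAS:
        score = scores[area]  # raises KeyError if an area was not scored
        if not 0 <= score <= MAX_PER_AREA:
            raise ValueError(f"{area} score must be 0-{MAX_PER_AREA}, got {score}")
        total += score
    return total, MAX_PER_AREA * len(AREAS)


def weakest_areas(scores: dict[str, int]) -> list[str]:
    """List the area(s) with the lowest score, i.e. where to focus first."""
    lowest = min(scores[area] for area in AREAS)
    return [area for area in AREAS if scores[area] == lowest]


# Made-up scores for an imaginary company.
example = {"feasibility": 4, "reasonability": 2, "value": 3,
           "integrability": 1, "sustainability": 2}
print(readiness_score(example))  # (12, 20)
print(weakest_areas(example))    # ['integrability']
```

Reporting the weakest area alongside the total reflects the spirit of the table: a single number hides the row that is most likely to sink a project.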
The next chapter looks at what it means to adopt big data technology, and who needs to be involved. Loshin then moves on to developing a strategy for integrating big data analytics. Much of what he advises is obvious – clarify the criteria for adopting big data technology, prepare the environment for the sheer volume of data, put in proper levels of oversight and governance. However, being obvious doesn’t make it wrong, and plenty of big data projects have failed because something obvious was missed.
Having introduced the topic of governance, Loshin devotes the whole of the next chapter to the problems of working with data that has been created outside your control. There are some interesting insights into what constitutes good data, how you can measure data quality, and ways to make data reusable.
The rest of the book has more technical detail than the earlier chapters. A chapter introducing high-performance appliances for big data sets out typical ways big data analytics might be used, and in each case considers the storage, appliance and data management requirements. Loshin then goes on to look at the merits of hardware and software appliances, and gives a short analysis of row- versus column-oriented data layouts. A chapter on big data tools and techniques introduces Hadoop, MapReduce, YARN, HBase, Hive, Pig and Mahout. The topics are covered fairly briefly and at a high level, but it’s a useful intro. MapReduce then gets a chapter to itself, with a simple example to demonstrate how it works. A chapter on NoSQL discusses key-value, document, tabular and object data stores, and where they fit into the bigger picture. There’s a nice chapter on using graph analytics for big data that gives a clear, understandable introduction to what graph analytics is. It then discusses when you might use graph analytics and what the different algorithms are (community analysis, path analysis, clustering and so on). There’s a good description of the technical complexity of analyzing graphs, and the chapter closes with a look at the features you should look for in a graph analytics platform. The book closes with a short chapter on best practices.
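For readers who haven’t met MapReduce, the classic illustration is a word count. The sketch below is not the example from the book; it is a plain-Python simulation, running in a single process with no Hadoop involved, of the three stages a framework runs: map each input to key/value pairs, shuffle (group) the pairs by key, then reduce each group. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all the values seen for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # In a real cluster each document would be mapped on a different node;
    # here the phases simply run one after another.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data big ideas", "data beats opinions"]
print(word_count(docs))
# {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinions': 1}
```

The point of the split is that map and reduce calls are independent of one another, which is what lets a framework such as Hadoop spread them across many machines.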
I found this book an interesting mix. There is a certain amount of management speak along the lines of use cases, organizational alignment, and instituting governance. However, each time I began muttering, Loshin got back to a practical and worthwhile point. The business advice made sense, and if you do have a pointy-headed boss, this book would be a good one to try to keep him pointing in the right direction. The four-star rating I’ve given the book is for someone who needs to understand the technology and how it might be useful. For a programmer, the book is much less directly useful. It would make a good intro, and it would give you the right phrases to use when talking to your business users, but otherwise it’s too high level.
Last Updated (Friday, 19 September 2014)