Big Data and the Internet of Things


Author: Robert Stackowiak et al
Publisher: Apress
Pages: 220
ISBN: 978-1484209875
Print: 1484209877
Kindle: B00UH97G38
Audience: Architects, Analysts, PMs
Rating: 4.0
Reviewer: Ian Stirk

Chapter 5 Understanding Organizational Skills

When creating a new technical platform it’s important to know what skills are required, what skills are available, and how to close the gap between the two. The idea behind investigating skills now, before the future technical architecture is decided, is that knowing the currently available skills might help determine which of the candidate future architectures to select.

The chapter provides a skills matrix for each of the TOGAF areas (i.e. business, data, application, and technology), against which the organization and its staff are rated (e.g. 0 = no skills), with the ratings recorded in a spreadsheet. Staff interviews typically provide input to the skills matrix. Most of the chapter is taken up with explaining the various skills involved. It’s noted that these matrices can be extended.
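To make the idea concrete, here is a minimal sketch of such a matrix and the gap it exposes. It is not taken from the book: the skill names, the ratings, and the `skills_gap` helper are all invented for illustration, and the book records its ratings in a spreadsheet rather than in code.

```python
# Hypothetical skills matrices keyed by TOGAF area.
# Ratings start at 0 (no skills); every name and number here is invented.
available = {
    "business": {"use-case analysis": 3},
    "data": {"data warehousing": 4, "Hadoop": 1},
    "application": {"ETL development": 3},
    "technology": {"cluster administration": 0},
}
required = {
    "business": {"use-case analysis": 3},
    "data": {"data warehousing": 3, "Hadoop": 3},
    "application": {"ETL development": 4},
    "technology": {"cluster administration": 3},
}


def skills_gap(available, required):
    """Return {area: {skill: shortfall}} for skills rated below the required level."""
    gaps = {}
    for area, skills in required.items():
        have = available.get(area, {})
        # A skill missing from the available matrix counts as rating 0.
        short = {
            skill: need - have.get(skill, 0)
            for skill, need in skills.items()
            if need > have.get(skill, 0)
        }
        if short:
            gaps[area] = short
    return gaps


gaps = skills_gap(available, required)
for area, short in gaps.items():
    for skill, shortfall in short.items():
        print(f"{area}: {skill} is short by {shortfall}")
```

Skills that already meet or exceed the required rating are left out of the result, so the output lists only the areas that need attention.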

The chapter ends with a look at how to address the gap between existing skills and required skills. Often a meeting will be held to discuss the results of the skills matrices and how to close the skills gap; a sample agenda for this meeting is supplied. The solutions briefly discussed are: develop in-house skills, hire new skilled employees, hire consultants, or modify the selected architecture. These have different long-term costs and affect time to market.

This chapter provides a useful overview of how to record existing and required skills. It briefly mentions possible solutions to the skills gap. I liked the use of the spreadsheet graph to illustrate the skills gap visually.

Chapter 6 Designing the Future State Information Architecture

This chapter revisits the earlier current architecture, expanding it to take into account the latest business use cases. This is then used to expound the proposed future architecture. The chapter opens with a look at the current system in terms of data sources, data management systems (e.g. EDW), and integration tools and interfaces (e.g. ETL). In each case, the information to be gathered and the questions to ask are listed (e.g. what is the recoverability of the EDW?). The various BIMs created earlier (describing business functions) are examined in terms of the resultant conceptual architecture, to ensure the architecture is valid (i.e. able to fulfil the business needs).

Having looked at the current architecture, it’s time to look at the underlying physical servers, storage, and networks, since these may affect the proposed architecture. Various data gathering questions are given. Other current system practices should also be recorded (e.g. monitoring and admin functions).

Next, the future system is examined, here using the car business as an example. The current and future systems are compared and found to be significantly different. Some Hadoop considerations are discussed in relation to data volumes, cluster size, security, and memory for MapReduce processing. The increasing importance of memory for Spark systems (which can be 100 times faster than MapReduce) is noted. Similarly, IoT considerations are examined, with sensor information being fed to a NoSQL database for further processing on Hadoop. Some basic operational planning tasks (e.g. patching) are briefly described, each against the relevant owner.

This chapter provides further detail on the current system, which is then used, together with the new business requirements, to expound the architecture of the future system. Some basic Hadoop and IoT considerations are discussed. Experience tells me that getting from the current to the proposed system is a lot more difficult in the real world, but what’s given here is a suitable starting framework.

Chapter 7 Defining an Initial Plan and Roadmap

Having defined the proposed future system, with its many associated documents for design, deployment, and support, and being aware of the skills requirements, we are now in a position to define a roadmap for the implementation of the future system. This roadmap can then be used to garner sponsors to finance the project, so detail and hard costs must be included.

The chapter acknowledges that changes occur and this causes the earlier plans to be revisited. Such changes include looking in greater detail at the skills required, together with their cost implications. The section provides detail of various roles (e.g. Hadoop administrator, Hadoop developer) together with their skills. These need to be examined in relation to the earlier discussion about how the skills gap will be filled. In a similar manner, the project priorities should be re-examined now we know more detail about the business and technology, and include a phased incremental delivery.

At this stage, more detailed cost estimates need to be included with the various tasks (e.g. cost of cluster). A list of suggested tasks to include is provided. Using the revised costs provides more realistic values to include in the TCO calculation.

The initial roadmap can now be created. The first plan will target IT and cover current and future architecture, business drivers, project phases, skills, costs, risks etc. This plan will be included in a presentation to company executives and sponsors, but these people will be more interested in the business benefits, costs, and timescales. Some example content for both the IT plan and the plan for business executives is given. Assuming funding is given for the project, the next steps in forming a project team with its various members are outlined.

This chapter revises the earlier plans, updating them based on what has been learned about the proposed system. A roadmap, with suggested content outlined, and a presentation to decision makers are discussed.

Chapter 8 Implementing the Plan

This chapter is concerned with implementing a project plan that fulfils the business requirements; all the previous chapters have been preparation for this work. The chapter opens with a look at the implementation steps, and the importance of the project launch meeting is discussed (together with its content). Project plans, milestones, and in particular critical path tasks are examined. The use of regular project meetings to discuss progress, problems and solutions is noted, as is keeping the sponsors updated. Any changes need to follow a change control process and be given a cost, priority etc. There’s a brief mention of the causes of project slippage, including loss of staff, new business requirements, problems scaling up prototypes, and rapidly changing technology. The importance of communication in all these cases is noted.
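For readers unfamiliar with critical path tasks, the sketch below shows one common way to find them: the critical path is the longest chain of dependent tasks, so any delay on it delays the whole project. This is an illustration only, not the book's method. The task names, durations, and the `critical_path` helper are hypothetical, and the code assumes the dependencies contain no cycles.

```python
from functools import lru_cache

# Hypothetical project tasks: name -> (duration in days, prerequisite tasks).
tasks = {
    "launch meeting": (1, []),
    "provision cluster": (10, ["launch meeting"]),
    "train staff": (5, ["launch meeting"]),
    "build prototype": (15, ["provision cluster", "train staff"]),
    "scale up": (8, ["build prototype"]),
}


def critical_path(tasks):
    """Return (total duration, ordered task names) of the longest dependency chain.

    Assumes the dependency graph is acyclic; no cycle detection is performed.
    """

    @lru_cache(maxsize=None)
    def longest_to(name):
        # Longest chain that ends with `name`, including its own duration.
        duration, prereqs = tasks[name]
        best_len, best_path = 0, ()
        for prereq in prereqs:
            length, path = longest_to(prereq)
            if length > best_len:
                best_len, best_path = length, path
        return best_len + duration, best_path + (name,)

    total, path = max(longest_to(name) for name in tasks)
    return total, list(path)


total, path = critical_path(tasks)
print(f"{total} days: " + " -> ".join(path))
```

In this invented plan, "train staff" finishes well before "provision cluster", so it has slack and is not on the critical path.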

The chapter ends with a look at operationalizing the solution, including service level agreements, documentation, and the change control process. The importance of claiming project success when a project ends is noted. Similarly, it’s important to conduct a post-mortem analysis of what went well and what could be improved, to ensure the next project progresses even better. Although the project has ended, in reality enhancements are required, which means the process restarts.

This chapter provides useful information on the various project implementation steps together with best practices.


This book aims to show you how to implement a Big Data and Internet of Things project, and succeeds. It contains details of the steps to undertake to analyse and extend an existing architecture (based on a data warehouse) to one based on Hadoop technologies. The emphasis is on project planning and analysis rather than technology.

It is easy to read, with good explanations, and useful diagrams to support the text. The outline agendas and questions included should prove useful in creating your own systems. The book is based on a seven-step methodology, which itself is based on the popular ‘The Open Group Architecture Framework’ (TOGAF).

Many sections can be read without reference to Big Data and IoT, in which case it reads like a good traditional system analysis book, except it’s more agile and less comprehensive - since various steps in traditional methodologies are omitted. Undoubtedly the book oversimplifies the details, but does so to provide an achievable approach.

Some knowledge of IT systems/architecture is needed to get the most out of this book, since various terms are used without being defined (e.g. third normal form, ACID, RDDs, NameNode). Perhaps links to further information could have been included - I note there is an appendix of references, but these are not annotated.

The book’s title is itself misleading, since this is largely a book about analysis and design. I suspect Big Data and Internet of Things were added because they’re the latest ‘must have’ technologies, and of course they sell...

I enjoyed this book; it took me back to my 1987 SSADM analysis course! The book should certainly help people analyse and design systems (not just Big Data ones).


Last Updated ( Wednesday, 11 May 2016 )