Big Data and the Internet of Things

Author: Robert Stackowiak et al
Publisher: Apress
Pages: 220
ISBN: 978-1484209875
Print: 1484209877
Kindle: B00UH97G38
Audience: Architects, Analysts, Project Managers 
Rating: 4.0
Reviewer: Ian Stirk

This book aims to show you how to implement a Big Data and Internet of Things project, with the subtitle “Enterprise Information Architecture for A New Age”.

This book guides you through the steps required to analyse and extend an existing architecture (based on a data warehouse), to one involving Big Data. The content is project/analysis focused rather than technical.

The book is targeted at “... enterprise architects and information architects, as well as anyone tasked with designing and building these solutions or concerned about the ultimate success of such projects”. Some knowledge of IT systems/architecture is needed, since various terms are used without being defined (e.g. third normal form).

The book is relatively small, containing around 180 working pages, split over 8 chapters.

Below is a chapter-by-chapter exploration of the topics covered.


Chapter 1 Big Data Solutions and the Internet of Things

The book opens with a look at how we arrived at Big Data and the Internet of Things (IoT), giving some history, and explaining that much existing technology is still appropriate in new solutions (e.g. databases). The importance of getting business buy-in is emphasised for successful projects, else the errors of the past may repeat with the newer architectures.

Enterprise Data Warehouse (EDW) and data marts are discussed, outlining their advantages and disadvantages, before noting that increasing volumes of unstructured data required a new approach, involving NoSQL databases and Hadoop. NoSQL databases store data that doesn’t map neatly into the traditional relational database management systems (RDBMS). The 4 main types of NoSQL databases are outlined (i.e. key-value, column, document, graph), before looking at scalability and high availability.
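To make the four NoSQL types a little more concrete, here is a minimal sketch of one made-up sensor record held in each shape, using plain Python structures rather than any real database. The sensor and vehicle identifiers, field names, and the `neighbours` helper are all invented for illustration and do not come from the book.

```python
import json

# Key-value: an opaque value (here a JSON string) looked up by a single key.
key_value = {"reading:s-17": '{"vehicle": "truck-42", "temp_c": 71.5}'}

# Column (wide-column): a row key maps to column families, each holding columns.
column = {"s-17": {"meta": {"vehicle": "truck-42"}, "metrics": {"temp_c": 71.5}}}

# Document: a self-describing record that may nest other records.
document = {"_id": "s-17", "vehicle": {"id": "truck-42"}, "readings": [{"temp_c": 71.5}]}

# Graph: nodes plus labelled edges between them.
graph = {
    "nodes": {"s-17": {"type": "sensor"}, "truck-42": {"type": "vehicle"}},
    "edges": [("s-17", "INSTALLED_IN", "truck-42")],
}


def neighbours(g, node):
    """Return the nodes reachable from `node` by one outgoing edge."""
    return [dst for src, _label, dst in g["edges"] if src == node]


# The key-value store knows nothing about the value's structure,
# so the application has to parse it after the lookup.
parsed = json.loads(key_value["reading:s-17"])
```

The point of the sketch is only that the same information is addressed differently in each model: by key, by row and column family, by nested field, or by following edges.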


Hadoop is the most popular Big Data platform. The importance of Google’s technology papers is noted in the development of Hadoop’s distributed file system (HDFS), and its distributed parallel programming model (MapReduce). Next, many of Hadoop’s popular tools are highlighted (e.g. Hive, Sqoop). It’s expected that a growing number of sensors and devices will provide streaming data; this is the heart of the Internet of Things. This voluminous data will drive Big Data processing.
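For readers who have not met MapReduce, the idea can be sketched in a few lines of single-process Python. This is only an illustration of the map, shuffle and reduce steps using the usual word-count example; it is not Hadoop's API, it does not come from the book, and all function names are made up here.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on many machines, close to the blocks of data stored in HDFS; here they simply run one after another.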

The chapter ends with an overview of the methodology for developing and deploying projects (this is what this book is really about). The popular ‘The Open Group Architecture Framework’ (TOGAF) model is briefly examined, noting it forms the basis of the authors’ own methodology – which is then outlined (with each of the 7 stages forming a subsequent book chapter).

This chapter provides a useful overview of the history and drivers of IT systems, culminating in the current Big Data and IoT systems. The iterative methodology for developing and deploying projects is outlined, showing you what to expect from the rest of the book. In essence the book is about defining the current system, defining your required system, and then taking steps to bridge the two.

The chapter is generally easy to read, with helpful diagrams to support the text. It touches on a wide range of topics but only in a cursory manner. Various vendor products are identified in passing. Some knowledge of IT systems/architecture is needed since various terms are used but not defined (e.g. ACID, name node) – perhaps links to further information could have been included. These traits apply to the whole book.

Chapter 2 Evaluating the Art of the Possible

This book is all about providing a strategy to develop new solutions that use Big Data and IoT, with business needs as the driving force.

The chapter first looks at understanding the current system, from both a business and technical perspective, leading to the involvement of the appropriate staff. Various stages of architecture maturity, from silos to Information as a Service, are discussed to help determine the viability of extending the current system. Next, a review of current trends is suggested, and a list of example business projects is provided for various industries. Analysing the future projects in light of the existing technology should help determine whether the current architecture can be extended or a new architecture is required (e.g. EDW can’t handle streaming data).

In discussing future needs it’s important to get the relevant people involved in the planning sessions, both technical and business staff. Issuing the Vision Session goals and an agenda ahead of the meeting is important, and examples are provided. In the planning session, the current architecture should be discussed with a business focus, gathering views from attendees. Next, the place of Hadoop and NoSQL databases in the extended architecture, if any, should be discussed. An example of how Hadoop and NoSQL databases often extend the existing EDW is provided.

The chapter ends by discussing another meeting to examine what has been discussed and what to do next. An example agenda is provided (e.g. current architecture, emerging business needs, future architecture to answer business needs). The importance of getting detailed business cases, rather than letting IT start building the architecture, is highlighted. More investigation is needed first.

This chapter provides a helpful stepwise approach to understanding the current business and technology, together with the required future business and technology. Useful checklists are given. 




Chapter 3 Understanding the Business

Experience shows that having business drivers, support, and priorities is critical to project success. The chapter identifies some business value drivers (e.g. increased revenue) and how Big Data can influence them – providing support for business sponsorship. Ways of gathering business needs are given (i.e. plan discovery, preliminary research, and interviews).

Next, a set of business success factors is identified, and these are mapped to a project containing the future architecture. An example list of success factors is provided for the transport industry. Where possible these should be aligned to IT drivers to provide a composite solution. The chapter next discusses prioritising business use cases by assigning weightings to various business drivers.
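Prioritising by weightings amounts to a weighted score per use case. The sketch below shows one common way of doing this, a weighted average of ratings; it is not taken from the book, and the driver names (other than increased revenue), the weights, the use cases and the 1-to-5 ratings are all hypothetical.

```python
def score_use_case(ratings, weights):
    """Weighted average of a use case's ratings across the business drivers."""
    total_weight = sum(weights.values())
    return sum(weights[d] * ratings.get(d, 0) for d in weights) / total_weight


def rank_use_cases(use_cases, weights):
    """Return use case names ordered from highest to lowest weighted score."""
    scored = {name: score_use_case(r, weights) for name, r in use_cases.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical weights: how much each business driver matters to the sponsor.
weights = {"increased_revenue": 5, "reduced_cost": 3, "reduced_risk": 2}

# Hypothetical transport use cases, each rated 1 (low) to 5 (high) per driver.
use_cases = {
    "predictive_maintenance": {"increased_revenue": 2, "reduced_cost": 5, "reduced_risk": 4},
    "route_optimisation": {"increased_revenue": 4, "reduced_cost": 4, "reduced_risk": 1},
}

ranking = rank_use_cases(use_cases, weights)
```

With these made-up numbers route optimisation scores 3.4 against 3.3 for predictive maintenance, so it ranks first; changing the weights to favour cost or risk would reverse the order, which is why agreeing the weightings with the business matters.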

The chapter ends with a look at developing an initial business case to support a technical solution. Factors discussed in more detail include:

  • Total Cost of Ownership (TCO) – cost of solution

  • IT Value – IT improvement and cost avoidance

  • Business Value – business benefits

  • Other Trade-offs to Consider (skills available? tools available? cloud usage? etc)  

This chapter provides a forceful reminder of the importance of the business and its needs in driving a successful Big Data project. Some useful lists and steps are provided.

Chapter 4 Business Information Mapping for Big Data and Internet of Things

Having identified some business cases, we next look at the requirements of data sources, reporting, querying and analytics. The chapter first looks at the current system, explaining the use of Data Flow Diagrams (DFDs) to identify data and how it changes during the various processing. Next, an example car business is discussed in relation to its Key Performance Indicators (KPIs), and its current state illustrated with DFDs.

The example car business is then extended, taking into account the new business requirements, to derive a new Business Information Map (BIM), described by DFDs, incorporating the identified KPIs. The chapter ends by assigning the new processing to Hadoop, IoT, NoSQL, and Big Data Analytics.

This chapter shows how to define the current system, and how to extend it with new requirements, to create a future system incorporating the identified KPIs.

The use of DFDs makes the analysis process look relatively simple, maybe too simple... As part of ‘traditional’ systems analysis, DFDs were just one tool, so why were the other related diagrams (e.g. entity life history) not used too? Either these additional diagrams were not required or this book is cutting corners (traditional methodologies also discuss moving from physical to logical design, where you can de-duplicate, combine data/processing, optimize etc). Perhaps the authors might also explain why the various UML diagrams were not used instead?

Last Updated ( Wednesday, 11 May 2016 )