Big Data Fundamentals

Author: Thomas Erl et al
Publisher: Prentice Hall
Pages: 240
ISBN: 978-0134291079
Print: 0134291077
Kindle: B019YLYLVY
Audience: Decision-makers new to Big Data
Rating: 3.0
Reviewer: Ian Stirk

Big Data is an increasingly hot topic, so an entry level book should prove useful to newcomers.

 

Although not explicitly stated, the book seems to be aimed at decision-makers/managers. Its content is descriptive rather than code-based. Some general IT awareness, experience of traditional IT systems, and of the system life-cycle will help in understanding the book.

The book is relatively small, containing around 200 working pages spread over eight chapters. It is split into two parts: the first looks at the background to Big Data, planning considerations, and some of the technologies involved; the second looks at the concepts involved in storing and analysing Big Data.

Below is a chapter-by-chapter exploration of the topics covered.


Chapter 1 Understanding Big Data

The book opens with a look at some concepts and terminology, specifically: datasets, analysis, analytics, business intelligence (BI), and key performance indicators (KPIs), all described with reference to business usage. Next, the standard Big Data Vs are briefly discussed, namely: volume, velocity, variety, veracity, and value. The chapter continues with a look at the different types of data, namely: structured, unstructured, semi-structured, and metadata. The chapter ends by introducing a case study (Ensure to Insure), which is continued in discrete subsections in subsequent chapters.

This chapter provides a useful background to Big Data, defining salient concepts, Big Data characteristics, and types of data.

The chapter is well written, easy to read, with lots of helpful diagrams. Discussions are explained in a business context. These traits apply to the whole book.

Chapter 2 Business Motivations and Drivers for Big Data Adoption

This chapter opens with a look at business cycles, with cost-cutting occurring during recessions, and new products/services and innovations appearing during times of growth. The importance of using Big Data to extract more useful information for competitive advantage is discussed. Next, the importance of business architecture and its alignment to IT is briefly discussed.

The chapter continues with a look at the factors that have increased the uptake of Big Data by business. These include:

 

  • Affordable Technology and Commodity Hardware

  • Social Media

  • Hyper-Connected Communities and Devices

  • Cloud Computing

  • Internet of Everything (IoE)

 

This chapter provides a helpful overview of why Big Data processing is needed together with some of its driving forces.

Chapter 3 Big Data Adoption and Planning Considerations

This chapter outlines some concerns to consider when introducing Big Data. These include: privacy, security, auditing, batch and streaming support, performance, and use of the cloud. Each concern is briefly described and put into context.

The chapter continues with a look at the Big Data analytics lifecycle, which differs from traditional analysis due to the volume, velocity and variety of data. The lifecycle stages briefly discussed are: data identification, gathering and filtering, extraction, validation/cleansing, aggregation/representation, analysis, visualization, and use of results.

The chapter provides a helpful list of factors to consider when planning a Big Data system.

Chapter 4 Enterprise Technologies and Big Data Business Intelligence

This chapter provides an overview of the salient distinguishing features of various types of enterprise system, namely:

 

  • Online Transaction Processing (OLTP) – mainly many small quick queries

  • Online Analytical Processing (OLAP) – mainly a few long running queries

  • Extract Transform Load (ETL) – process of moving and transforming data

  • Data Warehouses/Data Marts – data storage
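
To make the OLTP/OLAP distinction above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from the book, which contains no code; the `orders` table and its rows are hypothetical and purely illustrative.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the two query styles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
rows = [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 14.5)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# OLTP-style: a small, quick lookup that touches one row via its key.
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: an aggregate that scans the whole table to summarise it.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)
print(olap_rows)
```

On a real system the first kind of query runs many times per second against current data, while the second typically runs against a data warehouse or data mart loaded by ETL.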

 

The chapter continues with a brief overview of traditional BI (ad-hoc reports, dashboards), before looking at Big Data BI which can analyse multiple business processes at the same time. 

 

Chapter 5 Big Data Storage Concepts

This chapter discusses various aspects of Big Data storage, including:

 

  • Clusters – grouping of servers, co-ordinated processing

  • File Systems and Distributed File Systems – provides parallel processing and scalability

  • NoSQL – non-relational databases, many niche types

  • Sharding – horizontal partitioning of datasets

  • Replication – provides scalability and fault tolerance

  • Sharding and Replication – provides high availability
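
As a rough illustration of how sharding and replication combine, the sketch below assigns each record to a shard by hashing its key, then stores copies on neighbouring nodes. It is an assumption-laden toy, not the book's method: the function names, the node count, and the "next node along" replica placement are all hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the key (horizontal partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replica_nodes(shard: int, num_nodes: int, copies: int) -> list[int]:
    """Place each copy of a shard on consecutive nodes (hypothetical policy)."""
    return [(shard + i) % num_nodes for i in range(copies)]


NUM_NODES = 4  # assumed cluster size, one shard per node
placement = {
    key: replica_nodes(shard_for(key, NUM_NODES), NUM_NODES, copies=2)
    for key in ["customer-1", "customer-2", "customer-3"]
}

for key, nodes in placement.items():
    print(key, "->", nodes)
```

Because every record lives on two nodes, losing any single node leaves one copy reachable, which is the high-availability benefit the last bullet refers to.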

 

The chapter continues with a look at the CAP theorem, which basically states that when a network partition occurs, a distributed system can guarantee either availability or consistency, but not both. Next, the ACID principles of transaction management are defined (i.e. Atomic, Consistent, Isolated, Durable), before looking at the BASE database principles (Basically Available, Soft state, Eventually consistent), which prefer availability over consistency.
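
The "eventually consistent" part of BASE can be sketched in a few lines. This is a deliberately simplified toy, assuming one primary and one secondary copy held in the same process; the class name, the `sync` method, and the `policy-42` key are hypothetical and do not come from the book or any real database.

```python
class EventuallyConsistentStore:
    """Toy store: writes hit the primary at once, the secondary catches up later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet propagated to the secondary

    def write(self, key, value):
        # The write is accepted immediately (availability is preferred).
        self.primary[key] = value
        self.pending.append((key, value))

    def read_secondary(self, key):
        # May return stale data until sync() has run.
        return self.secondary.get(key)

    def sync(self):
        # Propagate queued writes; afterwards both copies agree.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("policy-42", "active")
stale = store.read_secondary("policy-42")  # read before propagation
store.sync()
fresh = store.read_secondary("policy-42")  # read after propagation
print(stale, fresh)
```

An ACID system would instead refuse to acknowledge the write until every copy agreed, trading availability for consistency.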

This chapter provides a useful introduction to some of the technology factors involved in Big Data systems.

Chapter 6 Big Data Processing Concepts

This chapter looks at how large volumes of data are processed, using parallel processing on commodity servers in a cluster. Hadoop, the most popular Big Data platform, is introduced VERY briefly.

The chapter continues with a look at MapReduce batch processing. In essence, the data is broken down and processed on numerous servers (the Map phase), and the results are then combined and aggregated where necessary (the Reduce phase). The chapter next considers realtime in-memory stream processing, where realtime can mean anything from sub-second to under a minute.
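
The Map and Reduce phases described above can be sketched with the classic word-count example. This single-process version is only an illustration of the idea, assuming each string in `chunks` stands in for a block of data held on a separate server; it does not use Hadoop, and the function names are my own.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key, as happens between Map and Reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for data stored on a different server (assumption).
chunks = ["big data big", "data tools"]
counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
print(counts)
```

In a real cluster each `map_phase` call would run in parallel on the server holding that chunk, which is what makes the approach scale on commodity hardware.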

This chapter provides a useful overview of Big Data processing. The section on Hadoop is much too brief to be of use.

Chapter 7 Big Data Storage Technology

This chapter opens with a look at disk storage, which is relatively cheap and used for long-term storage. This continues with a look at distributed file systems, which provide data redundancy and high availability. Next, traditional relational database management systems (RDBMSs) are discussed. These have costly vertical scaling, and are generally unable to cater for the timely processing of large data volumes. NoSQL databases, which are generally highly scalable, are then examined. The main types of NoSQL databases are outlined (i.e. key-value, document, column-family, and graph). NewSQL, which attempts to marry some NoSQL features with RDBMSs, is briefly mentioned.

The chapter next looks at in-memory storage. While this is more expensive, it can offer significantly improved performance. Useful lists of when in-memory storage is appropriate and inappropriate are given. The section ends with a look at the usage of in-memory data grids and in-memory databases.

Chapter 8 Big Data Analysis Techniques

This chapter provides an introduction to the various common analysis techniques. The core of the chapter looks at:

 

  • Statistical Analysis (e.g. A/B testing, correlation, regression)

  • Machine Learning – systems that learn from experience (e.g. classification, clustering)

  • Semantic Analysis – extracting meaning from text/speech (e.g. sentiment analysis)

  • Visual Analysis – graphical data representation (e.g. heat maps)

 

In each case, the technique is explained with adequate detail and useful diagrams. Some very useful questions that each technique can help answer are also posed.
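
For readers who want to see two of the statistical techniques listed above in action, the sketch below computes a Pearson correlation and a least-squares regression line using only the standard library. The helper names and the tiny dataset are hypothetical, chosen so the relationship is exactly linear; the book itself stays descriptive and shows no code.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship, from -1 to 1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Simple linear regression: least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Hypothetical data following y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
r = pearson(xs, ys)
slope, intercept = fit_line(xs, ys)
print(r, slope, intercept)
```

Correlation answers "do these two variables move together?", while regression answers "by how much does one change when the other does?", which is the kind of business question the chapter poses.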

Appendix A. Case Study Conclusion

The case study is introduced in Chapter 1, and extended with additional detail at the end of each subsequent chapter. This approach is useful since the case study can be examined stand-alone, without interfering with the main body of the book.

Conclusion

This book aims to introduce the basics of Big Data and provides a suitable introduction to Big Data for managers. It is generally easy to read, with a good flow, and has plenty of helpful diagrams. Many sections are brief, which helps maintain focus and interest. Explanations are continuously put into a business context. The book describes Big Data generically, with little reference to specific tools.

As a developer/technologist, I found some sections too wordy with too much business emphasis, although this approach might be suitable for managers. Some sentences in the book felt like consultant-speak, i.e. long words used to say the obvious or little. For a developer-focused introduction to Big Data, I still recommend Big Data Made Easy, see my review here.


Last Updated ( Saturday, 07 May 2016 )