Big Data Fundamentals
Author: Thomas Erl et al.
Big Data is an increasingly hot topic, so an entry level book should prove useful to newcomers.
Although not explicitly stated, the book seems to be aimed at decision-makers/managers. Its content is descriptive rather than code-based. Some general IT awareness, experience of traditional IT systems, and of the system life-cycle will help in understanding the book.
The book is relatively small, containing around 200 working pages spread over eight chapters. It is split into two parts: the first looks at the background to Big Data, planning considerations, and some of the technologies involved; the second looks at the concepts involved in storing and analysing Big Data.
Below is a chapter-by-chapter exploration of the topics covered.
Chapter 1 Understanding Big Data
The book opens with a look at some concepts and terminology, specifically: datasets, analysis, analytics, business intelligence (BI), and key performance indicators (KPI), all described with reference to business usage. Next, the standard Big Data Vs are briefly discussed, namely: volume, velocity, variety, veracity, and value. The chapter continues with a look at the different types of data, namely: structured, unstructured, semi-structured, and metadata. The chapter ends by introducing a case study (Ensure to Insure), which is continued in discrete subsections in subsequent chapters.
This chapter provides a useful background to Big Data, defining salient concepts, Big Data characteristics, and types of data.
The chapter is well written and easy to read, with lots of helpful diagrams. Discussions are explained in a business context. These traits apply to the whole book.
Chapter 2 Business Motivations and Drivers for Big Data Adoption
This chapter opens with a look at business cycles, with cost-cutting occurring during recessions, and new products, services, and innovations appearing during times of growth. The importance of using Big Data to extract more useful information for competitive advantage is discussed. Next, the importance of business architecture and its alignment with IT is briefly discussed.
The chapter continues with a look at the factors that have increased the uptake of Big Data by businesses. These include:
This chapter provides a helpful overview of why Big Data processing is needed together with some of its driving forces.
Chapter 3 Big Data Adoption and Planning Considerations
This chapter outlines some concerns to consider when introducing Big Data. These include: privacy, security, auditing, batch and streaming support, performance, and use of the cloud. Each concern is briefly described and put into context.
The chapter continues with a look at the Big Data analytics lifecycle, which differs from traditional analysis due to the volume, velocity, and variety of the data. The lifecycle stages briefly discussed are: data identification, gathering and filtering, extraction, validation/cleansing, aggregation/representation, analysis, visualization, and use of results.
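The book describes these stages without code. As a rough illustration only, the middle stages (filtering, validation/cleansing, and aggregation) can be sketched as a tiny in-memory pipeline. The records, field names, and function names below are hypothetical; a real Big Data pipeline would run each stage over distributed data rather than a Python list.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from several sources
RAW = [
    {"region": "north", "amount": "120.5"},
    {"region": "south", "amount": "80"},
    {"region": "North ", "amount": "30"},
    {"region": "east", "amount": "n/a"},  # unparseable amount
    {"region": "", "amount": "55"},       # missing region
]

def filter_records(records):
    # Gathering and filtering: drop records missing a required field
    return [r for r in records if r.get("region", "").strip()]

def validate_and_cleanse(records):
    # Validation/cleansing: normalise text and discard values that do not parse
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        clean.append({"region": r["region"].strip().lower(), "amount": amount})
    return clean

def aggregate(records):
    # Aggregation/representation: total amount per region
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def run_pipeline(records):
    return aggregate(validate_and_cleanse(filter_records(records)))

if __name__ == "__main__":
    print(run_pipeline(RAW))  # {'north': 150.5, 'south': 80.0}
```

Keeping each stage as a separate function mirrors the lifecycle's idea that every stage has its own concern and can be checked independently.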
The chapter provides a helpful list of factors to consider when planning a Big Data system.
Chapter 4 Enterprise Technologies and Big Data Business Intelligence
This chapter provides an overview of the salient distinguishing features of various types of enterprise system, namely:
The chapter continues with a brief overview of traditional BI (ad-hoc reports, dashboards), before looking at Big Data BI, which can analyse multiple business processes at the same time.
Chapter 5 Big Data Storage Concepts
This chapter discusses various aspects of Big Data storage, including:
The chapter continues with a look at the CAP theorem, which states that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance; in practice, when a network partition occurs, the system must choose between availability and consistency. Next, the ACID principles of transaction management are defined (i.e. Atomicity, Consistency, Isolation, Durability), before looking at the BASE database principles (Basically Available, Soft state, Eventual consistency), which favour availability over consistency.
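To make the BASE trade-off more concrete, here is a toy sketch of two replicas that stay available during a partition and converge later. It assumes last-write-wins conflict resolution by timestamp, which is only one of several strategies real systems use; the `Replica` class and `sync` function are hypothetical and not taken from the book or any particular database.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    store: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept the write locally without coordinating with other replicas
        # (the "basically available" choice during a partition)
        self.store[key] = (ts, value)

    def read(self, key):
        # May return stale or missing data until replicas have synced (soft state)
        entry = self.store.get(key)
        return entry[1] if entry is not None else None

def sync(a, b):
    # Anti-entropy step once the partition heals: keep the newest write per key
    # (hypothetical last-write-wins rule), so both replicas converge
    for key in set(a.store) | set(b.store):
        candidates = [e for e in (a.store.get(key), b.store.get(key)) if e is not None]
        newest = max(candidates, key=lambda entry: entry[0])
        a.store[key] = newest
        b.store[key] = newest

if __name__ == "__main__":
    r1, r2 = Replica("r1"), Replica("r2")
    r1.write("balance", 100, ts=1)
    print(r2.read("balance"))  # None: r2 has not seen the write yet
    sync(r1, r2)
    print(r2.read("balance"))  # 100: replicas are now consistent
```

An ACID system would instead refuse or block the write until all replicas agreed, trading availability for consistency.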
This chapter provides a useful introduction to some of the technology factors involved in Big Data systems.
Chapter 6 Big Data Processing Concepts
This chapter looks at how large volumes of data are processed, using parallel processing on commodity servers in a cluster. Hadoop, the most popular Big Data platform, is introduced VERY briefly.
The chapter continues with a look at MapReduce batch processing. In essence, the data is broken down and processed on numerous servers (the Map phase), and the results are then combined and aggregated where necessary (the Reduce phase). The chapter next considers realtime in-memory stream processing, where realtime can mean anything from sub-second to under a minute.
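For readers who want to see the Map and Reduce phases in action, the classic word-count example can be sketched in plain Python. This is a single-process simulation, not the Hadoop API: the function names are hypothetical, and in a real cluster each `map_phase` call would run on a different server, with the framework performing the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Map: emit a (word, 1) pair for every word in one chunk of the input
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle: group the intermediate values by key between the two phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate all values emitted for one key
    return key, sum(values)

def word_count(chunks):
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    chunks = ["big data is big", "data is everywhere"]
    print(word_count(chunks))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the model suit commodity clusters.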
This chapter provides a useful overview of Big Data processing. The section on Hadoop is much too brief to be of use.
Chapter 7 Big Data Storage Technology
This chapter opens with a look at disk storage, which is relatively cheap and used for long-term storage. This continues with a look at distributed file systems, which provide data redundancy and high availability. Next, traditional relational database management systems (RDBMSs) are discussed; these rely on costly vertical scaling and are generally unable to process large data volumes in a timely manner. NoSQL databases, which are generally highly scalable, are then examined, and their main types are outlined (i.e. key-value, document, column-family, and graph). NewSQL is briefly mentioned; this attempts to combine some NoSQL features with those of RDBMSs.
The chapter next looks at in-memory storage which, while more expensive, can offer significantly improved performance. Useful lists of when in-memory storage is and is not appropriate are given. The section ends with a look at the usage of in-memory data grids and in-memory databases.
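The difference between the first two NoSQL types can be illustrated with a small sketch, using plain Python structures as stand-ins for real databases. The key format, field names, and helper functions are hypothetical; the point is only that a key-value store treats the value as opaque, whereas a document store understands the fields inside each record and can query on them.

```python
import json

# Key-value model: the value is an opaque blob; lookups are by key only
kv_store = {
    "customer:42": json.dumps({"name": "Ann", "city": "Leeds"}).encode(),
}

# Document model: the store understands each document's fields
doc_store = [
    {"_id": 42, "name": "Ann", "city": "Leeds", "policies": ["car", "home"]},
    {"_id": 43, "name": "Bob", "city": "York", "policies": ["car"]},
]

def kv_get(key):
    # The application, not the store, must decode and interpret the value
    raw = kv_store.get(key)
    return json.loads(raw) if raw is not None else None

def find_documents(field, value):
    # A document database can filter on a field inside each document
    return [d for d in doc_store if d.get(field) == value]

if __name__ == "__main__":
    print(kv_get("customer:42")["city"])                     # Leeds
    print([d["_id"] for d in find_documents("city", "York")])  # [43]
```

Key-value stores trade this query flexibility for very simple, fast, and easily partitioned access by key.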
Chapter 8 Big Data Analysis Techniques
This chapter provides an introduction to various common analysis techniques. The core of the chapter looks at:
In each case, the technique is explained in adequate detail, with useful diagrams. Some very useful questions are posed for the reader to answer.
Appendix A. Case Study Conclusion
The case study is introduced in Chapter 1 and extended with additional detail at the end of each subsequent chapter. This approach is useful since the case study can be examined on its own, without interrupting the main body of the book.
This book aims to introduce the basics of Big Data and provides a suitable introduction to Big Data for managers. It is generally easy to read, with a good flow, and has plenty of helpful diagrams. Many sections are brief, which helps maintain focus and interest. Explanations are continuously put into a business context. The book describes Big Data generically, with little reference to specific tools.
As a developer/technologist, I found some sections too wordy with too much business emphasis, although this approach might be suitable for managers. Some sentences in the book felt like consultant-speak, i.e. long words used to say the obvious or little. For a developer-focused introduction to Big Data, I still recommend Big Data Made Easy, see my review here.
Last Updated (Saturday, 07 May 2016)