|Reading Your Way Into Big Data|
|Written by Ian Stirk|
|Monday, 14 December 2015|
Scala is very popular in Big Data systems and is increasingly used for interactive processing (e.g. with Spark). The book is relatively small, with around 220 working pages split into two sections: Core Scala (7 chapters) and Object-Oriented Scala (3 chapters). It aims to help developers learn the Scala programming language, and it succeeds admirably, provided you are already familiar with programming concepts from another language, especially an object-oriented one. I found myself constantly mapping my existing programming knowledge onto Scala's syntax.
The book is well written, concise in its explanations, with plenty of helpful examples to follow along with. The summaries and exercises at the end of each chapter are useful. The answers to the chapter exercises can be found at Learning-Scala-materials.
While the book concentrates on how to use the Scala language, there is little on Scala's associated tools (e.g. Spark). Additionally, it might have been useful to include a section on where to find further information (books, websites, blogs etc). However, these are minor concerns.
As you look further into Hadoop, you'll quickly become aware that it has a great many associated components. The Field Guide to Hadoop aims to give you a short introduction to Hadoop and its various components. The authors compare this to a field guide for birds or trees, so it is broad in scope and shallow in depth. It provides up-to-date but limited detail on the major components of the Hadoop Big Data system. Helpful links are provided for further information. Each chapter briefly covers an area of Hadoop technology, and outlines the major players. The book is not a tutorial, but a high-level overview, consisting of 132 pages in eight chapters.
If you're new to Big Data and Hadoop, and you want to quickly review what it is, and the current state of its major components, I highly recommend this small book.
Introductory Books Summary
If you are new to Big Data and Hadoop, I recommend you read Hadoop for Finance Essentials to get a background understanding of Hadoop and its major components. If you already have some understanding of Hadoop, or feel confident, the next book to read is Big Data Made Easy; this book is both practical and wide-ranging.
Most introductory Hadoop books have a section on Spark, but for a more detailed treatment I recommend Learning Spark. Spark can be programmed using Java, Python or Scala; I recommend you try Scala, since it is more concise and new Spark functionality tends to appear in the Scala API first. You can learn more about the language in Learning Scala.
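To give a feel for the conciseness mentioned above, here is a minimal word-count sketch. It deliberately uses plain Scala collections rather than Spark itself, so it runs without a cluster or any Spark dependency; the object and method names are hypothetical. The pipeline shape (split lines into words, pair each word with 1, sum per word) is the same one commonly written against Spark's RDD API.

```scala
// Hypothetical word-count sketch using plain Scala collections (no Spark dependency).
// It illustrates the functional pipeline style that Spark's Scala API also uses.
object WordCountSketch {

  // Split each line on whitespace, drop empty tokens, and count occurrences per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))            // one element per whitespace-separated token
      .filter(_.nonEmpty)                  // drop empty tokens produced by leading whitespace
      .map(word => (word.toLowerCase, 1))  // pair each (lower-cased) word with a count of 1
      .groupMapReduce(_._1)(_._2)(_ + _)   // sum the counts per word (requires Scala 2.13+)

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark can be programmed", "using Java Python or Scala", "Scala is concise")
    // Print the counts in alphabetical order of word.
    wordCount(lines).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w: $n") }
  }
}
```

In Spark the equivalent would typically be written against an RDD, for example `sc.textFile(path).flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)`, where `sc` is a `SparkContext` and `path` is a placeholder for your input location.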
Before you can process huge volumes of data, you first need to get the data into Hadoop. Typically, Sqoop is used to import data from relational databases into Hadoop, and Flume is used to import other data (e.g. log files).
I have read and used Apache Sqoop Cookbook extensively, but I haven't yet reviewed it, since I am waiting for an updated edition. Published in 2013, it is getting a bit old and doesn't cover the latest developments in Sqoop 2. That said, it is a very useful introductory guide: very easy to read, wide in scope, and full of example template code that you can integrate into your own solutions.
This book will enable you to create Flume agents to transfer log data into Hadoop, with due consideration. I highly recommend this book.
|Last Updated ( Monday, 14 December 2015 )|