Google BigQuery: The Definitive Guide

Author: Valliappa Lakshmanan and Jordan Tigani
Publisher: O'Reilly
Pages: 498
ISBN: 978-1492044468
Print: 1492044466
Kindle: B07ZHQ3MGN
Audience: Developers wanting to use BigQuery
Rating: 5
Reviewer: Kay Ewbank

Google BigQuery is a distributed, serverless SQL engine that provides a way to query petabytes of data, with machine learning built in. This book by Google insiders aims to show what it can do and how to use it.

The interesting thing about BigQuery is that the basics are very familiar for database developers - you can get a long way with basic SQL. To get the best from BigQuery, though, you need to think about less conventional topics such as concurrency, and perhaps to move into its machine learning capabilities.

The authors of the book both work for Google, one as head of data analytics and AI solutions, the other as the director of product management for BigQuery. More importantly, the latter was one of the founding engineers working on BigQuery. This means both authors are writing from a position of real familiarity with BigQuery and what it can do.

The book starts with a chapter explaining what BigQuery is and what it can do, including some background on how Google came to develop it and what makes it possible. The authors then move on to describing query essentials, starting with simple queries based on SELECT, and filtering using WHERE, EXCEPT and REPLACE. This chapter works through a range of keywords and principles that would look familiar in any book on SQL - aggregates, joins, and views, for example. The next chapter covers familiar ground too - data types, and numeric, string, date and time, and Boolean functions.

From here on, though, the story gets less familiar, as the authors show how to load data into BigQuery, and look at federated queries drawing data from multiple data sources, and the use of Cloud Dataflow to read data from and write data to BigQuery.

The next chapter moves on to developing with BigQuery using the REST API and the Cloud Client Library. This chapter also introduces accessing BigQuery from tools including pandas, Jupyter, and R, as well as the JDBC drivers. The chapter ends with a look at Bash scripting with BigQuery.

Next comes an interesting chapter on the architecture of BigQuery, the life of a query request, the Dremel query engine, and how BigQuery uses storage. The authors then move on to optimizing performance and cost when using BigQuery. This is a long chapter - 60 pages - full of detailed information and code for measuring and troubleshooting query performance, how to increase query speed, and how to optimize where data is stored and accessed.

Another chapter on queries - advanced queries this time - follows, looking at creating reusable queries via parameters and SQL user-defined functions. The more advanced aspects of SQL in BigQuery - arrays, window functions, DDL and DML - are also covered. The chapter then moves beyond SQL to JavaScript UDFs, and touches on advanced functions for Geographic Information Systems, statistics and hashing. For me, this chapter could have been longer and each topic covered in more depth.

A meaty chapter on machine learning is next on the agenda, coming in at 60 pages and including coverage of building a regression model, building a classification model, and k-means clustering, as well as recommender systems and using custom machine learning models. The book ends with a chapter on administering and securing BigQuery.

This is a good book, bristling with practical examples and code, and detailed step-by-step instructions where appropriate. For example, in the chapter on loading data into BigQuery, you're shown how to load from a local source, with discussions of why it's a good idea to compress the file, how to page through the gzipped file from Cloud Shell, and whether to choose loading or streaming, as well as a SQL query to run against the loaded dataset. In other words, you get a mix of the why as well as the how, with code to follow and modify. If you work your way through the examples in the book, you'll have a good grasp of just what Google BigQuery can do, and why you might want to use it.


Last Updated ( Saturday, 28 November 2020 )