Google BigQuery: The Definitive Guide

Author: Valliappa Lakshmanan and Jordan Tigani
Publisher: O'Reilly
Pages: 498
ISBN: 978-1492044468
Print: 1492044466
Kindle: B07ZHQ3MGN
Audience: Developers wanting to use BigQuery
Rating: 5
Reviewer: Kay Ewbank

Google BigQuery is a distributed, serverless SQL engine that provides a way to query petabytes of data, and it has built-in machine learning. This book by Google insiders aims to show what it can do and how you can do it.

The interesting thing about BigQuery is that the basics are very familiar for database developers - you can get a long way with basic SQL. To get the best from BigQuery, though, you need to think about less conventional topics such as concurrency, and perhaps to move into its machine learning capabilities.

The authors of the book both work for Google, one as head of data analytics and AI solutions, the other as the director of product management for BigQuery. More importantly, the latter was one of the founding engineers working on BigQuery. This means both authors are talking from a position of real familiarity with BigQuery and what it can do.

The book starts with a chapter explaining what BigQuery is and what it can do, including some background on how Google came to develop it and what makes it possible. The authors then move on to describing query essentials, starting with simple queries based on Select, and filtering using Where, Except and Replace. This chapter works through a range of keywords and principles that would look familiar in any book on SQL - aggregates, joins, and views, for example. The next chapter covers largely familiar ground too - datatypes, and numeric, string, time and date, and Boolean functions.

From here on, though, the story gets less familiar, as the authors show how to load data into BigQuery, look at federated queries drawing data from multiple data sources, and cover the use of Cloud Dataflow to read data from and write data to BigQuery.

The next chapter moves on to developing with BigQuery using the REST API and the Cloud Client Library. This chapter also introduces accessing BigQuery from tools including pandas, Jupyter, and R, as well as the JDBC drivers. The chapter ends with a look at Bash scripting with BigQuery.

Next comes an interesting chapter on the architecture of BigQuery, the life of a query request, the Dremel query engine, and how BigQuery uses storage. The authors then move on to optimizing performance and cost when using BigQuery. This is a long chapter - 60 pages - full of detailed information and code for measuring and troubleshooting query performance, how to increase query speed, and how to optimize where data is stored and accessed.

Another chapter on queries - advanced queries this time - follows, looking at creating reusable queries via parameters and SQL user-defined functions. The more advanced aspects of SQL in BigQuery - arrays, window functions, and DDL and DML - are also covered. The chapter then moves beyond SQL to JavaScript UDFs, and touches on advanced functions for Geographic Information Systems, statistics and hash algorithms. For me, this chapter could have been longer and each topic covered in more depth.

A meaty chapter on machine learning is next on the agenda, coming in at 60 pages and including coverage of building a regression model, building a classification model, and k-means clustering, as well as recommender systems, and using custom machine learning models. The book ends with a chapter on administering and securing BigQuery.

This is a good book, bristling with practical examples and code, and detailed step-by-step instructions where appropriate. For example, in the chapter on loading data into BigQuery, you're shown how to load from a local source, with discussions of why it's a good idea to compress the file, how to page through the gzipped file from Cloud Shell, and whether to choose loading or streaming, as well as a SQL query to actually query the dataset. In other words, you get a mix of the why as well as the how, with code to follow and modify. If you work your way through the examples in the book, you'll have a good grasp of just what Google BigQuery can do, and why you might want to use it.

 

Last Updated ( Saturday, 28 November 2020 )