Google BigQuery: The Definitive Guide

Authors: Valliappa Lakshmanan and Jordan Tigani
Publisher: O'Reilly
Pages: 498
ISBN: 978-1492044468
Print: 1492044466
Kindle: B07ZHQ3MGN
Audience: Developers wanting to use BigQuery
Rating: 5
Reviewer: Kay Ewbank

Google BigQuery is a distributed, serverless SQL engine that provides a way to query petabytes of data, and it has built-in machine learning. This book by Google insiders aims to show what it can do and how you can do it.

The interesting thing about BigQuery is that the basics are very familiar for database developers - you can get a long way with basic SQL. To get the best from BigQuery, though, you need to think about less conventional topics such as concurrency, and perhaps to move into its machine learning capabilities.

The authors of the book both work for Google, one as head of data analytics and AI solutions, the other as the director of product management for BigQuery. More importantly, the latter was one of the founding engineers working on BigQuery. This means both authors are talking from a position of real familiarity with BigQuery and what it can do.

The book starts with a chapter explaining what BigQuery is and what it can do, including some background on how Google came to develop it and what makes it possible. The authors then move on to describing query essentials, starting with simple queries based on SELECT, and filtering using WHERE, EXCEPT and REPLACE. This chapter works through a range of keywords and principles that would look familiar in any book on SQL - aggregates, joins, and views, for example. The next chapter has a lot of familiarity too - data types, and numeric, string, date and time, and Boolean functions.

From here on, though, the story gets less familiar, as the authors show how to load data into BigQuery, look at federated queries that draw data from multiple data sources, and cover the use of Cloud Dataflow to read data from and write data to BigQuery.

The next chapter moves on to developing with BigQuery using the REST API and the Cloud Client Library. This chapter also introduces accessing BigQuery from tools including pandas, Jupyter, and R, as well as the JDBC drivers. The chapter ends with a look at Bash scripting with BigQuery.

Next comes an interesting chapter on the architecture of BigQuery, the life of a query request, the Dremel query engine, and how BigQuery uses storage. The authors then move on to optimizing performance and cost when using BigQuery. This is a long chapter - 60 pages - full of detailed information and code for measuring and troubleshooting query performance, how to increase query speed, and how to optimize where data is stored and accessed.

Another chapter on queries - advanced queries this time - follows, looking at creating reusable queries via parameters and SQL user-defined functions. The more advanced aspects of SQL in BigQuery - arrays, window functions, DDL and DML - are also covered. The chapter then moves beyond SQL to JavaScript UDFs, and touches on advanced functions for Geographic Information Systems, statistics and hash algorithms. For me, this chapter could have been longer and each topic covered in more depth.

A meaty chapter on machine learning is next on the agenda, coming in at 60 pages and covering how to build regression and classification models, k-means clustering, recommender systems, and custom machine learning models. The book ends with a chapter on administering and securing BigQuery.

This is a good book, bristling with practical examples and code, and detailed step-by-step instructions where appropriate. For example, in the chapter on loading data into BigQuery, you're shown how to load from a local source, with discussions of why it's a good idea to compress the file, how to page through the gzipped file from Cloud Shell, and whether to choose loading or streaming, as well as a SQL query to actually query the dataset. In other words, you get a mix of the why as well as the how, with code to follow and modify. If you work your way through the examples in the book, you'll have a good grasp of just what Google BigQuery can do, and why you might want to use it.

Last Updated ( Saturday, 16 May 2020 )