ZetaSQL Parser & Analyzer Code Released
Written by Kay Ewbank   
Tuesday, 30 April 2019

Google has started the process of open sourcing ZetaSQL, a SQL front-end that consists of a parser and analyzer. It is designed to work with a variety of back ends, and could be a rival for Apache Calcite in the JVM ecosystem.

ZetaSQL is a C++ SQL parser that is used internally at Google for BigQuery Standard SQL, among other things. The developers have open sourced the Java frontend and are now working on an adapter between ZetaSQL and Calcite for Apache Beam. Calcite is Apache's open source framework consisting of a SQL parser, an API, and a query planning engine. The advantage for Google in using ZetaSQL rather than Calcite is that it will allow the use of the same SQL dialect in both BigQuery and Beam.


The fact that ZetaSQL is used as the parser and analyzer for BigQuery's Standard SQL dialect is what makes this release interesting. ZetaSQL is also the ANSI Standard SQL parser for Spanner, and will soon be used for Dataflow SQL. BigQuery is Google's tool that lets you run SQL-like queries against very large datasets. It is designed to work most effectively when used for interactive analysis, typically using a small number of very large, append-only tables. Spanner is Google's globally distributed and synchronously replicated database. It is used internally by Google for services including Gmail, Google Photos, Calendar, Android Market, and AdWords. Dataflow is Google's data processing service for both batch and real-time data streaming applications.

ZetaSQL has been created to work with a variety of back ends, and uses gRPC to communicate with servers. Its intended use is to provide consistent behavior for tasks such as semantic analysis, name resolution, type checking, and implicit casting. It has features such as approximate algorithms (HLL and friends) and JSON support. It also comes with native support for constructing protobufs and writing UDFs in JavaScript.
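
To make the analyzer's role more concrete, here is a minimal Python sketch of name resolution, type checking, and implicit casting for a single comparison. It does not use ZetaSQL or reproduce its API: the catalog, the expression classes, the function names, and the one permitted cast (INT64 to DOUBLE) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical catalog mapping table -> column -> type (not ZetaSQL's API).
CATALOG = {"orders": {"id": "INT64", "amount": "DOUBLE", "note": "STRING"}}

# Assumed set of allowed implicit casts, as (from_type, to_type) pairs.
IMPLICIT_CASTS = {("INT64", "DOUBLE")}


class AnalysisError(Exception):
    """Raised when a name cannot be resolved or types do not match."""


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: object
    type: str


@dataclass
class Cast:
    expr: object
    target: str


@dataclass
class Compare:
    left: object
    right: object


def resolve_type(expr, table):
    """Name resolution: look up a column in the catalog and return its type."""
    if isinstance(expr, Column):
        columns = CATALOG[table]
        if expr.name not in columns:
            raise AnalysisError(f"Unrecognized name: {expr.name}")
        return columns[expr.name]
    if isinstance(expr, Literal):
        return expr.type
    if isinstance(expr, Cast):
        return expr.target
    raise AnalysisError(f"Unsupported expression: {expr!r}")


def analyze_compare(cmp, table):
    """Type-check a comparison, inserting an implicit cast when one is allowed."""
    left_type = resolve_type(cmp.left, table)
    right_type = resolve_type(cmp.right, table)
    if left_type == right_type:
        return cmp
    if (left_type, right_type) in IMPLICIT_CASTS:
        return Compare(Cast(cmp.left, right_type), cmp.right)
    if (right_type, left_type) in IMPLICIT_CASTS:
        return Compare(cmp.left, Cast(cmp.right, left_type))
    raise AnalysisError(f"No matching signature for {left_type} = {right_type}")


if __name__ == "__main__":
    # amount (DOUBLE) = 5 (INT64): the literal is wrapped in a cast to DOUBLE.
    print(analyze_compare(Compare(Column("amount"), Literal(5, "INT64")), "orders"))
```

The point of putting this logic in a shared front end is that every back end sees the same resolved tree, with casts already made explicit, rather than each engine deciding for itself how `amount = 5` should be interpreted.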

The developers of ZetaSQL point out that in its open source form, specific query engines may not implement all features in the ZetaSQL language and may give errors if specific features are not supported.

The intention is that the codebase for ZetaSQL, which defines a language (grammar, types, data model, and semantics) as well as a parser and analyzer, will be open sourced in multiple phases, starting with the current release of the parser and analyzer. The developers say that until more phases have been released, no guarantees are being made of API stability and no contributions are being accepted.



More Information

ZetaSQL On GitHub

Related Articles

Google BigQuery Updated

Google's Cloud Spanner To Settle the Relational vs NoSQL Debate?

Google Cloud Dataflow SDK

Google Moves On From MapReduce, Launches Cloud Dataflow

BigQuery Now Open to All

Google BigQuery Service

Google BigQuery gets scripting and spreadsheets



Last Updated ( Tuesday, 30 April 2019 )