ZetaSQL Parser & Analyzer Code Released
Written by Kay Ewbank   
Tuesday, 30 April 2019

Google has started the process of open sourcing ZetaSQL, a SQL front-end that consists of a parser and analyzer. It is designed to work with a variety of back ends, and could be a rival for Apache Calcite in the JVM ecosystem.

ZetaSQL is a C++ SQL parser that is used internally at Google for BigQuery Standard SQL, among other things. The developers have open sourced the Java frontend and are now working on an adapter between ZetaSQL and Calcite for Apache Beam. Calcite is Apache's open source framework consisting of a SQL parser, an API, and a query planning engine. The advantage for Google in using ZetaSQL rather than Calcite is that it will allow the use of the same SQL dialect in both BigQuery and Beam.

The fact that ZetaSQL is used as the parser and analyzer for Google BigQuery's Standard SQL dialect is what makes this release interesting. ZetaSQL is also the ANSI Standard SQL parser for Spanner, and will soon be used for Dataflow SQL. BigQuery is Google's tool that lets you run SQL-like queries against very large datasets. It is designed to work most effectively for interactive analysis, typically using a small number of very large, append-only tables. Spanner is Google's globally distributed and synchronously replicated database. It is used internally by Google for everything from Gmail, Google Photos, and Calendar to Android Market and AdWords. Dataflow is Google's data processing service for both batch and real-time data streaming applications.

ZetaSQL has been created to work with a variety of back ends, and uses gRPC to communicate with servers. Its intended use is to provide consistent behavior for tasks such as semantic analysis, name resolution, type checking, and implicit casting. It has features such as approximate algorithms (HyperLogLog and friends) and JSON support. It also comes with native support for constructing protobufs and writing UDFs in JavaScript.
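To give a feel for what name resolution, type checking, and implicit casting mean in an analyzer, here is a minimal toy sketch in Python. It is not ZetaSQL's API or algorithm: the catalog, the node classes, and the `analyze` function are all hypothetical names invented for illustration, and the only coercion rule it assumes is widening INT64 to DOUBLE.

```python
from dataclasses import dataclass

# Hypothetical toy catalog: table name -> {column name -> type name}.
CATALOG = {
    "orders": {"id": "INT64", "price": "DOUBLE", "qty": "INT64", "note": "STRING"},
}

NUMERIC_TYPES = {"INT64", "DOUBLE"}


@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: object
    type: str


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Cast:
    expr: object
    type: str


def analyze(expr, table):
    """Return (resolved expression, result type) or raise ValueError."""
    if isinstance(expr, Literal):
        return expr, expr.type
    if isinstance(expr, Column):
        # Name resolution: look the column up in the table's schema.
        columns = CATALOG[table]
        if expr.name not in columns:
            raise ValueError(f"Unrecognized name: {expr.name}")
        return expr, columns[expr.name]
    if isinstance(expr, Add):
        left, left_type = analyze(expr.left, table)
        right, right_type = analyze(expr.right, table)
        # Type checking: this toy '+' accepts numeric operands only.
        if left_type not in NUMERIC_TYPES or right_type not in NUMERIC_TYPES:
            raise ValueError(f"No matching signature for {left_type} + {right_type}")
        if left_type == right_type:
            return Add(left, right), left_type
        # Implicit casting: widen the INT64 side so both operands are DOUBLE.
        if left_type == "INT64":
            left = Cast(left, "DOUBLE")
        else:
            right = Cast(right, "DOUBLE")
        return Add(left, right), "DOUBLE"
    raise ValueError(f"Unsupported expression: {expr!r}")


if __name__ == "__main__":
    # price + qty: DOUBLE + INT64, so qty gets wrapped in a Cast to DOUBLE.
    resolved, result_type = analyze(Add(Column("price"), Column("qty")), "orders")
    print(result_type)
    print(resolved)
```

The point of sharing a front end is that every back end sees the same resolved tree and the same errors for the same query, rather than each engine making these decisions slightly differently.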

The developers of ZetaSQL point out that in its open source form, specific query engines may not implement all features in the ZetaSQL language and may give errors if specific features are not supported.

The intention is that the codebase for ZetaSQL, which defines a language (grammar, types, data model, and semantics) as well as a parser and analyzer, will be open sourced in multiple phases, starting with the current release of the parser and analyzer. The developers say that until more phases have been released, no guarantees are being made of API stability and no contributions are being accepted.

More Information

ZetaSQL On GitHub

Related Articles

Google BigQuery Updated

Google's Cloud Spanner To Settle the Relational vs NoSQL Debate?

Google Cloud Dataflow SDK

Google Moves On From MapReduce, Launches Cloud Dataflow

BigQuery Now Open to All

Google BigQuery Service

Google BigQuery gets scripting and spreadsheets

 


Last Updated ( Tuesday, 30 April 2019 )