|Google Builds Data Lake On BigQuery|
|Written by Kay Ewbank|
|Thursday, 07 April 2022|
Google has launched BigLake, a data lake platform built on Google BigQuery and Dataplex that can be used to analyze both structured and unstructured data.
The BigLake team says it gives users fine-grained access controls, along with performance acceleration across BigQuery and multicloud data lakes on AWS and Azure. BigLake also makes that data uniformly accessible across Google Cloud and open source engines with consistent security.
Google BigQuery is a distributed, serverless SQL engine that provides a way to query petabytes of data. It has built-in machine learning and is supported on Google Cloud. Dataplex is described as "an intelligent data fabric that unifies your distributed data to help automate data management and power analytics at scale." It was introduced by Google in 2021. Within BigLake, Dataplex is used to automatically scan Google Cloud Storage, register BigLake table definitions in BigQuery, and make them available via Dataproc Metastore.
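To give a flavor of what querying such a table looks like, here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical, and running the query requires Google Cloud credentials, so execution is behind a flag:

```python
# Sketch: querying a BigLake table through BigQuery's serverless SQL engine.
# The table name below is hypothetical; credentials come from the environment.

RUN_BIGQUERY_DEMO = False  # flip to True only with valid Google Cloud credentials


def build_query(table: str, limit: int = 10) -> str:
    """Build a standard-SQL query against a (hypothetical) BigLake table."""
    return f"SELECT * FROM `{table}` LIMIT {limit}"


sql = build_query("my-project.my_dataset.orders_biglake")

if RUN_BIGQUERY_DEMO:
    # Guarded: this call goes to the BigQuery service.
    from google.cloud import bigquery

    client = bigquery.Client()
    for row in client.query(sql).result():
        print(dict(row))
```

From the engine's point of view a BigLake table is queried like any other BigQuery table, even though the data itself lives in cloud storage.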
BigLake works by using access delegation to solve the problem of granting access to data without giving users direct access to the underlying cloud storage. With access delegation, administrators can securely grant row-level and column-level access to users and pipelines, with BigQuery enforcing those controls, rather than having to provide full access to the table.
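A minimal sketch of what a row-level control might look like, assuming the BigQuery `CREATE ROW ACCESS POLICY` DDL; the policy, table, and principal names here are all hypothetical, and the exact filter depends on your schema:

```python
# Sketch: composing BigQuery row-level security DDL for a BigLake table.
# All names below are hypothetical. BigQuery (not the storage layer) enforces
# the resulting filter for every query, which is the access-delegation model.

def row_access_policy_ddl(policy: str, table: str,
                          grantees: list[str], filter_expr: str) -> str:
    """Return CREATE ROW ACCESS POLICY DDL limiting which rows grantees can see."""
    grant_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy} "
        f"ON `{table}` "
        f"GRANT TO ({grant_list}) "
        f"FILTER USING ({filter_expr})"
    )


ddl = row_access_policy_ddl(
    policy="us_only",
    table="my-project.my_dataset.orders_biglake",
    grantees=["user:analyst@example.com"],
    filter_expr='region = "US"',
)
# The DDL would then be submitted via the BigQuery client, e.g.
# bigquery.Client().query(ddl).result()
```

Because the policy lives in BigQuery rather than in storage ACLs, the same filtered view of the data applies to every engine that reads the table through BigLake.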
BigLake means BigQuery can now be used with multicloud data lakes and open formats such as Parquet and ORC with fine-grained security controls. The integration with Dataplex means you can keep a single copy of data and enforce consistent access controls across analytics engines, including Google Cloud and open-source technologies such as Spark, Presto, Trino, and TensorFlow.
Google BigLake is available now.