Databricks Open Sources All Of Delta Lake
Written by Kay Ewbank   
Thursday, 07 July 2022

Databricks has now made all of Delta Lake open source, including all the APIs. The storage layer of the product was made open source in 2019. Delta Lake can be used to build data lakehouses, which enable data warehousing and machine learning directly on the data lake.

Delta Lake handles the stage where data is brought into an organization's data lake. It stores data in Apache Parquet format, and is designed for use in data lakes that are built on HDFS and cloud storage.

Databricks was founded by the original developers of Apache Spark and specializes in commercial technologies that make use of Spark. Delta Lake is a storage layer and associated table format built on top of Apache Spark and, until it was made open source, was only available as part of Databricks Delta, the company's proprietary stack.


Since the storage layer was made open source, the project has attracted over 190 contributors across more than 70 organizations. Nearly two-thirds of them are from outside Databricks, including contributors from Apple, IBM, Microsoft, Disney, Amazon, and eBay.

Delta Lake comes with standalone readers/writers that let any Python, Ruby, or Rust client write data directly to Delta Lake without requiring a big data engine such as Apache Spark. It also has open-source connectors for engines including Apache Flink, Presto, and Trino. The open source announcement opens up capabilities that until now were only available in Databricks.

Delta Lake 2.0, the latest release of Delta Lake, has improvements including support for Z-Ordering, Change Data Feed, Dynamic Partition Overwrites, and Dropped Columns. Z-Ordering is a technique to co-locate related information in the same set of files. Delta Lake uses this co-locality in its data-skipping algorithms, and the developers say it dramatically reduces the amount of data that Delta Lake on Apache Spark needs to read.
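To give a feel for why Z-Ordering helps data skipping, here is a minimal pure-Python sketch. It is not Delta Lake's implementation: the function names (interleave_bits, split_into_files, files_to_read) and the toy 8x8 dataset are assumptions made for illustration. It interleaves the bits of two column values into a single Z-order (Morton) key, sorts rows by that key before splitting them into "files", and then uses each file's min/max values to count how many files a filter would have to read.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # x occupies the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)    # y occupies the odd bit positions
    return key


def split_into_files(rows, rows_per_file):
    """Cut an already-sorted list of rows into fixed-size chunks ("files")."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_read(files, column, value):
    """Count files whose min/max range for `column` could contain `value`."""
    count = 0
    for rows in files:
        values = [row[column] for row in rows]
        if min(values) <= value <= max(values):
            count += 1  # this file cannot be skipped
    return count


# Toy table with two integer columns (x, y), hypothetical data for illustration.
rows = [(x, y) for x in range(8) for y in range(8)]

# Layout 1: plain sort by x, then y. Layout 2: sort by the interleaved key.
by_x = split_into_files(sorted(rows), 16)
by_z = split_into_files(sorted(rows, key=lambda r: interleave_bits(r[0], r[1])), 16)

for name, files in (("sorted by x", by_x), ("z-ordered", by_z)):
    print(name,
          "| files read for x == 5:", files_to_read(files, 0, 5),
          "| files read for y == 5:", files_to_read(files, 1, 5))
```

On this toy grid, sorting by x alone makes a filter on x cheap but forces a filter on y to touch all four files, whereas the Z-ordered layout needs only two of the four files for a filter on either column. That balance across several columns is the property the data-skipping description above relies on.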

Delta Lake 2 is available now.

 
 
 
 

More Information

Databricks Website

Delta Website

Related Articles

Databricks Delta Lake Now Open Source

Databricks Delta Adds Faster Parquet Import

Databricks Runtime for Machine Learning

Databricks Adds ML Model Export

Spark Gets NLP Library

Apache Spark With Structured Streaming

Spark BI Gets Fine Grain Security

Spark 2.0 Released

 
