Databricks Open Sources All Of Delta Lake
Written by Kay Ewbank   
Thursday, 07 July 2022

Databricks has now made all of Delta Lake open source, including all the APIs. The storage layer of the product was made open source in 2019. Delta Lake can be used to build data lakehouses, which enable data warehousing and machine learning directly on the data lake.

Delta Lake handles the stage where data is brought into an organization's data lake. It stores data in Apache Parquet format, and is designed for use in data lakes that are built on HDFS and cloud storage.

Databricks was founded by the original developers of Apache Spark and specializes in commercial technologies that make use of Spark. Delta Lake is a storage layer and associated table format built on top of Apache Spark, a unified analytics engine. Until it was made open source, it was only available as part of Databricks Delta, the company's proprietary stack.


Since the storage layer was made open source, the project has attracted over 190 contributors across more than 70 organizations. Nearly two-thirds of them are from outside Databricks, including contributors from Apple, IBM, Microsoft, Disney, Amazon, and eBay.

Delta Lake comes with standalone readers/writers that let any Python, Ruby, or Rust client write data directly to Delta Lake without requiring a big data engine such as Apache Spark. It also has open-source connectors for engines including Apache Flink, Presto, and Trino. The open source announcement opens up capabilities that until now were only available in Databricks.

Delta Lake 2.0, the latest release of Delta Lake, has improvements including support for Z-Ordering, Change Data Feed, dynamic partition overwrites, and dropping columns. Z-Ordering is a technique to colocate related information in the same set of files. This co-locality is used by Delta Lake in data-skipping algorithms, and the developers say it dramatically reduces the amount of data that Delta Lake on Apache Spark needs to read.
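
To get a feel for why colocating related values helps data skipping, here is a minimal sketch in plain Python. It is not Delta Lake's implementation: the function names, the 16 x 16 grid of (x, y) rows, and the 16-rows-per-file layout are all assumptions made up for illustration. It orders rows by a Z-order (Morton) key formed by interleaving the bits of two columns, records min/max statistics per "file", and counts how many files a filter on the second column would have to open.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Build a Z-order (Morton) key by interleaving the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def write_files(rows, rows_per_file, sort_key):
    """Sort rows, cut them into fixed-size 'files', keep min/max stats per file."""
    ordered = sorted(rows, key=sort_key)
    files = []
    for start in range(0, len(ordered), rows_per_file):
        chunk = ordered[start:start + rows_per_file]
        files.append({
            "min_x": min(r[0] for r in chunk), "max_x": max(r[0] for r in chunk),
            "min_y": min(r[1] for r in chunk), "max_y": max(r[1] for r in chunk),
        })
    return files


def count_files_read(files, x_range, y_range):
    """Count files whose min/max stats overlap the query; the rest are skipped."""
    return sum(
        1 for f in files
        if f["min_x"] <= x_range[1] and f["max_x"] >= x_range[0]
        and f["min_y"] <= y_range[1] and f["max_y"] >= y_range[0]
    )


# Hypothetical table: every (x, y) pair on a 16 x 16 grid, 16 rows per file.
rows = [(x, y) for x in range(16) for y in range(16)]
linear_files = write_files(rows, 16, sort_key=lambda r: (r[0], r[1]))
zorder_files = write_files(rows, 16, sort_key=lambda r: interleave_bits(r[0], r[1]))

# Filter on y only (0 <= y <= 3), any x.
print(count_files_read(linear_files, (0, 15), (0, 3)))  # 16: every file must be read
print(count_files_read(zorder_files, (0, 15), (0, 3)))  # 4: the other 12 are skipped
```

With a plain sort on x then y, every file spans the full range of y, so the statistics cannot rule anything out for a filter on y. With the interleaved key, each file covers a compact 4 x 4 block of the grid, so the same filter touches only a quarter of the files.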

Delta Lake 2 is available now.


More Information

Databricks Website

Delta Website

Related Articles

Databricks Delta Lake Now Open Source

Databricks Delta Adds Faster Parquet Import

Databricks Runtime for Machine Learning

Databricks Adds ML Model Export

Spark Gets NLP Library

Apache Spark With Structured Streaming

Spark BI Gets Fine Grain Security

Spark 2.0 Released

