Databricks Delta Adds Faster Parquet Import
Written by Kay Ewbank   
Tuesday, 05 March 2019

An updated version of Databricks Delta speeds up the import of Parquet data and adds stronger merge features. The analytics engine has also been made available on AWS and Azure for Databricks users.

Databricks Delta is a unified analytics engine and associated table format built on top of Apache Spark. Databricks, the company, was founded by the original developers of Apache Spark and specializes in commercial technologies that make use of Spark.

When it was originally launched at the Apache Spark Summit in 2017, the Databricks CEO and co-founder Ali Ghodsi described Delta as "an AI capable data warehouse at the scale of a data lake." The idea is that Delta takes the best bits of data warehouses and data lakes, and adds in streaming data to enable predictive analytics.

Databricks Delta provides ACID transactions, optimized layouts, and indexes for building data pipelines that work with big data. Databricks says Delta is 10 to 100 times faster than Apache Spark on Parquet. It has been designed for both batch and stream processing, and can be used for pipeline development, data management, and query serving. It aims to offer high reliability and low latency by using techniques such as schema validation, compaction, and data skipping.

The developers say the new fast Parquet import is also more economical in its use of extra compute and storage resources. Another improvement in the updated version is automatic versioning of the big data stored in customers' data lakes, meaning it is possible to access any historical version of that data.
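To make the idea of reading a historical version concrete, here is a minimal sketch in plain Python. It is a toy model only: the `VersionedTable` class and its `commit` and `read` methods are hypothetical names invented for illustration, and storing a full snapshot per commit is an assumption chosen for brevity rather than a description of how Delta stores versions. Delta exposes this capability through its own syntax, which is not reproduced here.

```python
class VersionedTable:
    """Toy model: each commit keeps a full snapshot of the table.

    Hypothetical illustration only; this is not Delta's storage format.
    """

    def __init__(self):
        # Version 0 is the empty table.
        self._snapshots = [[]]

    def commit(self, rows):
        """Store a new snapshot and return its version number."""
        self._snapshots.append([dict(r) for r in rows])
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return the rows at the given version, or the latest if omitted."""
        if version is None:
            version = len(self._snapshots) - 1
        return [dict(r) for r in self._snapshots[version]]


table = VersionedTable()
v1 = table.commit([{"id": 1, "qty": 10}])
v2 = table.commit([{"id": 1, "qty": 7}, {"id": 2, "qty": 3}])

old = table.read(v1)     # the table as it was after the first commit
latest = table.read()    # the current table
```

The point of the sketch is simply that a later write does not destroy earlier states: `old` still shows the first commit after the second one has been applied.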

Merging is another area to have been improved, with new support for multiple MATCHED clauses, additional conditions in MATCHED and NOT MATCHED clauses, and a DELETE action. There is also support for * in UPDATE and INSERT actions to automatically fill in column names, making it easier to write MERGE queries for tables with a very large number of columns.
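As a rough illustration of these clauses, the sketch below pairs an example MERGE statement with a small in-memory model of what it does. The table names (`customers`, `updates`), the columns, and the `merge` helper are all hypothetical, and the Python function is a simplified stand-in for the semantics, not Delta's implementation. It assumes that the first MATCHED clause whose condition holds is the one applied.

```python
# Example statement; table and column names are hypothetical.
MERGE_SQL = """
MERGE INTO customers AS t
USING updates AS s
ON t.id = s.id
WHEN MATCHED AND s.deleted = true THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED AND s.deleted = false THEN INSERT *
"""


def merge(target, source, key="id"):
    """Toy in-memory model of the statement above (illustration only)."""
    result = {row[key]: dict(row) for row in target}
    for s in source:
        k = s[key]
        if k in result:
            if s["deleted"]:
                # WHEN MATCHED AND s.deleted = true THEN DELETE
                del result[k]
            else:
                # WHEN MATCHED THEN UPDATE SET * (copy every column)
                result[k] = dict(s)
        elif not s["deleted"]:
            # WHEN NOT MATCHED AND s.deleted = false THEN INSERT *
            result[k] = dict(s)
    return sorted(result.values(), key=lambda r: r[key])


target = [
    {"id": 1, "name": "Ann", "deleted": False},
    {"id": 2, "name": "Bob", "deleted": False},
]
source = [
    {"id": 1, "name": "Anne", "deleted": False},  # matched: updated
    {"id": 2, "name": "Bob", "deleted": True},    # matched: deleted
    {"id": 3, "name": "Cy", "deleted": False},    # not matched: inserted
    {"id": 4, "name": "Di", "deleted": True},     # not matched, condition false: skipped
]
merged = merge(target, source)
```

The `*` forms are what keep the statement short here: neither the UPDATE nor the INSERT has to list `id`, `name`, and `deleted` explicitly, which matters far more when a table has hundreds of columns.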

Alongside the improvements, Databricks is now offering Databricks Delta on Azure and AWS. Azure Databricks users can use Delta for Data Engineering and Data Analytics from both the Azure Databricks Standard and the Azure Databricks Premium SKUs. Databricks on AWS customers can also use Delta for both Data Engineering and Data Analytics.

More Information

Databricks Website

Last Updated ( Tuesday, 05 March 2019 )