Databricks Runtime for Machine Learning
Written by Kay Ewbank   
Monday, 15 April 2019

Databricks Runtime for Machine Learning is now generally available, offering native integration with popular ML/DL frameworks, such as scikit-learn, XGBoost, TensorFlow, PyTorch, Keras, and Horovod.

Databricks Runtime for Machine Learning (Databricks Runtime ML) is a machine learning runtime that bundles popular libraries, including TensorFlow, PyTorch, Keras, and XGBoost, and supports distributed training with Horovod. Its main benefit is that it provides a ready-to-go environment for machine learning.

Databricks Runtime ML aims to be easy to use. Its main libraries come preconfigured, including HorovodRunner, which makes it easier to use the distributed deep learning framework Horovod. Horovod can be trickier to use than some frameworks because it requires you to share code and libraries across nodes, configure SSH, and execute MPI commands. HorovodRunner removes these requirements by providing an API for running Horovod training jobs on a Databricks cluster.

The team at Databricks has designated the most popular machine learning libraries as "top-tier" libraries. For these, Databricks plans to provide faster updates and advanced support. The top-tier libraries are:

  • TensorFlow / TensorBoard / tf.keras
  • spark-tensorflow-connector
  • PyTorch
  • Horovod / HorovodRunner
  • GraphFrames

Performance has also improved since the beta release. This release includes improvements to both the Apache Spark MLlib logistic regression and tree classifiers. Running on Databricks Runtime for ML, the team at Databricks observed a speed-up of around 40% in Spark Performance Tests compared with Apache Spark 2.4.0.

The GraphFrames library in Databricks Runtime for ML also contains an optimized implementation that runs twice as fast as open-source GraphFrames and supports bigger graphs. In addition, graph queries use Spark's cost-based optimization (CBO) to determine join order when the underlying node (vertex) and edge tables have column statistics. According to Databricks, this can make queries up to 100 times faster.

More Information

Databricks Website

Related Articles

Databricks Adds ML Model Export

Databricks Delta Adds Faster Parquet Import
