Apache SystemDS 3.0 Released
Written by Kay Ewbank   
Thursday, 14 July 2022

Apache SystemDS 3.0 has been released with improvements including a unified memory manager, a federated backend, and full support for the Top-K cleaning framework.

SystemDS used to be called SystemML, and has been renamed to reflect the change of focus to the end-to-end data science lifecycle.


SystemDS is a flexible, scalable machine learning system optimized for big data that provides declarative, large-scale machine learning and deep learning. Like its predecessor SystemML, it can be run on top of Apache Spark, where it scales automatically with the data, determining line by line whether code should be run on the driver or on an Apache Spark cluster.

When SystemML became a top-level project at Apache, Luciano Resende, Architect at the IBM Spark Technology Center and Apache SystemML Incubator Mentor, described it as:

"like SQL for Machine Learning, it enables Data Scientists to concentrate on the problem at hand, working in a high-level script language like R, and all the optimizations and rewrites are handled by the very powerful SystemML optimizer that considers data and available resources to produce the best execution plan for the application"

SystemDS is open source, and has tools for managing the end-to-end data science lifecycle from data integration, cleaning, local and distributed ML model training, to deployment and serving. SystemDS provides declarative languages with R-like syntax that are used for high-level scripts. These are then compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark.

It provides the ability to customize algorithms via R-like and Python-like languages, and has multiple execution modes, including Spark MLContext, Spark Batch, Standalone, and JMLC. SystemDS offers automatic optimization based on data and cluster characteristics.
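To give a feel for what a hybrid execution plan involves, here is a toy Python sketch. It is not SystemDS code and does not use the SystemDS API: it simply estimates the dense in-memory size of each operation's output and assigns the operation to local or distributed execution against a memory budget. The names (Op, dense_size_bytes, plan), the 8-bytes-per-cell estimate and the budget are all assumptions made for illustration; the real compiler's cost model takes far more into account.

```python
# Toy illustration only: not the SystemDS compiler or its API.
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: dense double-precision cells


@dataclass
class Op:
    """A hypothetical operation with the dimensions of its output."""
    name: str
    rows: int
    cols: int


def dense_size_bytes(op: Op) -> int:
    """Estimate the memory needed for the dense output of an operation."""
    return op.rows * op.cols * BYTES_PER_DOUBLE


def plan(ops, local_budget_bytes):
    """Assign each operation to 'local' or 'spark' by estimated size."""
    return [
        (op.name, "local" if dense_size_bytes(op) <= local_budget_bytes else "spark")
        for op in ops
    ]


script = [
    Op("read X", 10_000_000, 100),  # about 8 GB: exceeds the budget below
    Op("t(X) %*% X", 100, 100),     # 80 KB: fits comfortably in memory
]
print(plan(script, local_budget_bytes=2 * 1024**3))
```

With a 2 GiB local budget, the large read is sent to the cluster while the small matrix product stays in memory, which is the kind of per-operation decision a hybrid plan encodes.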

One of the improvements in this release is the use of Java 11 and Spark 3. The new release also adds a federated backend, with support for multi-tenancy, tenant isolation, and reuse of intermediates across tenants for federated workers. There's also extended support for the cost-based federated planner.

Support has also been added for the Top-K cleaning framework, a framework for automatically generating the top-K most effective data cleaning pipelines.
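The idea of ranking candidate cleaning pipelines and keeping the best K can be sketched in a few lines of Python. This is a toy illustration, not the SystemDS implementation: the three cleaning primitives, the sample data and the scoring function (closeness of the cleaned mean to a known reference value) are all hypothetical stand-ins for whatever primitives and quality measure the real framework uses.

```python
# Toy illustration only: not the SystemDS Top-K cleaning implementation.
import heapq
from itertools import permutations


# Hypothetical cleaning primitives over a list of floats with None for missing.
def impute_mean(xs):
    present = [x for x in xs if x is not None]
    mean = sum(present) / len(present)
    return [mean if x is None else x for x in xs]


def clip_outliers(xs, lo=0.0, hi=10.0):
    return [None if x is None else min(max(x, lo), hi) for x in xs]


def drop_missing(xs):
    return [x for x in xs if x is not None]


PRIMITIVES = {
    "impute_mean": impute_mean,
    "clip_outliers": clip_outliers,
    "drop_missing": drop_missing,
}


def run(pipeline, xs):
    """Apply the named primitives in order."""
    for step in pipeline:
        xs = PRIMITIVES[step](xs)
    return xs


def score(cleaned, reference_mean):
    """Stand-in quality measure: higher is better."""
    if not cleaned or any(x is None for x in cleaned):
        return float("-inf")  # pipeline left the data unusable
    return -abs(sum(cleaned) / len(cleaned) - reference_mean)


def top_k_pipelines(data, reference_mean, k):
    """Enumerate pipelines of one or two steps and keep the k best."""
    candidates = [p for n in (1, 2) for p in permutations(PRIMITIVES, n)]
    scored = [(score(run(p, data), reference_mean), p) for p in candidates]
    return heapq.nlargest(k, scored, key=lambda t: t[0])


dirty = [1.0, 2.0, None, 3.0, 100.0]
top = top_k_pipelines(dirty, reference_mean=2.5, k=3)
for s, p in top:
    print(s, " -> ".join(p))
```

On this sample, every pipeline in the top three includes the clipping step, because the single outlier dominates the error. Exhaustive enumeration is only workable for a handful of primitives; a real system needs a smarter search over a much larger space.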

Version 3 also adds experimental support for CUDA code generation and operator fusion, and can be used with CUDA-capable NVIDIA GPUs.

There are also experimental improvements to the robustness and performance of Compressed Linear Algebra.

SystemDS 3 is available now.


More Information

SystemDS Website

Related Articles

Microsoft Releases Open Source Distributed Machine Learning Library

Spark 3 Improves Python and SQL Support

Spark Gets NLP Library

Google Announces Framework For Data Science Predictions

 




 



 


Last Updated ( Monday, 25 July 2022 )