RAPIDS GPU Data Analysis Platform Launched
Written by Kay Ewbank   
Thursday, 18 October 2018

NVIDIA has released RAPIDS, a suite of open-source software libraries for running machine learning, data science and analytics pipelines on GPUs.

The aim of RAPIDS is to accelerate every stage of a data science pipeline, including data loading, ETL, model training, and inference. The developers say RAPIDS is up to 50 times faster than CPU-based processing on typical end-to-end data science workflows. While a number of existing machine learning libraries already use GPU acceleration, what marks out RAPIDS is that it covers the entire process from data loading to deployment. RAPIDS gains its speed advantage in the copy and convert stages.

RAPIDS has been co-developed by NVIDIA and developers from some popular open-source projects, specifically Apache Arrow, pandas and scikit-learn. NVIDIA is also collaborating with other open-source contributors including Anaconda, BlazingDB, Databricks and Quansight. The software is also being integrated into the Apache Spark open-source framework for data analytics.

RAPIDS Data Pipeline

The RAPIDS software libraries consist of Python packages. At the core is a columnar data structure called a GPU DataFrame, which implements the Apache Arrow columnar data format on the GPU. The DataFrame has an API that is similar to pandas, making it easy to build GPU-accelerated workflows. This API handles operations on data columns including unary and binary operations, filters, joins, and groupbys. Under the covers you get the Python library PyGDF, and the C++/CUDA GPU DataFrames implementation in libgdf. RAPIDS also supports multi-node, multi-GPU deployments, making it easier to scale up and out.
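To make the column-oriented operations listed above more concrete, here is a minimal pure-Python sketch. It is not the RAPIDS or PyGDF API: plain lists stand in for GPU-resident columns, and every function and column name is hypothetical, chosen only to show what unary and binary operations, filters, joins and groupbys mean when a table is stored one column at a time.

```python
# Illustrative only: plain Python lists stand in for GPU-resident columns.
# None of these names come from RAPIDS or PyGDF; they are hypothetical.
from collections import defaultdict

# A columnar table keeps one sequence per column rather than one record per row.
trips = {
    "city_id": [1, 2, 1, 3, 2],
    "distance_km": [2.0, 5.5, 3.0, 1.5, 4.5],
    "minutes": [10, 20, 15, 5, 18],
}
cities = {"city_id": [1, 2, 3], "name": ["Oslo", "Lima", "Pune"]}


def unary(col, fn):
    """Apply a one-argument function to every value in a column."""
    return [fn(x) for x in col]


def binary(a, b, fn):
    """Combine two columns element by element."""
    return [fn(x, y) for x, y in zip(a, b)]


def filter_rows(table, mask):
    """Keep only the rows whose mask entry is True, column by column."""
    return {
        name: [v for v, keep in zip(col, mask) if keep]
        for name, col in table.items()
    }


def inner_join(left, right, key):
    """Hash join: index the right table by key, then probe with the left."""
    index = defaultdict(list)
    for pos, k in enumerate(right[key]):
        index[k].append(pos)
    out = {name: [] for name in left}
    out.update({name: [] for name in right if name != key})
    for lpos, k in enumerate(left[key]):
        for rpos in index.get(k, []):
            for name in left:
                out[name].append(left[name][lpos])
            for name in right:
                if name != key:
                    out[name].append(right[name][rpos])
    return out


def groupby_sum(table, key, value):
    """Sum the value column within each distinct key."""
    totals = defaultdict(float)
    for k, v in zip(table[key], table[value]):
        totals[k] += v
    return dict(totals)


hours = unary(trips["minutes"], lambda m: m / 60)             # unary op
speed_kmh = binary(trips["distance_km"], trips["minutes"],
                   lambda d, m: d * 60 / m)                   # binary op
mask = [d > 2.5 for d in trips["distance_km"]]
long_trips = filter_rows(trips, mask)                         # filter
joined = inner_join(long_trips, cities, "city_id")            # join
totals = groupby_sum(joined, "name", "distance_km")           # groupby
print(totals)
```

Each helper touches whole columns rather than individual records, which is the access pattern a columnar format such as Apache Arrow is designed for. A GPU DataFrame library would run the same kinds of operations in parallel on the device instead of in a Python loop.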

Another package is a collection of machine learning algorithms that operate on GPU DataFrames, and you also get XGBoost, a machine learning package for training gradient boosted decision trees. You can pass data directly to XGBoost while it remains in GPU memory. Other libraries include a GPU-accelerated library of machine learning algorithms, including Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Density-based Spatial Clustering of Applications with Noise (DBSCAN), and a library of low-level math and computational primitives.

RAPIDS is being released under the Apache license. Containerized versions of RAPIDS are available on the NVIDIA GPU Cloud container registry.


More Information

RAPIDS Libraries

Related Articles

Databricks Adds ML Model Export

Machine Learning Added To Azure HDInsight

Apache Kylin 2.5 Adds All-in-Spark Cubing Engine

Spark Gets NLP Library

 

 


Last Updated ( Thursday, 18 October 2018 )