Microsoft Releases DeepSpeed For PyTorch
Written by Kay Ewbank   
Thursday, 13 February 2020

Microsoft Research has released an open source library that's compatible with PyTorch. DeepSpeed is a deep learning optimization library that makes training large models easier and more efficient, scaling up to models with 100 billion parameters.

Microsoft says that the new library uses memory optimization technology to improve PyTorch model training, so researchers can train models with more parameters. The library makes better use of memory that is local to the GPU, and can be used with existing PyTorch applications with only minor changes to the application code.
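In practice, DeepSpeed's behavior is driven by a JSON configuration file passed to the training launcher. The fragment below is an illustrative sketch only; the key names shown (batch size, gradient accumulation, mixed precision, and ZeRO) are indicative of the kind of settings involved, and the exact schema should be checked against the DeepSpeed documentation:

```json
{
  "train_batch_size": 256,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": true
}
```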


Advantages offered by DeepSpeed include distributed training, mixed precision, and checkpointing, through lightweight APIs that are compatible with PyTorch.

One part of the DeepSpeed library, the Zero Redundancy Optimizer (ZeRO), is a parallelized optimizer responsible for much of the reduction in resource use. Microsoft says researchers used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), at 17 billion parameters the largest publicly known language model. ZeRO can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system.
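The core idea behind ZeRO is to partition the optimizer state (and, in later stages, gradients and parameters) across data-parallel workers instead of replicating it on every GPU. A back-of-the-envelope sketch of the memory saving, using the commonly cited figure of roughly 12 bytes of optimizer state per parameter for mixed-precision Adam (an fp32 master copy plus momentum and variance), illustrates why this matters; the function name and numbers here are purely illustrative:

```python
def optimizer_state_per_worker(num_params, bytes_per_param, num_workers, partition):
    """Bytes of optimizer state each worker must hold.

    Without partitioning, every data-parallel worker replicates the full
    state; with ZeRO-style partitioning, each worker holds only its shard.
    """
    total = num_params * bytes_per_param
    return total // num_workers if partition else total

# A 1-billion-parameter model with ~12 bytes/param of Adam state on 64 GPUs:
baseline = optimizer_state_per_worker(10**9, 12, 64, partition=False)
sharded = optimizer_state_per_worker(10**9, 12, 64, partition=True)
print(baseline)  # 12000000000 bytes (~12 GB) replicated on every GPU
print(sharded)   # 187500000 bytes (~187 MB) per GPU when partitioned
```

The replicated state alone would not fit alongside activations on many GPUs, which is why partitioning it enables much larger models per device.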

DeepSpeed also includes constant buffer optimization and smart gradient accumulation. Constant Buffer Optimization (CBO) achieves high network and memory throughput while restricting memory usage to a constant size. For most memory- and network-bound operations, performance depends on the size of the operand, so CBO in DeepSpeed fuses smaller operands into a pre-defined buffer large enough to deliver good performance without unnecessary memory overhead.
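The fusion idea can be sketched in plain Python: many small operands are packed into one pre-allocated flat buffer, a single pass is made over the fused buffer (standing in for one large kernel launch or communication call), and the results are scattered back. This is a toy illustration of the principle, not DeepSpeed's implementation:

```python
# Pre-allocated, fixed-size buffer: memory use stays constant regardless of
# how many small operands are fused into it per step.
BUFFER_SIZE = 16
buffer = [0.0] * BUFFER_SIZE

def fuse_into_buffer(tensors, buf):
    """Pack many small flat tensors contiguously into one buffer."""
    offset = 0
    for t in tensors:
        buf[offset:offset + len(t)] = t
        offset += len(t)
    return offset  # number of buffer slots in use

grads = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
used = fuse_into_buffer(grads, buffer)

# One pass over the fused operand, instead of one tiny op per tensor
# (here, scaling stands in for a fused kernel or collective call):
for i in range(used):
    buffer[i] *= 0.5

# Scatter results back into the original small tensors.
offset = 0
for t in grads:
    t[:] = buffer[offset:offset + len(t)]
    offset += len(t)

print(grads)  # [[0.5, 1.0], [1.5], [2.0, 2.5, 3.0]]
```

One large operation over a fused buffer amortizes per-call overhead that would otherwise dominate when issuing many tiny memory or network operations.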

The third optimization technique is Smart Gradient Accumulation. This can be used to run larger effective batch sizes with limited memory by breaking a batch into several sequential micro-batches and averaging the parameter gradients across those micro-batches.
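The averaging step can be sketched as follows. The `grad_fn` here is a hypothetical stand-in for a backward pass that returns per-parameter gradients for one micro-batch; the point is that averaging the micro-batch gradients (with equal micro-batch sizes) reproduces the gradient of the full batch while only one micro-batch ever needs to fit in memory:

```python
def accumulated_gradient(micro_batches, grad_fn):
    """Average per-micro-batch gradients to emulate one large batch."""
    total = None
    for mb in micro_batches:
        g = grad_fn(mb)  # gradient list for this micro-batch
        total = g[:] if total is None else [a + b for a, b in zip(total, g)]
    return [v / len(micro_batches) for v in total]

# Toy "gradient": the mean of the micro-batch, for a single parameter.
grad_fn = lambda mb: [sum(mb) / len(mb)]

micro = [[1.0, 3.0], [5.0, 7.0]]       # effective batch of 4, in chunks of 2
print(accumulated_gradient(micro, grad_fn))  # [4.0]
print(grad_fn([1.0, 3.0, 5.0, 7.0]))         # [4.0] -- full batch agrees
```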

The researchers say DeepSpeed supports all forms of model parallelism, including tensor-slicing approaches such as Megatron-LM and pipeline parallelism approaches such as PipeDream or GPipe. It does so by only requiring the model parallelism framework to provide a model parallelism unit (mpu) that implements a few bookkeeping functionalities.

DeepSpeed is available for download on GitHub.



More Information

DeepSpeed On GitHub

Related Articles

PyTorch Adds TorchScript API

PyTorch Scholarship Challenge

Microsoft Cognitive Toolkit Version 2.0

Microsoft Open Sources Natural Language Processing Tool

Microsoft Open Sources AI Debugging Tool

More AI Tools From Microsoft




Last Updated ( Thursday, 13 February 2020 )