Microsoft Releases DeepSpeed For PyTorch

Written by Kay Ewbank

Thursday, 13 February 2020

Microsoft Research has released DeepSpeed, an open source deep learning optimization library that is compatible with PyTorch. It makes it easier to train large models, and Microsoft says it makes training 100-billion-parameter models possible.

Microsoft says that the new library uses memory optimization technology to improve PyTorch model training, so researchers can train models with more parameters. The library makes better use of memory that is local to the GPU, and can be used with existing PyTorch applications with only minor changes to the application code.


Advantages offered by DeepSpeed include distributed training, mixed precision, and checkpointing, through lightweight APIs that are compatible with PyTorch.

One part of the DeepSpeed library, the Zero Redundancy Optimizer (ZeRO), is the parallelized optimizer that is responsible for the reduction in resource use. Microsoft says ZeRO can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system. Microsoft also says researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), which at 17 billion parameters is the largest publicly known language model.
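To give a feel for where the saving comes from, here is a rough sketch of the per-GPU memory arithmetic when optimizer state is partitioned across data-parallel GPUs rather than replicated on each one. The byte counts (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 Adam state) follow the accounting commonly given for mixed-precision Adam training and are an assumption, not figures from this article. The function names and the choice of 64 GPUs are hypothetical.

```python
# Assumed byte counts per parameter for mixed-precision Adam training.
# These are illustrative assumptions, not numbers quoted in the article.
FP16_WEIGHT_BYTES = 2
FP16_GRADIENT_BYTES = 2
FP32_OPTIMIZER_STATE_BYTES = 12  # fp32 weights + momentum + variance


def replicated_bytes_per_gpu(num_params: float) -> float:
    """Every GPU holds weights, gradients and the full optimizer state."""
    per_param = FP16_WEIGHT_BYTES + FP16_GRADIENT_BYTES + FP32_OPTIMIZER_STATE_BYTES
    return per_param * num_params


def partitioned_bytes_per_gpu(num_params: float, data_parallel_degree: int) -> float:
    """Weights and gradients are replicated; optimizer state is split evenly."""
    replicated = (FP16_WEIGHT_BYTES + FP16_GRADIENT_BYTES) * num_params
    sharded = FP32_OPTIMIZER_STATE_BYTES * num_params / data_parallel_degree
    return replicated + sharded


GB = 1e9
num_params = 17e9  # the Turing-NLG parameter count mentioned above
replicated_gb = replicated_bytes_per_gpu(num_params) / GB
partitioned_gb = partitioned_bytes_per_gpu(num_params, data_parallel_degree=64) / GB

print(f"replicated optimizer state:  {replicated_gb:.1f} GB per GPU")
print(f"partitioned optimizer state: {partitioned_gb:.1f} GB per GPU")
```

The sketch ignores activations and temporary buffers, so it only illustrates the trend: the replicated term stays fixed while the optimizer-state term shrinks as more GPUs share it.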

Other optimization techniques in DeepSpeed include Constant Buffer Optimization and Smart Gradient Accumulation. Constant Buffer Optimization (CBO) enables high network and memory throughput while restricting memory usage to a constant size. For most memory- and network-bound operations, performance depends on the size of the operand. CBO in DeepSpeed fuses smaller operands into a buffer of a pre-defined size, one big enough to improve performance without unnecessary memory overhead.
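The fusing idea can be sketched in a few lines of plain Python. This is a toy illustration of the general technique, not DeepSpeed's implementation: the `FusionBuffer` class, its capacity of 8 elements and the `send` callback (standing in for a network or memory operation) are all hypothetical.

```python
from typing import Callable, List


class FusionBuffer:
    """Toy fixed-capacity buffer that fuses small operands into larger ones."""

    def __init__(self, capacity: int, send: Callable[[List[float]], None]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.send = send  # hypothetical stand-in for a bandwidth-bound operation
        self._buffer: List[float] = []

    def add(self, operand: List[float]) -> None:
        """Append an operand; flush whenever the buffer reaches capacity."""
        for value in operand:
            self._buffer.append(value)
            if len(self._buffer) == self.capacity:
                self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one fused operand, then start afresh."""
        if self._buffer:
            self.send(self._buffer)
            self._buffer = []


sent_chunks: List[List[float]] = []
fusion = FusionBuffer(capacity=8, send=sent_chunks.append)

# Five small operands of three elements each, 15 elements in total.
small_operands = [[float(i)] * 3 for i in range(5)]
for operand in small_operands:
    fusion.add(operand)
fusion.flush()  # push out the partially filled final buffer

chunk_sizes = [len(chunk) for chunk in sent_chunks]
```

Five small sends become two larger ones, and the buffer never holds more than its fixed capacity, which is the trade-off the paragraph above describes.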

The third optimization technique is Smart Gradient Accumulation. This can be used to run a larger batch size with limited memory by breaking an effective batch into several sequential micro-batches and averaging the parameter gradients across these micro-batches.
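The principle behind gradient accumulation is easy to check on a tiny example. The sketch below uses a one-parameter linear model with a mean-squared-error loss, both chosen purely for illustration, and assumes equally sized micro-batches. It shows plain gradient accumulation only and makes no claim about what the "smart" part of DeepSpeed's version adds.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse_gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w: float, batch: List[Sample], micro_batch_size: int) -> float:
    """Process the batch as sequential micro-batches and average their gradients."""
    if len(batch) % micro_batch_size != 0:
        raise ValueError("this sketch assumes equally sized micro-batches")
    total = 0.0
    num_micro_batches = 0
    for start in range(0, len(batch), micro_batch_size):
        micro_batch = batch[start:start + micro_batch_size]
        total += mse_gradient(w, micro_batch)  # only one micro-batch in flight
        num_micro_batches += 1
    return total / num_micro_batches


batch = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w = 0.5

full_batch_grad = mse_gradient(w, batch)
micro_batch_grad = accumulated_gradient(w, batch, micro_batch_size=2)
gradients_match = math.isclose(full_batch_grad, micro_batch_grad)
```

Because the micro-batches are the same size, the average of their gradients equals the gradient of the full batch, so the optimizer sees the same update while only one micro-batch has to fit in memory at a time.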

The researchers say DeepSpeed supports all forms of model parallelism, including tensor-slicing approaches such as Megatron-LM and pipeline parallelism approaches such as PipeDream or GPipe. It does so by requiring only that the model parallelism framework provide a model parallelism unit (mpu) that implements a few bookkeeping functions.
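As an illustration of what such bookkeeping might look like, here is a toy mpu written in plain Python. The method names are modeled on the rank, group and world-size queries that DeepSpeed's documentation describes for an mpu object, but treat them as an assumption. The `ToyModelParallelUnit` class is hypothetical, it returns lists of rank numbers where a real framework would return communication groups, and the contiguous-rank layout is just one possible choice.

```python
from typing import List


class ToyModelParallelUnit:
    """Toy bookkeeping for a grid of model-parallel and data-parallel ranks.

    Assumed layout: consecutive global ranks form one model-parallel group,
    and ranks holding the same model shard form one data-parallel group.
    """

    def __init__(self, global_rank: int, world_size: int, model_parallel_size: int) -> None:
        if world_size % model_parallel_size != 0:
            raise ValueError("world_size must be divisible by model_parallel_size")
        if not 0 <= global_rank < world_size:
            raise ValueError("global_rank out of range")
        self.global_rank = global_rank
        self.world_size = world_size
        self.model_parallel_size = model_parallel_size

    def get_model_parallel_rank(self) -> int:
        return self.global_rank % self.model_parallel_size

    def get_model_parallel_world_size(self) -> int:
        return self.model_parallel_size

    def get_model_parallel_group(self) -> List[int]:
        start = self.global_rank - self.get_model_parallel_rank()
        return list(range(start, start + self.model_parallel_size))

    def get_data_parallel_rank(self) -> int:
        return self.global_rank // self.model_parallel_size

    def get_data_parallel_world_size(self) -> int:
        return self.world_size // self.model_parallel_size

    def get_data_parallel_group(self) -> List[int]:
        first = self.get_model_parallel_rank()
        return list(range(first, self.world_size, self.model_parallel_size))


# Eight processes, with the model split two ways.
mpu = ToyModelParallelUnit(global_rank=5, world_size=8, model_parallel_size=2)
model_group = mpu.get_model_parallel_group()
data_group = mpu.get_data_parallel_group()
```

The point of the sketch is that the training library only needs to ask "which ranks share my model shard?" and "which ranks hold the other shards?", and can leave the actual partitioning of the model to the framework.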

DeepSpeed is available for download on GitHub.


More Information

DeepSpeed On GitHub



Last Updated ( Thursday, 13 February 2020 )
