Microsoft Releases DeepSpeed For PyTorch
Written by Kay Ewbank   
Thursday, 13 February 2020

Microsoft Research has released an open source library that's compatible with PyTorch. DeepSpeed is a deep learning optimization library that makes large-model training easier, making it possible to train models with 100 billion parameters.

Microsoft says that the new library uses memory optimization technology to improve PyTorch model training, meaning researchers can train models with more parameters. The library makes better use of memory that is local to the GPU, and can be used with existing PyTorch applications with only minor changes to the code.


Advantages offered by DeepSpeed include distributed training, mixed precision, and checkpointing, through lightweight APIs that are compatible with PyTorch.

One component of the DeepSpeed library, the Zero Redundancy Optimizer (ZeRO), is a parallelized optimizer responsible for the reduction in resource use. Microsoft says researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters. ZeRO can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system.
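DeepSpeed is configured through a JSON file passed at startup. The fragment below is an illustrative sketch of that style of configuration, enabling mixed precision, gradient accumulation and ZeRO; the exact field names and accepted values vary between DeepSpeed versions, so treat this as an example rather than a definitive reference.

```
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 4,
  "fp16": { "enabled": true },
  "zero_optimization": true
}
```

The file is typically supplied to the DeepSpeed launcher alongside an otherwise ordinary PyTorch training script, which is what allows existing applications to adopt it with only minor code changes.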

Other optimization techniques included in DeepSpeed are constant buffer optimization and smart gradient accumulation. Constant Buffer Optimization (CBO) enables high network and memory throughput while restricting memory usage to a constant size. This works because, for most memory- and network-bound operations, performance depends on the size of the operand. CBO fuses smaller operands into a buffer of pre-defined size that is big enough to improve performance without unnecessary memory overhead.
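The fusion idea behind CBO can be illustrated in plain Python. This is a conceptual sketch, not DeepSpeed's actual implementation: many small operands are packed into one fixed-size buffer, and an expensive operation is invoked once per full buffer rather than once per operand, so per-call overhead is amortised while memory stays bounded.

```python
BUFFER_SIZE = 8  # pre-defined buffer capacity, in elements (illustrative)

def fused_process(operands, process):
    """Apply `process` to operands in fused, fixed-size chunks.

    Returns the number of calls made to `process`, to show how
    fusion reduces call count compared with one call per operand.
    """
    buffer, calls = [], 0
    for op in operands:
        buffer.extend(op)
        while len(buffer) >= BUFFER_SIZE:
            process(buffer[:BUFFER_SIZE])   # one call on a full buffer
            buffer = buffer[BUFFER_SIZE:]
            calls += 1
    if buffer:                              # flush any remainder
        process(buffer)
        calls += 1
    return calls

# Ten 2-element operands (20 elements) fuse into 3 processed chunks
# instead of ten separate calls.
n_calls = fused_process([[1.0, 2.0]] * 10, process=lambda chunk: None)
print(n_calls)  # → 3
```

In the real library the buffer would hold GPU tensors and `process` would be a kernel launch or a collective communication call, which is where the throughput gain comes from.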

The third optimization technique is Smart Gradient Accumulation. This can be used to run larger batch sizes with limited memory by breaking an effective batch into several sequential micro-batches, and averaging the parameter gradients across these micro-batches.
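The micro-batching scheme can be sketched in a few lines of plain Python. The toy model and gradient function below are hypothetical stand-ins chosen so the example runs without any framework; the structure of the loop is the point: each micro-batch contributes a gradient, and the single parameter update uses the average across micro-batches, matching the result of one large batch while only holding one micro-batch in memory at a time.

```python
def grad(example, w):
    """Toy per-example gradient for the loss 0.5 * (w*x - y)**2 w.r.t. w."""
    x, y = example
    return (w * x - y) * x

def accumulate_step(batch, w, lr, micro_batch_size):
    """One optimizer step using micro-batch gradient accumulation."""
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    acc = 0.0
    for mb in micro_batches:
        # gradient of one micro-batch = mean over its examples
        acc += sum(grad(ex, w) for ex in mb) / len(mb)
    avg_grad = acc / len(micro_batches)   # average across micro-batches
    return w - lr * avg_grad              # single parameter update

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_new = accumulate_step(batch, w=0.0, lr=0.1, micro_batch_size=2)
print(w_new)  # → 1.5
```

In a real PyTorch training loop the same pattern appears as several backward passes that accumulate into the parameter gradients before a single `optimizer.step()`.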

The researchers say DeepSpeed supports all forms of model parallelism, including tensor-slicing-based approaches such as Megatron-LM and pipelined parallelism approaches such as PipeDream or GPipe. It does so by only requiring the model parallelism framework to provide a model parallelism unit (mpu) that implements a few bookkeeping functionalities.

DeepSpeed is available for download on GitHub.


More Information

DeepSpeed On GitHub

Related Articles

PyTorch Adds TorchScript API

PyTorch Scholarship Challenge

Microsoft Cognitive Toolkit Version 2.0

Microsoft Open Sources Natural Language Processing Tool

Microsoft Open Sources AI Debugging Tool

More AI Tools From Microsoft

To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.




Last Updated ( Thursday, 13 February 2020 )