Microsoft Releases DeepSpeed For PyTorch
Written by Kay Ewbank   
Thursday, 13 February 2020

Microsoft Research has released DeepSpeed, an open source deep learning optimization library compatible with PyTorch. It makes training large models easier and more efficient, scaling up to models with 100 billion parameters.

Microsoft says the new library uses memory optimization techniques to improve PyTorch model training, allowing researchers to train models with more parameters. The library makes better use of memory that is local to the GPU, and it can be used with existing PyTorch applications with only minor code changes.


Advantages offered by DeepSpeed include distributed training, mixed precision, and checkpointing, through lightweight APIs that are compatible with PyTorch.
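Those features are driven by a JSON configuration file passed to DeepSpeed at initialization. The fragment below is an illustrative sketch only; exact field names and defaults can vary between DeepSpeed versions, so check the current documentation before relying on them:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": true
}
```

Here `fp16` turns on mixed-precision training, `gradient_accumulation_steps` controls micro-batching, and `zero_optimization` enables the ZeRO optimizer described below.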

One part of the DeepSpeed library, the Zero Redundancy Optimizer (ZeRO), is a parallelized optimizer responsible for much of the reduction in resource use. Microsoft says researchers have used these breakthroughs to create Turing Natural Language Generation (Turing-NLG), the largest publicly known language model at 17 billion parameters. ZeRO can train deep learning models with 100 billion parameters on the current generation of GPU clusters at three to five times the throughput of the current best system.
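The core idea behind ZeRO is that, instead of every data-parallel worker keeping a full replica of the optimizer state, that state is partitioned so each worker stores only its own shard. This toy sketch (an illustration only, not DeepSpeed's actual implementation) shows how per-worker memory shrinks roughly linearly with the number of workers:

```python
# Toy sketch of ZeRO-style optimizer-state partitioning (illustration only,
# not DeepSpeed's code). Plain data parallelism replicates the full optimizer
# state on every rank; ZeRO shards it, so each rank holds ~1/world_size of it.

def partition(num_params: int, world_size: int, rank: int) -> range:
    """Return the slice of parameter indices owned by `rank`."""
    shard = (num_params + world_size - 1) // world_size  # ceiling division
    start = rank * shard
    return range(start, min(start + shard, num_params))

num_params = 10
world_size = 4

shards = [partition(num_params, world_size, r) for r in range(world_size)]

# Every parameter's state is owned by exactly one rank, with no overlap.
owned = sorted(i for s in shards for i in s)
assert owned == list(range(num_params))

print([len(s) for s in shards])  # → [3, 3, 3, 1]
```

With 10 parameters spread over 4 workers, no worker stores state for more than 3 of them, instead of all 10 under plain replication.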

Other optimization techniques included in DeepSpeed are constant buffer optimization and smart gradient accumulation. Constant Buffer Optimization (CBO) enables high network and memory throughput while restricting memory usage to a constant size. For most memory- and network-bound operations, performance depends on the size of the operand, so CBO fuses smaller operands into a pre-defined buffer large enough to keep performance high without unnecessary memory overhead.
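The fusion idea can be sketched in a few lines. This is a simplified illustration, not DeepSpeed's code: many small "tensors" are greedily packed into chunks bounded by a fixed buffer size, so a bandwidth-bound operation runs over a few large chunks instead of many tiny ones:

```python
# Toy sketch of constant-buffer fusion (illustration only, not DeepSpeed's
# implementation). Small operands are packed into chunks no larger than a
# pre-defined buffer, so memory overhead stays bounded by the buffer size
# while each launch operates on a large, efficient chunk.

def fused_chunks(operands, buffer_size):
    """Greedily pack operands into chunks whose total length <= buffer_size."""
    chunks, current, used = [], [], 0
    for op in operands:
        if used + len(op) > buffer_size and current:
            chunks.append(current)   # flush the full buffer
            current, used = [], 0
        current.append(op)
        used += len(op)
    if current:
        chunks.append(current)
    return chunks

# Six small "tensors" (lists of floats) become 2 fused operations instead
# of 6 separate ones, using a 1000-element buffer.
ops = [[1.0] * n for n in (300, 200, 400, 150, 350, 100)]
chunks = fused_chunks(ops, buffer_size=1000)
print(len(chunks))  # → 2
```

Six separate small operations collapse into two buffer-sized ones, which is where the throughput gain comes from.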

The third optimization technique is Smart Gradient Accumulation. This can be used to run larger batch sizes with limited memory by breaking an effective batch into several sequential micro-batches and averaging the parameter gradients across them.
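The averaging step is worth seeing concretely. This minimal sketch (plain Python, a single scalar weight, and a made-up squared-error loss, purely for illustration) shows that accumulating weighted micro-batch gradients reproduces the full-batch gradient exactly:

```python
# Toy sketch of gradient accumulation (illustration only). An effective
# batch is split into sequential micro-batches; their gradients are
# accumulated with size weighting, so the final update matches a single
# large-batch step while peak memory covers only one micro-batch.

def grad(batch, w):
    """Gradient of the mean of 0.5*(w*x - y)^2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

w = 0.0
batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]

# Full-batch gradient in one shot.
full = grad(batch, w)

# Same gradient computed as two micro-batches of size 2.
micro_size = 2
acc = 0.0
for i in range(0, len(batch), micro_size):
    micro = batch[i:i + micro_size]
    acc += grad(micro, w) * len(micro)   # weight by micro-batch size
acc /= len(batch)

assert abs(full - acc) < 1e-12
print(full, acc)  # → -14.75 -14.75
```

Because the two results are identical, the technique trades a little extra compute time for a large reduction in peak memory, with no change to the training mathematics.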

The researchers say DeepSpeed supports all forms of model parallelism, including tensor-slicing-based approaches such as Megatron-LM and pipelined parallelism approaches such as PipeDream or GPipe. It does so by only requiring the model parallelism framework to provide a model parallelism unit (mpu) that implements a few bookkeeping functionalities.
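A hedged sketch of what such an mpu object might look like is below. The method names follow Megatron-LM's mpu module and may differ in any given DeepSpeed version, so treat them as illustrative; the bookkeeping itself is just mapping each worker's global rank into a 2-D grid of model-parallel and data-parallel groups:

```python
# Hedged sketch of an "mpu" bookkeeping object (method names follow
# Megatron-LM's mpu module and are assumptions here, not a guaranteed
# DeepSpeed interface). Adjacent global ranks share a model-parallel group;
# the remaining dimension forms the data-parallel groups.

class SimpleMPU:
    def __init__(self, global_rank: int, model_parallel_size: int, world_size: int):
        assert world_size % model_parallel_size == 0
        self.rank = global_rank
        self.mp_size = model_parallel_size
        self.dp_size = world_size // model_parallel_size

    def get_model_parallel_rank(self) -> int:
        # Position within this worker's model-parallel group.
        return self.rank % self.mp_size

    def get_model_parallel_world_size(self) -> int:
        return self.mp_size

    def get_data_parallel_rank(self) -> int:
        # Which data-parallel replica this worker belongs to.
        return self.rank // self.mp_size

    def get_data_parallel_world_size(self) -> int:
        return self.dp_size

# 8 GPUs with model parallelism of 2: global rank 5 is model-parallel
# rank 1 inside data-parallel replica 2.
mpu = SimpleMPU(global_rank=5, model_parallel_size=2, world_size=8)
print(mpu.get_model_parallel_rank(), mpu.get_data_parallel_rank())  # → 1 2
```

Because DeepSpeed only queries this kind of bookkeeping, the memory and throughput optimizations compose with whichever model-parallel framework supplies the mpu.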

DeepSpeed is available for download on GitHub.



More Information

DeepSpeed On GitHub

Related Articles

PyTorch Adds TorchScript API

PyTorch Scholarship Challenge

Microsoft Cognitive Toolkit Version 2.0

Microsoft Open Sources Natural Language Processing Tool

Microsoft Open Sources AI Debugging Tool

More AI Tools From Microsoft

