Meta Builds AI Supercomputer
Written by Lucy Black   
Thursday, 27 January 2022

Meta, formerly known as Facebook, has announced that its researchers have designed and built an AI Research SuperCluster (RSC) that they believe is among the fastest AI supercomputers running today and will be the fastest in the world when it is fully built in mid-2022.


Announcing the new supercomputer, Kevin Lee, Technical Program Manager, and Shubho Sengupta, Software Engineer at Meta, said that Meta researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters.

The need for the supercomputer is driven by the creation of increasingly large, complex, and adaptable models that are being trained in areas including vision, speech, and language, and for critical use cases such as identifying harmful content.

Like other AI supercomputers, the Meta machine has been built by combining multiple GPUs into compute nodes, which are then connected by a high-performance network fabric to allow fast communication between those GPUs. RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs. RSC’s storage tier has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.
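The GPU total follows directly from the node count. As a quick sanity check, the short Python sketch below multiplies it out and also sums the three storage tiers. The figure of eight GPUs per DGX A100 system is NVIDIA's published configuration rather than something stated in the announcement text above, and the variable names are purely illustrative.

```python
# Sanity check of the RSC figures quoted in the article.
# Assumption: 8 GPUs per DGX A100 node (NVIDIA's published configuration).
GPUS_PER_DGX_A100 = 8
nodes = 760

total_gpus = nodes * GPUS_PER_DGX_A100

# Storage tiers in petabytes, as quoted in the article.
storage_pb = {
    "Pure Storage FlashArray": 175,
    "Penguin Computing Altus (cache)": 46,
    "Pure Storage FlashBlade": 10,
}
total_storage_pb = sum(storage_pb.values())

print(f"GPUs: {total_gpus:,}")               # GPUs: 6,080
print(f"Storage: {total_storage_pb} PB")     # Storage: 231 PB
```

The three storage tiers therefore add up to 231 petabytes in the current build.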


The researchers say that early benchmarks on RSC, compared with Meta’s legacy production and research infrastructure, show it runs computer vision workflows up to 20 times faster, runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. That means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.
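The three-week figure is simply the nine-week baseline divided by the threefold NLP speedup. A minimal Python sketch of that arithmetic follows. It assumes the speedup applies uniformly across the whole training run, which is a simplification, and the function name is hypothetical.

```python
def training_time_weeks(baseline_weeks: float, speedup: float) -> float:
    """Return the training time after applying a uniform speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_weeks / speedup


# Figures from the article: nine weeks before, three times faster on RSC.
rsc_weeks = training_time_weeks(baseline_weeks=9, speedup=3)
print(rsc_weeks)  # 3.0
```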

Training such models requires real-world data from Meta's production systems, which raises questions about privacy and security. The researchers say these are addressed by isolating RSC from the larger internet: it has no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers.

They say:

"To meet our privacy and security requirements, the entire data path from our storage systems to the GPUs is end-to-end encrypted"

The data is also anonymized and is decrypted at only one endpoint.


More Information

Meta AI Blog




Last Updated ( Thursday, 27 January 2022 )