Meta Builds AI Supercomputer
Written by Lucy Black   
Thursday, 27 January 2022

Meta, formerly known as Facebook, has announced that its researchers have designed and built an AI Research SuperCluster (RSC), which they believe is among the fastest AI supercomputers running today and will be the fastest in the world once it is fully built in mid-2022.


Announcing the new supercomputer, Kevin Lee, Technical Program Manager, and Shubho Sengupta, Software Engineer at Meta, said that Meta researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters.

The need for the supercomputer is driven by the increasingly large, complex, and adaptable models being trained in areas including vision, speech, and language, as well as for critical use cases such as identifying harmful content.

Like other AI supercomputers, the Meta machine has been built by combining multiple GPUs into compute nodes, which are then connected by a high-performance network fabric to allow fast communication between those GPUs. RSC today comprises a total of 760 NVIDIA DGX A100 systems as its compute nodes, for a total of 6,080 GPUs. RSC’s storage tier has 175 petabytes of Pure Storage FlashArray, 46 petabytes of cache storage in Penguin Computing Altus systems, and 10 petabytes of Pure Storage FlashBlade.
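The hardware figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration, not anything Meta published: it assumes eight GPUs per DGX A100 system (the standard configuration), and the names `total_gpus` and `storage_pb` are invented for the example.

```python
# Sanity check of the RSC figures quoted in the article.
# Assumption: each NVIDIA DGX A100 system holds eight A100 GPUs.
GPUS_PER_DGX_A100 = 8


def total_gpus(nodes: int, gpus_per_node: int = GPUS_PER_DGX_A100) -> int:
    """Return the GPU count for a cluster of identical compute nodes."""
    return nodes * gpus_per_node


# Storage tiers in petabytes, as listed in the article.
storage_pb = {
    "Pure Storage FlashArray": 175,
    "Penguin Computing Altus cache": 46,
    "Pure Storage FlashBlade": 10,
}

print(total_gpus(760))           # GPUs across 760 DGX A100 systems
print(sum(storage_pb.values()))  # combined storage in petabytes
```

Under that assumption, 760 nodes give the 6,080 GPUs quoted, and the three storage tiers add up to 231 petabytes.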


The researchers say that early benchmarks on RSC, compared with Meta’s legacy production and research infrastructure, show it runs computer vision workflows up to 20 times faster, runs the NVIDIA Collective Communication Library (NCCL) more than nine times faster, and trains large-scale NLP models three times faster. That means a model with tens of billions of parameters can finish training in three weeks, compared with nine weeks before.
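The training-time claim follows directly from the quoted speedup. As a rough sketch, assuming training time scales inversely with the speedup factor (a simplification, and `training_weeks` is a hypothetical helper, not part of any Meta tooling):

```python
def training_weeks(baseline_weeks: float, speedup: float) -> float:
    """Estimate training time after a speedup, assuming time scales as 1/speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_weeks / speedup


# Figures from the article: nine weeks before, NLP training three times faster.
print(training_weeks(9, 3))
```

With the article's numbers, a nine-week job at three times the speed comes out at three weeks, matching the researchers' example.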

Training such models requires real-world data from Meta's production systems, which raises questions about privacy and security. The researchers say these are addressed by isolating RSC from the larger internet: it has no direct inbound or outbound connections, and traffic can flow only from Meta’s production data centers.

They say:

"To meet our privacy and security requirements, the entire data path from our storage systems to the GPUs is end-to-end encrypted"

The data is also anonymized, and only decrypted at one endpoint.


More Information

Meta AI Blog

Related Articles

AWS And Facebook Launch PyTorch Tools

Facebook Releases Detectron2

Facebook Open Sources Natural Language Processing Model

Facebook Open Sources Two Technologies

RocksDB - Facebook's Database Now Open Source 

 


Last Updated ( Thursday, 27 January 2022 )