PyTorch Team Introduces Cluster Programming
Written by Kay Ewbank
Tuesday, 04 November 2025
|
The developers of PyTorch have introduced Monarch, a distributed programming framework that lets you program a cluster in much the same way you'd program a single machine. Standard PyTorch uses an HPC-style multi-controller model, in which multiple identical copies of the same script are launched across different machines, each running its own instance of the application. This approach doesn't map easily onto machine learning workflows.
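For context, this is roughly what that conventional multi-controller workflow looks like: a launcher such as torchrun starts many identical copies of a script, and each copy joins a process group and coordinates with its peers. This is a minimal sketch with the training loop itself omitted.

# Conventional multi-controller PyTorch: launch N identical copies of this
# script, e.g. with  torchrun --nproc_per_node=8 train.py
# Each copy only knows its own rank and coordinates through the process group.
import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")   # every replica joins the same group
    rank = dist.get_rank()                    # each copy discovers its own rank
    torch.cuda.set_device(rank % torch.cuda.device_count())
    # ... each replica runs its own copy of the training loop here ...
    dist.destroy_process_group()


if __name__ == "__main__":
    main()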
To provide a better model, the PyTorch team has created a framework that brings the simplicity of single-machine PyTorch to entire clusters. Monarch provides a single-controller programming model, in which one script orchestrates all distributed resources, making them feel almost local. This simplifies distributed programming because code looks and feels like a single-machine Python program, yet can scale across thousands of GPUs. It also means developers can directly use Pythonic constructs such as classes, functions, loops, tasks, and futures to express complex distributed algorithms.

Monarch organizes hosts, processes, and actors into scalable meshes that can be manipulated directly. You can operate on entire meshes (or slices of them) with simple APIs, and Monarch handles the distribution and vectorization automatically. It offers progressive fault handling: by default, when something fails, Monarch stops the whole program, just as an uncaught exception would in a simple local script. Developers can then add fine-grained fault handling exactly where it is needed, catching and recovering from failures just as they would catch exceptions.

Monarch splits the control plane (messaging) from the data plane (RDMA transfers), enabling direct GPU-to-GPU memory transfers across the cluster. Commands travel through one path and data through another, each optimized for what it does best. Monarch also integrates with PyTorch to provide tensors that are sharded across clusters of GPUs. Monarch tensor operations look local but execute across large distributed clusters, with Monarch handling the complexity of coordinating thousands of GPUs.

There are two key APIs, for process and actor meshes, alongside two more advanced APIs for the tensor engine and RDMA buffers. Monarch organizes resources into multidimensional arrays, or meshes. A process mesh is an array of processes spread across many hosts; an actor mesh is an array of actors, each running inside a separate process. The launch version of Monarch supports process meshes over GPU clusters, typically one process per GPU, onto which you can spawn actors to form actor meshes, as sketched in the example below.

Monarch's tensor engine brings distributed tensors to process meshes, letting you write PyTorch programs as if the entire cluster of GPUs were attached to the machine running the script. For bulk data movement, Monarch also provides an RDMA buffer API, enabling direct, high-throughput transfers between processes on supported NICs.

Monarch is available now on GitHub.
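To give a flavour of the process and actor mesh APIs described above, here is a rough sketch of a single controlling script spawning an actor mesh and calling an endpoint across it. The names used here (proc_mesh, Actor, endpoint, and the .call()/.get() future style) are based on Monarch's published examples and may differ in the current release, so treat this as illustrative rather than definitive.

# Illustrative sketch only: API names are based on Monarch's examples and
# may not match the current release exactly.
from monarch.actor import Actor, endpoint, proc_mesh


class Trainer(Actor):
    # One Trainer actor runs inside each process of the mesh.
    @endpoint
    def step(self, lr: float) -> float:
        # A real actor would run a forward/backward pass on its GPU here.
        return lr * 2


# A process mesh: eight processes, typically one per GPU.
procs = proc_mesh(gpus=8)

# An actor mesh: one Trainer spawned into every process of the mesh.
trainers = procs.spawn("trainers", Trainer)

# The single controller script drives the whole mesh at once; .call()
# returns a future covering all eight actors, and a failure in any of
# them surfaces here as an ordinary Python exception you can catch.
results = trainers.step.call(0.01).get()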


