Program Deep Learning on the GPU with Triton
Written by Nikos Vaggalis   
Monday, 02 August 2021

Triton is a new, Python-like, language from OpenAI intended to ease programming for the GPU by providing an alternative to CUDA.

NVIDIA's CUDA toolkit provides a development environment for speeding up computing applications by harnessing the power of GPUs, but it requires the code to be written code in C++. As C++ by default is not user-friendly and difficult to master,these properties subsequently rub off on the toolkit itself.

Wouldn't it be easier to use a language that is user-friendly to write your GPU-accelerated deep learning applications in? Wish granted by Open AI which has announced:

We’re releasing Triton 1.0, an open-source Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code—most of the time on par with what an expert would be able to produce.

At the moment Triton just supports NVIDIA Tensor Core GPUs such as A100. We had a brief encounter with the powerhouse A100 cards in "GAN Theft Auto - The Neural Network Is The Game", where Nvdia itself loaned to two researchers a DGX station beefed up computer sporting four of the 80 gigabyte A100 GPU cards for a total of 320 gigabytes of VRAM in order to accelerate their research.

But to further clarify the "alternative to the CUDA toolkit" claim, while Triton enables researchers with no CUDA experience to write GPU computing and deep learning applications without needing the CUDA toolkit, the GPUs they target must be CUDA-enabled. AMD support is not included in the project's short term plans. That said, Triton also requires those CUDA-enabled GPUs to posses a Compute Capability (a measurement of the GPU's general specifications and available features) of 7.0+. A good chart of the compatibility per GPU can be found here.

That covers the hardware part.

Programming wise, Triton is Python with extensions for GPU programming tasks like matrix multiplication or Vector Addition. As the online examples of the code show, you just import Triton as a library and use its functions on top of Python:

The language aside, Triton is also comprised of two other parts:


  • Triton-IR : An LLVM-based Intermediate Representation (IR) that provides an environment suitable for tile-level program analysis, transformation and optimization.
  • Triton-JIT : A Just-In-Time (JIT) compiler and code generation backend for compiling Triton-IR programs into efficient LLVM bitcode.


But in the end the burning question is, do you trade performance for programmer productivity by switching from the C++ based CUDA toolkit to Triton?

Apart from making GPU programming more accessible to everyone, benchmarks have already shown that Triton produces kernels that are up to two times more efficient than equivalent Torch implementations, achieving peak performance with around 25 lines of Python code. On the other hand, implementing something similar in CUDA would take a lot more effort and would even be likely to achieve lower performance.

So is having the best of both worlds a possibility? That's a Yes with Triton.

The project is open-source and available on GitHub.
There's also a great starting up tutorial on Twitch with Triton author Philippe Tillet from OpenAI. Just make sure you start watching from minute 5 onwards since before that there's a technical glitch which disables sound.


More Information 

Introducing Triton: Open-Source GPU Programming for Neural Networks



Related Articles

GAN Theft Auto - The Neural Network Is The Game

Yann LeCun’s Deep Learning Course Free From NYU

Tensorflow 2.0 In 7 Hours


To be informed about new articles on I Programmer, sign up for our weekly newsletter, subscribe to the RSS feed and follow us on Twitter, Facebook or Linkedin.


Udacity's New Discovering Ethical AI Course

Udacity has just launched an hour-long course on Ethical AI. Intended for a wide audience across many industries, it introduces to basic concepts and terms needed to step into the world of Ethica [ ... ]

AWS Introduces A New JavaScript Runtime For Lambda

Amazon has announced the availability, albeit for experimental purposes, of a new JavaScript based runtime called Low Latency Runtime or LLRT for short, to bring JavaScript up to the performance throu [ ... ]

More News

raspberry pi books



or email your comment to:

Last Updated ( Monday, 02 August 2021 )