NVIDIA CUDA Dive Using Python
Written by Nikos Vaggalis   
Thursday, 15 May 2025

NVIDIA adds native Python support to CUDA, making it more accessible to developers at large.

CUDA is, of course, NVIDIA's toolkit and programming model, which provides a development environment for speeding up computing applications by harnessing the power of GPUs. It's not easy to conquer, since it requires code to be written in C++, and as C++ is not user-friendly and is difficult to master, these properties rub off on the toolkit itself.

Back in 2021, we looked at an alternative way of accessing CUDA using the most user-friendly language there is - Python.
This was Triton, an open-source, Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code, most of the time on par with what an expert would produce. And, surprise surprise, Triton is developed by OpenAI.

CuPy, the open-source array library for GPU-accelerated computing with Python, was another option. But the time has come for NVIDIA to realize that having Python as a first-class citizen is very beneficial for the toolkit's adoption, both by developers and by other communities such as scientists. Adoption is one thing; the other is that, with the rise of AI, GPU programming is in demand and NVIDIA wants everybody working on its chips.

As such we have the emergence of CUDA Python, which provides Pythonic access to NVIDIA's CUDA platform. While the project has existed for a couple of years, it is with version 12.9 that it becomes really usable. Native support means brand new APIs and components:

  • cuda.core: Pythonic access to CUDA runtime and other core functionalities
  • cuda.bindings: Low-level Python bindings to CUDA C APIs
  • cuda.cooperative: A Python package providing CCCL’s reusable block-wide and warp-wide device primitives for use within Numba CUDA kernels
  • cuda.parallel: A Python package for easy access to CCCL’s highly efficient and customizable parallel algorithms, like sort, scan, reduce, transform, etc, that are callable on the host
  • numba.cuda: Numba's target for CUDA GPU programming, directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model (see the sketch after this list)
  • nvmath-python: Access to NVIDIA CPU & GPU math libraries
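
To give a flavor of the Numba target, here is a minimal sketch of a vector-addition kernel. The kernel itself is standard numba.cuda usage; the array size and launch configuration are illustrative choices rather than anything prescribed by the toolkit:

    from numba import cuda
    import numpy as np

    # Numba compiles this restricted subset of Python into a CUDA kernel
    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)            # global thread index
        if i < out.size:            # guard threads past the end of the array
            out[i] = a[i] + b[i]

    n = 1_000_000                   # illustrative problem size
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block
    vector_add[blocks, threads_per_block](a, b, out)   # Numba copies the host arrays to and from the device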

A simple example of the new core API in action, enumerating the device's properties in Python code, follows.
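
The sketch below follows the cuda.core documentation at the time of writing; the exact set of property attributes exposed may vary between releases:

    from cuda.core.experimental import Device

    dev = Device()          # defaults to device 0
    dev.set_current()       # make it the current device for this thread

    # Enumerate a few of the device's properties
    print(f"Name:               {dev.name}")
    print(f"Compute capability: {dev.compute_capability}")
    print(f"Multiprocessors:    {dev.properties.multiprocessor_count}")
    print(f"Warp size:          {dev.properties.warp_size}")
    print(f"Max threads/block:  {dev.properties.max_threads_per_block}")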

As you can see, the API is experimental but, upon stabilization, it will be moved out of the experimental namespace.

cuda.core supports Python 3.9 - 3.13 on Linux (x86-64, arm64) and Windows (x86-64). Of course, to run CUDA Python you'll need the CUDA Toolkit installed on a system with CUDA-capable GPUs. If you don't have a CUDA-capable GPU, you can access one from cloud service providers such as Amazon AWS and Microsoft Azure.

With that said, just use Python for everything, even on the GPU.

 

More Information

CUDA Python

Related Articles

Understanding GPU Architecture With Cornell

Program Deep Learning on the GPU with Triton

 
