The new CUDA Toolkit makes matrix operations, FFT and random number generation significantly faster.
The new release of the CUDA Toolkit from NVIDIA is worth knowing about. It brings significant speed increases on Fermi GPUs (GeForce 400/500 series): matrix operations are up to 300% faster, the Fast Fourier Transform is 2x to 10x faster, and random number generation is faster too. An H.264 encode/decode library is now included with the Toolkit, and debugging support has been extended to multi-GPU setups in cuda-gdb and Parallel Nsight.
There are also some new SDK code samples:
- Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
- Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
- Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
- Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
- Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
- Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
- SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
- cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
- Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
- simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs
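To give a flavor of what the CURAND pi-estimation samples are about, here is a minimal sketch of a Monte Carlo estimate of pi using the CURAND host API. It is not the SDK sample code: the names `estimate_pi` and `count_hits` are made up for illustration, the launch configuration is arbitrary, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <curand.h>

// Hypothetical helper kernel (not from the SDK samples).
// xy holds 2*n uniform floats: n x values followed by n y values.
// Each thread counts the points it sees inside the unit quarter circle
// and adds its local count to the global total.
__global__ void count_hits(const float *xy, unsigned int n, unsigned int *hits)
{
    unsigned int stride = blockDim.x * gridDim.x;
    unsigned int local = 0;
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        float x = xy[i];
        float y = xy[n + i];
        if (x * x + y * y <= 1.0f)
            ++local;
    }
    atomicAdd(hits, local);
}

// Hypothetical wrapper: estimates pi from n random points.
// Return codes are ignored here for brevity; real code should check them.
double estimate_pi(unsigned int n, unsigned long long seed)
{
    float *d_xy = NULL;
    unsigned int *d_hits = NULL;
    unsigned int h_hits = 0;

    cudaMalloc((void **)&d_xy, 2 * (size_t)n * sizeof(float));
    cudaMalloc((void **)&d_hits, sizeof(unsigned int));
    cudaMemset(d_hits, 0, sizeof(unsigned int));

    // Fill the device buffer with uniform random floats using CURAND.
    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, seed);
    curandGenerateUniform(gen, d_xy, 2 * (size_t)n);

    // Arbitrary launch configuration; the grid-stride loop covers any n.
    count_hits<<<64, 256>>>(d_xy, n, d_hits);
    cudaMemcpy(&h_hits, d_hits, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    curandDestroyGenerator(gen);
    cudaFree(d_xy);
    cudaFree(d_hits);

    // The quarter circle covers pi/4 of the unit square.
    return 4.0 * h_hits / n;
}
```

Calling `estimate_pi(1u << 20, 1234ULL)` from a host `main` should give a value close to 3.14. The file needs to be linked against the CURAND library, for example with `nvcc pi.cu -lcurand`.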
The CUDA Toolkit 3.2 is available to download for Windows, Mac OS X and Linux.