The new release of the CUDA Toolkit from NVIDIA is worth knowing about. It features significant speed increases on Fermi GPUs (GeForce 400/500 series): matrix manipulation is up to 300% faster, the Fast Fourier Transform is 2x to 10x faster, and random number generation is faster too. The H.264 encode/decode library is now included with the Toolkit, and debugging support has been extended to multi-GPU setups in both cuda-gdb and Parallel Nsight.
There are also some new SDK code samples:
- Several code samples demonstrating how to use the new CURAND library, including MonteCarloCURAND, EstimatePiInlineP, EstimatePiInlineQ, EstimatePiP, EstimatePiQ, SingleAsianOptionP, and randomFog
- Conjugate Gradient Solver, demonstrating the use of CUBLAS and CUSPARSE in the same application
- Function Pointers, a sample that shows how to use function pointers to implement the Sobel Edge Detection filter for 8-bit monochrome images
- Interval Computing, demonstrating the use of interval arithmetic operators using C++ templates and recursion
- Simple Printf, demonstrating best practices for using both printf and cuprintf in compute kernels
- Bilateral Filter, an edge-preserving non-linear smoothing filter for image recovery and denoising implemented in CUDA C with OpenGL rendering
- SLI with Direct3D Texture, a simple example demonstrating the use of SLI and Direct3D interoperability with CUDA C
- cudaEncode, showing how to use the NVIDIA H.264 Encoding Library using YUV frames as input
- Vflocking Direct3D/CUDA, which simulates and visualizes the flocking behavior of birds in flight
- simpleSurfaceWrite, demonstrating how CUDA kernels can write to 2D surfaces on Fermi GPUs
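As their names suggest, the EstimatePi samples compute an approximation of pi from random numbers. The sketch below shows the general Monte Carlo idea on the CPU only, assuming the usual "points inside a quarter circle" approach. It is not the SDK code and does not use CURAND: `std::mt19937` stands in for the GPU generator, and `estimate_pi` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <random>

// CPU-only sketch of Monte Carlo pi estimation (not the SDK sample).
// Points are drawn uniformly from the unit square; the fraction that
// falls inside the quarter circle of radius 1 approximates pi / 4.
// estimate_pi, samples and seed are hypothetical names for this sketch.
double estimate_pi(std::uint64_t samples, std::uint32_t seed) {
    if (samples == 0) {
        return 0.0;  // nothing sampled, so there is no meaningful estimate
    }
    std::mt19937 gen(seed);  // stand-in for a GPU generator such as CURAND
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    std::uint64_t inside = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = dist(gen);
        const double y = dist(gen);
        if (x * x + y * y <= 1.0) {
            ++inside;  // point landed inside the quarter circle
        }
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}
```

Calling `estimate_pi(1000000, 42u)` should return a value close to 3.14. On a GPU the loop would presumably be split across many threads, each drawing its own samples, which is where a fast parallel random number generator matters.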
The CUDA Toolkit 3.2 is available to download for Windows, Mac OS X and Linux.