This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications using CUDA® GPUs and the Numba compiler.
Learning Objectives
- GPU-accelerate NumPy ufuncs with a few lines of code.
- Configure code parallelization using the CUDA thread hierarchy.
- Write custom CUDA device kernels for maximum performance and flexibility.
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
Download workshop datasheet (PDF 67 KB)