This repository includes code written for the course CS179 - GPU programming.
The concepts taught in this course are:
- GPU hardware and its abstractions (e.g. grids, blocks, threads, and warps).
- GPU memory types (e.g. registers, shared mem, local mem, global mem, L1/L2/L3 caches) and their characteristics.
- Best practices w.r.t. memory optimization (e.g. memory coalescing, avoiding bank conflicts and register spilling).
- Thread divergence, and latency hiding via occupancy/thread-level parallelism (TLP), instruction-level parallelism (ILP), and streaming parallelism.
- Introduction to CUDA libraries such as cuBLAS, cuFFT, and cuDNN.
Code written for this course includes:
- Lab 1 - Small kernel convolution.
- Lab 2 - Matrix transposition (concepts: memory coalescing, avoiding bank conflicts, ILP, atomic operations).
- Lab 3 - Reduction and FFT (concepts: writing a reduction algorithm, use of cuBLAS and cuFFT).
- Lab 5 - Convolutional Neural Networks (concepts: writing a CNN for MNIST handwritten digit classification, use of cuBLAS and cuDNN).
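To give a flavor of the Lab 2 concepts, here is a minimal sketch of a shared-memory matrix transpose. It is not the lab's actual code: the names `transposeShared`, `transposeHost`, and `TILE_DIM`, the tile size of 32, and the restriction to square row-major matrices are all assumptions made for illustration. Each block stages a tile in shared memory so that both the global read and the global write are coalesced, and the tile is padded by one column, a common way to avoid shared-memory bank conflicts.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE_DIM = 32;  // assumed tile width, not taken from the lab

// Transposes a row-major n x n matrix: out[c][r] = in[r][c].
__global__ void transposeShared(const float* in, float* out, int n) {
    // The extra column shifts each row by one bank, so reading a tile
    // column does not hit the same bank 32 times.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;  // input column
    int y = blockIdx.y * TILE_DIM + threadIdx.y;  // input row
    if (x < n && y < n) {
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];  // coalesced read
    }
    __syncthreads();

    // Swap the block indices so consecutive threads write consecutive
    // addresses of the output.
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // output column
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // output row
    if (x < n && y < n) {
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
    }
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> transposeHost(const std::vector<float>& in, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_DIM, TILE_DIM);
    dim3 grid((n + TILE_DIM - 1) / TILE_DIM, (n + TILE_DIM - 1) / TILE_DIM);
    transposeShared<<<grid, block>>>(dIn, dOut, n);

    std::vector<float> out(static_cast<size_t>(n) * n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Calling `transposeHost(in, n)` with a row-major `n * n` vector returns the transposed matrix in the same layout. Production code would also check the return values of the CUDA runtime calls.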
Additional code:
- Tiled matrix multiplication - A matrix multiplication implementation that uses shared-memory tiles to increase memory reuse.
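The tiling idea can be sketched as follows. This is an illustrative version written under stated assumptions, not the code in this repository: the names `matmulTiled`, `matmulTiledHost`, and `TILE`, the tile width of 16, and the restriction to square row-major matrices are all hypothetical. Each block loads one tile of A and one tile of B into shared memory, and every loaded value is then reused by `TILE` threads instead of being fetched from global memory each time.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // assumed tile width, not taken from the repository

// Computes C = A * B for row-major N x N matrices.
__global__ void matmulTiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk over the tiles along the shared dimension.
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as zero so they add nothing.
        As[threadIdx.y][threadIdx.x] =
            (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles are fully loaded

        for (int k = 0; k < TILE; ++k) {
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        }
        __syncthreads();  // finish reading before the next load overwrites
    }

    if (row < N && col < N) {
        C[row * N + col] = acc;
    }
}

// Hypothetical host wrapper; error checking is omitted for brevity.
std::vector<float> matmulTiledHost(const std::vector<float>& A,
                                   const std::vector<float>& B, int N) {
    size_t bytes = static_cast<size_t>(N) * N * sizeof(float);
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, N);

    std::vector<float> C(static_cast<size_t>(N) * N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Compared with a naive kernel in which every thread reads a full row of A and a full column of B from global memory, this layout cuts global loads by roughly a factor of `TILE`, at the cost of two block-wide synchronizations per tile.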