The course covers the basics of conventional CPU architectures, their extensions for single instruction multiple data (SIMD) processing, and finally the generalization of this concept in the form of single instruction multiple thread (SIMT) processing as implemented in modern GPUs. We cover GPU architecture basics in terms of functional units and then dive into the popular CUDA programming model commonly used for GPU programming. In this context, architecture-specific details such as memory access coalescing, shared memory usage, and GPU thread scheduling, which primarily affect program performance, are also covered in detail. We next switch to a different SIMD programming language called OpenCL, which can be used to program both CPUs and GPUs in a generic manner. Throughout the course we present architecture-aware optimization techniques relevant to both CUDA and OpenCL. Finally, we provide students with detailed application development examples in two well-known GPU computing scenarios.

INTENDED AUDIENCE: Computer Science, Electronics, and Electrical Engineering students
PREREQUISITES: Programming and Data Structures, Digital Logic, Computer Architecture
INDUSTRY SUPPORT: NVIDIA, AMD, Google, Amazon, and most big-data companies
GPU Architectures and Programming
Indian Institute of Technology, Kharagpur and NPTEL via Swayam
Syllabus
Week 1: Review of Traditional Computer Architecture – Basic Five-Stage RISC Pipeline, Cache Memory, Register File, SIMD Instructions
Week 2: GPU Architectures – Streaming Multiprocessors, Cache Hierarchy, The Graphics Pipeline
Week 3: Introduction to CUDA Programming (a minimal kernel sketch follows this syllabus)
Week 4: Multi-dimensional Mapping of Dataspace, Synchronization
Week 5: Warp Scheduling, Divergence
Week 6: Memory Access Coalescing
Week 7: Optimization Examples – Optimizing Reduction Kernels (a reduction sketch follows this syllabus)
Week 8: Optimization Examples – Kernel Fusion, Thread and Block
Week 9: OpenCL Basics
Week 10: OpenCL for Heterogeneous Computing
Week 11–12: Application Design – Efficient Neural Network Training/Inferencing
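To give a concrete flavour of the CUDA material in Weeks 3–6, the sketch below shows a minimal vector-addition kernel with its host-side launch. It is an illustrative sketch only, not taken from the course; the kernel name vecAdd, the array size, and the 256-thread block configuration are assumptions. Each thread handles one output element, and because consecutive threads in a warp touch consecutive addresses, the global memory accesses coalesce into wide transactions.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (not course material): one thread per element.
// Consecutive threads access consecutive addresses, so loads/stores coalesce.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                          // assumed problem size
    size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);                   // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}

Compiled with nvcc, this should print c[0] = 3.0; changing the block size only changes how the one-dimensional data space is mapped onto blocks and threads, which is the theme of Week 4.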
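The reduction material in Week 7 builds on the shared memory and synchronization topics of Week 4. The sketch below shows one classic formulation, a block-level tree reduction in shared memory with sequential addressing; it is a generic illustration under assumed names and sizes (blockReduceSum, 256 threads per block), not the course's own kernel. Each block writes one partial sum, and the host adds the partial sums for simplicity.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative block-level sum reduction (not course material).
// Each block loads a tile into shared memory and reduces it with a
// sequential-addressing tree, keeping active threads contiguous within warps.
__global__ void blockReduceSum(const float *in, float *blockSums, int n) {
    extern __shared__ float sdata[];              // size = blockDim.x floats
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;          // coalesced load, pad with 0
    __syncthreads();

    // Halve the number of active threads at each step
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)                                 // one partial sum per block
        blockSums[blockIdx.x] = sdata[0];
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *hin = (float*)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) hin[i] = 1.0f;

    float *din, *dpart;
    cudaMalloc(&din, n * sizeof(float));
    cudaMalloc(&dpart, blocks * sizeof(float));
    cudaMemcpy(din, hin, n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter is the dynamic shared-memory size per block
    blockReduceSum<<<blocks, threads, threads * sizeof(float)>>>(din, dpart, n);

    // Finish the reduction on the host for simplicity
    float *hpart = (float*)malloc(blocks * sizeof(float));
    cudaMemcpy(hpart, dpart, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0.0;
    for (int b = 0; b < blocks; ++b) total += hpart[b];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(din); cudaFree(dpart); free(hin); free(hpart);
    return 0;
}

Sequential addressing limits warp divergence in the inner loop; the remaining inter-block step could equally be performed by a second kernel launch instead of on the host.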
Taught by
Prof. Soumyajit Dey