Overview
Syllabus
Intro
Example: Jacobi Iteration
Jacobi Iteration: C Code
Jacobi Iteration: OpenACC C Code
PGI Accelerator Compiler output (C)
What went wrong? Set the PGI_ACC_TIME environment variable to 1
Offloading a Parallel Kernel
Separating Data from Computation
Excessive Data Transfers
Defining data regions
Data Clauses: copy(list) allocates memory on the GPU, copies data from host to GPU when entering the region, and copies data back to the host when exiting the region
Array Shaping
Jacobi Iteration: Data Directives
Execution Time (lower is better)
Further speedups
Calling MPI with OpenACC (Standard MPI)
OpenACC update Directive
OpenACC host_data Directive
Calling MPI with OpenACC (GPU-aware MPI)
C tip: the restrict keyword
Tips and Tricks (cont.)
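As a rough illustration of the data-region topics listed above (data clauses, array shaping, and the restrict keyword), here is a minimal sketch of a Jacobi sweep wrapped in an OpenACC data region. It is not the course's actual code: the function name jacobi, its parameters, and the flat row-major array layout are assumptions made for this example. Compilers without OpenACC support ignore the pragmas, so the function also runs as ordinary serial C.

```c
/* Hypothetical helper (not from the course material): runs Jacobi sweeps on
 * an n x m row-major grid until the largest update falls to tol or below,
 * or until iter_max sweeps have run. Returns the number of sweeps done. */
int jacobi(int n, int m, double *restrict A, double *restrict Anew,
           double tol, int iter_max)
{
    double err = tol + 1.0;
    int iter = 0;

    /* copy:   A is sent to the GPU on entry and brought back on exit.
     * create: Anew is scratch space that only ever lives on the GPU.
     * A[0:n*m] is array shaping: start index 0, length n*m elements. */
    #pragma acc data copy(A[0:n*m]) create(Anew[0:n*m])
    {
        while (err > tol && iter < iter_max) {
            err = 0.0;

            /* Average the four neighbours of every interior point. */
            #pragma acc parallel loop collapse(2) reduction(max:err)
            for (int j = 1; j < n - 1; j++) {
                for (int i = 1; i < m - 1; i++) {
                    double v = 0.25 * (A[j * m + i + 1] + A[j * m + i - 1] +
                                       A[(j - 1) * m + i] + A[(j + 1) * m + i]);
                    double d = v - A[j * m + i];
                    if (d < 0.0) d = -d;
                    if (d > err) err = d;
                    Anew[j * m + i] = v;
                }
            }

            /* Copy the interior back; both arrays stay on the GPU. */
            #pragma acc parallel loop collapse(2)
            for (int j = 1; j < n - 1; j++) {
                for (int i = 1; i < m - 1; i++) {
                    A[j * m + i] = Anew[j * m + i];
                }
            }
            iter++;
        }
    }
    return iter;
}
```

Because the data region encloses the whole while loop, the arrays cross the host/GPU boundary once per call rather than once per sweep, which is the excessive-transfer problem the data directives are meant to address.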
Taught by
NVIDIA Developer