Overview
Syllabus
Intro
CUDA DEVELOPMENT ECOSYSTEM
POWERING THE DEEP LEARNING ECOSYSTEM
TESLA UNIVERSAL ACCELERATION PLATFORM
ACCELERATED COMPUTING IS FULL-STACK OPTIMIZATION
INTRODUCING CUDA 10.0
16 GPUS WITH 32GB MEMORY EACH
NVSWITCH: ALL-TO-ALL CONNECTIVITY
UNIFIED MEMORY + DGX-2
2X HIGHER PERFORMANCE WITH NVSWITCH
NEW PROGRAMMING MODEL FEATURES
ASYNCHRONOUS TASK GRAPHS
NEW EXECUTION MECHANISM
EXECUTION OPTIMIZATIONS
PERFORMANCE IMPACT
THE PATH TO FUSION ENERGY
VOLTA TENSOR CORE
NEW TURING TENSOR CORE
NEW TURING WARP MATRIX FUNCTIONS
CUTLASS 1.1
NVIDIA NGX: DL FOR CREATIVE APPLICATIONS
IN ADOBE PHOTOSHOP
CUDNN: GPU ACCELERATED DEEP LEARNING
IMPROVED HEURISTICS FOR CONVOLUTIONS
PERSISTENT RNN SPEEDUP ON V100
STRIDED ACTIVATION GRADIENTS
TENSOR CORES WITH FP32 MODELS
MORE TENSOR CORE PERFORMANCE IMPROVEMENTS
GENERAL PERFORMANCE IMPROVEMENTS
FUTURE UPDATES
Taught by
NVIDIA Developer