Overview
Syllabus
Intro
A Modern Processor
Processor Organization: Intel Nehalem+
Skylake Backend (sustained 4 uops/cycle)
Intel Top-Down Approach
Processor Behavior: Branch Prediction
Processor Behavior: ILP
Processor Behavior: Vector Units
Processor Behavior: Parallelization
Categories of Vector Instructions
Instruction Performance
Vectorization: Matrix Multiplication
Vectorization: Vector Normalization
Vectorization: AoS vs. SoA
Vectorization: N-Body Simulation
Vectorized strstr Illustrated
Vectorization: strstr
Vectorization: Sorted Set Intersection
Cache Structure, typical i5 (no L4 eDRAM)
Cache: Data Access Reordering
Cache: Tiling
Memory Bottleneck
Taught by
NDC Conferences