Squeezing the Hardware to Make Performance Juice

NDC Conferences via YouTube

Overview

Explore advanced hardware optimization techniques for .NET developers in this conference talk. Dive deep into modern processor architecture, vectorization, and memory systems to achieve peak performance. Learn about Intel's processor organization, branch prediction, instruction-level parallelism, and vector units. Discover how to leverage vectorization for matrix multiplication, vector normalization, and N-body simulations. Examine cache structures and techniques like data access reordering and tiling to overcome memory bottlenecks. Gain insights into optimizing CPU- and memory-bound algorithms used in real-world domains such as finance, image processing, and signal processing, all while using C# and .NET.
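The talk works in C# and .NET. As a rough illustration of the kind of vectorization it covers, the sketch below (not taken from the talk; the class name and sample data are illustrative) normalizes a float array with System.Numerics.Vector<float>, which the JIT maps onto the CPU's SIMD registers when hardware acceleration is available.

using System;
using System.Numerics;

// Minimal sketch: SIMD vector normalization with System.Numerics.Vector<float>.
static class SimdNormalizeDemo
{
    static void Main()
    {
        var v = new float[] { 3f, 4f, 0f, 0f, 0f, 0f, 0f, 0f };
        Normalize(v);
        Console.WriteLine(string.Join(", ", v)); // ~0.6, 0.8, 0, ...
    }

    // Scales `values` in place so the whole array has Euclidean length 1.
    static void Normalize(float[] values)
    {
        int width = Vector<float>.Count;          // SIMD lanes, e.g. 8 floats with AVX2
        var sums = Vector<float>.Zero;
        int i = 0;

        // Sum of squares, one vector-width chunk at a time.
        for (; i <= values.Length - width; i += width)
        {
            var chunk = new Vector<float>(values, i);
            sums += chunk * chunk;
        }
        float sumSq = Vector.Dot(sums, Vector<float>.One); // horizontal add
        for (; i < values.Length; i++)                     // scalar tail
            sumSq += values[i] * values[i];

        float invLen = 1f / MathF.Sqrt(sumSq);
        var scale = new Vector<float>(invLen);

        // Second pass: multiply every chunk by 1/length.
        for (i = 0; i <= values.Length - width; i += width)
            (new Vector<float>(values, i) * scale).CopyTo(values, i);
        for (; i < values.Length; i++)
            values[i] *= invLen;
    }
}

Because Vector<float>.Count adapts to the widest vector register the hardware offers, the same code runs with SSE, AVX2, or a scalar fallback, which is the portability argument for this API over hand-written intrinsics.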

Syllabus

Intro
A Modern Processor
Processor Organization: Intel Nehalem+
Skylake Backend (sustained 4 uops/cycle)
Intel Top-Down Approach
Processor Behavior: Branch Prediction
Processor Behavior: ILP
Processor Behavior: Vector Units
Processor Behavior: Parallelization
Categories of Vector Instructions
Instruction Performance
Vectorization: Matrix Multiplication
Vectorization: Vector Normalization
Vectorization: AoS vs. SoA
Vectorization: N-Body Simulation
Vectorized strstr Illustrated
Vectorization: strstr
Vectorization: Sorted Set Intersection
Cache Structure, typical i5 (no L4 eDRAM)
Cache: Data Access Reordering
Cache: Tiling
Memory Bottleneck

Taught by

NDC Conferences

