Overview
Syllabus
Welcome!
Loop vectorization and LoopVectorization.jl.
Current limitations of LoopVectorization.jl.
First main part of intra-core parallelism: Single Instruction Multiple Data (SIMD).
Loading and storing vectors.
Second main part of intra-core parallelism: superscalar parallelism.
Example: summing a vector.
Problem: not all vectors have a length that is a multiple of 32.
Vectorization of the loop with @avx.
@avx and functions like log from stdlib.
@avx and StructArrays.jl.
Eliminating redundant operations.
LoopVectorization.jl and generated functions.
Redundancy in convolutions.
Internal workings of LoopVectorization.jl.
Taught by
The Julia Programming Language