Syllabus
Intro
In this video
What are transformers and attention?
Attention explained simply
Attention used in CNNs
Transformers and attention
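The attention chapters above center on one mechanism; as a reference, here is a minimal NumPy sketch of scaled dot-product attention. All names and shapes are illustrative, not code from the video.

```python
# A minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # How strongly each query matches each key, scaled for softmax stability
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, dim 8
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```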
What the Vision Transformer (ViT) does differently
Images to patch embeddings (see the sketch after these four steps)
1. Building image patches
2. Linear projection
3. Learnable class embedding
4. Adding positional embeddings
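The four steps above can be condensed into a short PyTorch sketch. The ViT-Base dimensions used here (224x224 images, 16x16 patches, 768-dim embeddings) are assumptions for illustration, not necessarily the video's exact configuration.

```python
# A minimal PyTorch sketch of steps 1-4 (illustrative).
import torch
import torch.nn as nn

image_size, patch_size, dim = 224, 16, 768
num_patches = (image_size // patch_size) ** 2        # 14 x 14 = 196 patches

# Steps 1-2: split the image into 16x16 patches and linearly project each
# one; a convolution with stride == kernel size does both at once.
to_patch_embedding = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)

# Step 3: a learnable [class] token prepended to the patch sequence
cls_token = nn.Parameter(torch.zeros(1, 1, dim))

# Step 4: learnable position embeddings, one per token (patches + class)
pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

x = torch.randn(2, 3, image_size, image_size)        # dummy batch of 2 images
patches = to_patch_embedding(x)                      # (2, 768, 14, 14)
patches = patches.flatten(2).transpose(1, 2)         # (2, 196, 768)
cls = cls_token.expand(x.shape[0], -1, -1)           # one class token per image
tokens = torch.cat([cls, patches], dim=1) + pos_embedding
print(tokens.shape)                                  # (2, 197, 768) -> transformer input
```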
ViT implementation in Python with Hugging Face (see the end-to-end sketch after these steps)
Packages, dataset, and Colab GPU
Initialize Hugging Face ViT Feature Extractor
Hugging Face Trainer setup
Training and fixing a CUDA device error
Evaluation and classification predictions with ViT
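The implementation chapters above follow the standard Hugging Face fine-tuning pattern, sketched end to end below. The checkpoint google/vit-base-patch16-224-in21k and the cifar10 dataset are assumptions for illustration and may differ from the video's exact choices; note also that recent transformers versions replace ViTFeatureExtractor with ViTImageProcessor.

```python
# Hedged end-to-end sketch of fine-tuning ViT with the Hugging Face Trainer.
# Checkpoint and dataset are assumed examples, not necessarily the video's.
import torch
from datasets import load_dataset
from transformers import (ViTFeatureExtractor, ViTForImageClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset('cifar10')                 # assumed example dataset
labels = dataset['train'].features['label'].names

model_id = 'google/vit-base-patch16-224-in21k'    # assumed checkpoint
feature_extractor = ViTFeatureExtractor.from_pretrained(model_id)

def preprocess(batch):
    # Resize/normalize PIL images into the pixel_values tensor ViT expects
    inputs = feature_extractor(batch['img'], return_tensors='pt')
    inputs['labels'] = batch['label']
    return inputs

prepared = dataset.with_transform(preprocess)     # applied lazily on access

def collate_fn(examples):
    return {
        'pixel_values': torch.stack([e['pixel_values'] for e in examples]),
        'labels': torch.tensor([e['labels'] for e in examples]),
    }

model = ViTForImageClassification.from_pretrained(model_id,
                                                  num_labels=len(labels))

args = TrainingArguments(
    output_dir='./vit-demo',
    per_device_train_batch_size=16,
    num_train_epochs=1,
    remove_unused_columns=False,  # keep 'img' so the transform can see it
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=prepared['train'],
    eval_dataset=prepared['test'],
    data_collator=collate_fn,
)

# Trainer moves the model and each batch to the available CUDA device itself,
# so no manual .to('cuda') calls are needed here.
trainer.train()
print(trainer.evaluate())
```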
Final thoughts
Taught by
James Briggs