Overview
Learn to implement and train a Vision Transformer (ViT) model in this 52-minute technical video tutorial focused on real-time emotion classification from video data. The tutorial uses hands-on coding demonstrations to cover practical machine learning concepts, and it compares the trained ViT against CLIP used as a zero-shot classifier. It connects the theory of transformer architectures in computer vision to a concrete implementation, building a working emotion classification system that runs in real time.
Syllabus
How to train a Vision Transformer (ViT) for real time image classification - Practical ML Dives
Taught by
Oxen