Whisper Paper Explained - Robust Speech Recognition via Large-Scale Weak Supervision

Overview

A video walkthrough of the Whisper paper, "Robust Speech Recognition via Large-Scale Weak Supervision." The paper reports state-of-the-art speech recognition results and was released with open-source code and weights. The walkthrough covers the dataset collection and processing, the model approach, the experiments, and the evaluation methods. It also discusses the challenges of long-form transcription and how model and dataset scaling affect performance. The presenter works through the paper section by section, with timestamps for navigating to key topics such as the abstract, introduction, model architecture, and experimental results.

Syllabus

- Introduction
- Abstract
- Introduction
- Dataset collection and processing
- Model approach
- Figure of model
- Experiments and Evaluation
- Long form transcription, messy :/
- Model and Dataset scaling
- Long form transcription cont, messy :/
- Ending