OpenAI Whisper - Robust Speech Recognition via Large-Scale Weak Supervision
Aleksa Gordić - The AI Epiphany via YouTube
Syllabus
Intro
Paper overview
Collecting a large-scale weakly supervised dataset
Evaluation metric issues (WER)
Effective robustness
Scaling laws in progress
Decoding is hacky
Code walk-through
Model architecture diagram vs code
Transcription task
Loading the audio, mel spectrograms
Language detection
Transcription task continued
Suppressing token logits
Voice activity detection
Decoding and heuristics
Outro
Taught by
Aleksa Gordić - The AI Epiphany