Overview
Explore the development of multi-lingual end-to-end speech recognition systems in this 22-minute conference talk from SK AI SUMMIT 2024. Learn how traditional speech recognition models, which depended on supervised learning with time-consuming and costly human-labeled data, are being transformed by self-supervised learning techniques. Discover how a speech foundation encoder was built from 500,000 hours of unlabeled multi-lingual data, and how the model is then optimized through fine-tuning for specific languages and domains. Presented by Sunghwan Shin of SK Telecom, who specializes in acoustic modeling, wake-up systems, confidence measures, and attention-based encoder/decoder architectures, with a recent focus on self-supervised learning and multi-lingual speech recognition model development.
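As a rough illustration of the two-stage recipe the talk describes (self-supervised pretraining on unlabeled multi-lingual speech, then supervised fine-tuning for a target language or domain), the sketch below uses the publicly available XLS-R encoder and the Hugging Face transformers library. It is not the presenter's actual SK Telecom system; the checkpoint name, toy vocabulary, and dummy data are assumptions for illustration only.

```python
# Minimal sketch: load an encoder pretrained with self-supervised learning on
# unlabeled multilingual speech, then fine-tune it with a CTC head for one language.
# Uses the public XLS-R checkpoint as a stand-in (assumption, not the talk's model).
import json
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2ForCTC,
)

# Toy character vocabulary for the target language (hypothetical; a real recipe
# would derive this from the fine-tuning transcripts).
vocab = {"[PAD]": 0, "[UNK]": 1, "|": 2}
vocab.update({c: i + 3 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")})
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0, do_normalize=True
)

# Self-supervised pretrained encoder plus a randomly initialized CTC head
# sized to the target-language vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional front end frozen during fine-tuning

# One supervised fine-tuning step on a dummy (waveform, transcript) pair.
waveform = torch.randn(16000 * 3).numpy()  # 3 s of fake 16 kHz audio
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("hello world", return_tensors="pt").input_ids

loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()  # in practice this runs inside a full training loop over labeled data
```

In this pattern the expensive, label-free pretraining is done once on pooled multi-lingual audio, and only the comparatively cheap supervised step is repeated per language or domain.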
Syllabus
Korean or English, no problem: Self-supervised Learning-based Multi-lingual E2E Speech Recognition | SK Telecom, Sunghwan Shin
Taught by
SK AI SUMMIT 2024