Self-supervised Learning Based Multi-lingual E2E Speech Recognition

SK AI SUMMIT 2024 via YouTube

Overview

Explore the development of multi-lingual end-to-end speech recognition systems in this 22-minute conference talk from SK AI SUMMIT 2024. Learn how traditional speech recognition models, long dependent on time-consuming and costly human-labeled data for supervised learning, are being transformed by self-supervised learning techniques. Discover how a speech foundation encoder was developed from 500,000 hours of unlabeled multi-language audio, and how the model is then optimized through fine-tuning for specific languages and domains. Presented by Sunghwan Shin of SK Telecom, who specializes in acoustic modeling, wake-up systems, confidence measures, and attention-based encoder/decoder architectures, with a recent focus on self-supervised learning and multi-lingual speech recognition model development.
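
To make the two-stage recipe concrete, here is a minimal sketch in PyTorch of (1) self-supervised pretraining of a speech encoder on unlabeled audio and (2) supervised fine-tuning with a CTC head for a target language. The model sizes, the masked-frame reconstruction objective, the vocabulary size, and the random tensors standing in for audio features and transcripts are all illustrative assumptions, not the presenter's actual implementation.

```python
# Hypothetical sketch: self-supervised pretraining, then CTC fine-tuning.
# All architecture/loss choices here are assumptions for illustration.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Small Transformer encoder over log-mel-like frame features."""
    def __init__(self, feat_dim=80, d_model=256, n_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        return self.encoder(self.proj(x))  # (batch, time, d_model)

def pretrain_step(encoder, recon_head, feats, mask_prob=0.15):
    """Self-supervised step: mask random frames, predict the originals.
    No transcripts needed, so this scales to large unlabeled corpora."""
    mask = torch.rand(feats.shape[:2]) < mask_prob   # (batch, time)
    corrupted = feats.clone()
    corrupted[mask] = 0.0                            # zero out masked frames
    pred = recon_head(encoder(corrupted))            # reconstruct features
    return nn.functional.mse_loss(pred[mask], feats[mask])

def finetune_step(encoder, ctc_head, feats, targets, target_lens):
    """Supervised step: CTC loss against labeled (e.g., Korean/English)
    transcripts, reusing the pretrained encoder."""
    logits = ctc_head(encoder(feats))                  # (batch, time, vocab)
    log_probs = logits.log_softmax(-1).transpose(0, 1) # (time, batch, vocab)
    input_lens = torch.full((feats.size(0),), feats.size(1), dtype=torch.long)
    return nn.CTCLoss(blank=0)(log_probs, targets, input_lens, target_lens)

if __name__ == "__main__":
    enc = SpeechEncoder()
    recon_head = nn.Linear(256, 80)   # pretraining head, discarded later
    ctc_head = nn.Linear(256, 100)    # 100 = assumed multi-lingual vocab size
    feats = torch.randn(2, 50, 80)    # stand-in for real audio features
    print("pretrain loss:", pretrain_step(enc, recon_head, feats).item())
    targets = torch.randint(1, 100, (2, 10))           # stand-in transcripts
    lens = torch.full((2,), 10, dtype=torch.long)
    print("fine-tune loss:", finetune_step(enc, ctc_head, feats, targets, lens).item())
```

The key design point mirrored here is that the expensive stage (pretraining) consumes only unlabeled audio, while the small labeled dataset is needed only for the comparatively cheap fine-tuning stage.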

Syllabus

No problem with Korean or English: Multi-lingual E2E Speech Recognition Based on Self-supervised Learning | SK Telecom, Sunghwan Shin

Taught by

SK AI SUMMIT 2024
