Audio-Visual Active Speaker Detection on Embedded Devices - Optimizing Deep Learning Models

Overview

Watch a technical conference presentation exploring the development and optimization of Active Speaker Detection (ASD) models for embedded devices. Learn how researchers at NXP Semiconductors created computationally efficient deep learning architectures that identify active speakers in video by analyzing both visual and audio features in real time. Discover the approaches used to drastically reduce computational cost through multi-objective optimization and a novel modality fusion scheme, enabling implementation on both high-end MPUs and resource-constrained MCUs. Follow the complete optimization journey from model design modifications to quantization and integration on NXP devices, with detailed analysis of the trade-offs between computational efficiency and system accuracy.

Syllabus

tinyML EMEA - Baptiste Pouthier: Audio-Visual Active Speaker Detection on Embedded Devices

Taught by

EDGE AI FOUNDATION
