Overview
Explore the fascinating world of speech separation and noise removal using deep neural networks and TensorFlow in this 40-minute conference talk. Dive into the technical aspects of the cocktail party effect, where humans can focus on a single voice amid multiple speakers and background noise. Learn how to prepare and augment data for speech separation, create and optimize various neural network architectures, and run networks on tiny devices. Discover the potential for real-time speech separation on small embedded platforms, envisioning future smart earbuds, headsets, and hearing aids. Gain insights into the latest advances and limitations in speech separation on embedded devices, including data transformation, deep neural network models, training smaller and faster networks, and building real-time speech separation pipelines. The presentation covers topics such as mixed sounds, masking techniques, feature engineering, model parameters, and evaluation methods. It also explores related use cases like "Looking to Listen" and speech-to-text applications, along with the challenges and future directions in this exciting field of audio processing and machine learning.
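The masking technique mentioned above can be sketched in a few lines. This is a hedged toy example, not code from the talk: it computes an *oracle* ratio mask with NumPy from known speech and noise signals, whereas in the talk's setting a neural network would predict the mask from the mixture alone. The `stft`, `ideal_ratio_mask`, window, and hop values are all illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive short-time Fourier transform with a Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def ideal_ratio_mask(speech_spec, noise_spec, eps=1e-8):
    """Oracle ratio mask: the fraction of each time-frequency bin's
    magnitude that belongs to speech. A separation network is trained
    to predict a mask like this from the mixture spectrogram."""
    s, n = np.abs(speech_spec), np.abs(noise_spec)
    return s / (s + n + eps)

# Toy "cocktail party": a 440 Hz tone standing in for a voice,
# mixed with white noise (purely illustrative signals).
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 440 * t)
noise = 0.5 * np.random.default_rng(0).standard_normal(sr)
mix = speech + noise

S, N, M = stft(speech), stft(noise), stft(mix)
mask = ideal_ratio_mask(S, N)   # values in [0, 1]
estimate = mask * M             # masked mixture: estimated speech spectrogram
```

Applying the mask to the mixture spectrogram suppresses noise-dominated bins while keeping speech-dominated ones; an inverse STFT (overlap-add) would then reconstruct the cleaned waveform.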
Syllabus
Agenda
Speech Separation
Mixed Sounds
Solution
Approach
Mask
Transform
Typical deep learning tasks
Platform selection
Data set
Feature engineering
Models
Code
Model Parameters
Neural Network Generator
Evaluation & Prediction
Future Directions
Related Use Cases
Looking to Listen
Speech to Text
Barriers
Pipelines
Datasets
Looking to Listen
Devices
Intensive Low Light
Summary
Q&A
Inference
Taught by
Databricks