

Speech Production Features for Deep Neural Network Acoustic Modeling - 2015

Center for Language & Speech Processing (CLSP), JHU via YouTube

Overview

Explore deep neural network acoustic modeling techniques that incorporate speech production knowledge in this 55-minute lecture by Leonardo Badino from the Center for Language & Speech Processing at Johns Hopkins University. Delve into two approaches: using vocal tract movement measurements to extract new acoustic features, and deriving continuous-valued speech production knowledge features from binary phonological features to build structured DNN outputs. Examine the results of these methods on the mngu0 and TIMIT datasets, which show consistent reductions in phone recognition error. Learn about the speaker's background in speech technology and his current research focus on speech production knowledge for automatic speech recognition, limited-resource ASR, and computational analysis of non-verbal sensorimotor communication.
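To make the first approach concrete, the sketch below shows one common way such articulatory-informed features can be built: an autoencoder is trained on acoustic frames with both acoustic and articulatory reconstruction targets, and its bottleneck activations are appended to the acoustic input of a DNN phone classifier. This is only a minimal illustration of the general idea, not the lecture's actual architecture; the layer sizes, dimensions, training loop, and synthetic data are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for acoustic frames (e.g., MFCCs), articulatory
# measurements (e.g., EMA trajectories), and phone labels; real experiments
# would use corpora such as mngu0 or MOCHA-TIMIT.
n_frames, acoustic_dim, artic_dim, n_phones = 1024, 39, 12, 48
acoustic = torch.randn(n_frames, acoustic_dim)
articulatory = torch.randn(n_frames, artic_dim)
phone_labels = torch.randint(0, n_phones, (n_frames,))

class ArticAutoencoder(nn.Module):
    """Encodes acoustic frames into a bottleneck that must reconstruct
    both the acoustic input and the articulatory measurements."""
    def __init__(self, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim, 128), nn.ReLU(),
            nn.Linear(128, bottleneck), nn.Tanh(),
        )
        self.decoder = nn.Linear(bottleneck, acoustic_dim + artic_dim)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

# Train the autoencoder against joint acoustic + articulatory targets.
autoenc = ArticAutoencoder()
opt = torch.optim.Adam(autoenc.parameters(), lr=1e-3)
targets = torch.cat([acoustic, articulatory], dim=1)
for _ in range(50):  # brief training loop, for illustration only
    z, recon = autoenc(acoustic)
    loss = nn.functional.mse_loss(recon, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Use the learned bottleneck as an extra feature stream for a phone classifier.
with torch.no_grad():
    feats, _ = autoenc(acoustic)
inputs = torch.cat([acoustic, feats], dim=1)

classifier = nn.Sequential(
    nn.Linear(inputs.shape[1], 256), nn.ReLU(),
    nn.Linear(256, n_phones),
)
clf_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
for _ in range(50):
    logits = classifier(inputs)
    loss = nn.functional.cross_entropy(logits, phone_labels)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()
```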

Syllabus

Intro
Speech production knowledge (SPK) for
Speech production knowledge for
An Example of the potential utility of using a measured Articulatory (+Acoustic) Domain
Motivations from neurophysiology of speech perception
Properties of the Acoustic-to-Articulatory Mapping
Autoencoders for AF transformation
Limitations of the approach
Alternative Approaches
Using Phonological Embeddings
Results - 2
Speaker-dependent phone recognition results - 2: MOCHA-TIMIT, different noise conditions, training on clean speech

Taught by

Center for Language & Speech Processing (CLSP), JHU

