Deep Generative Models for Speech and Images

Overview

Explore deep generative models for speech and image processing in this 42-minute lecture by Yoshua Bengio from the University of Montreal. Delve into the foundations of deep learning, examining its roots in connectionism and the view of learning as a powerful way to transfer knowledge to computers. Understand how distributed representations learned from data capture the meaning of that data, and how a learned function can be seen as a composition of simpler operations. Discover the significance of learning multiple levels of abstraction, much of which must happen in an unsupervised way. Investigate deep unsupervised generative models and their application to end-to-end audio synthesis. Analyze quantitative results to gain insight into how effective these approaches are for speech and images.

Syllabus

Deep Generative Models for Sounds and Images
What Deep Learning Owes to Connectionism • Learning is a powerful way to transfer knowledge to computers • Distributed (possibly sparse) representations, learned from data, capture the meaning of the data and state • The learned function is seen as a composition of simpler operations
Learning Multiple Levels of Abstraction • The big payoff of deep learning is that it allows learning higher levels of abstraction, and most of this learning must happen in an unsupervised way, as it does for humans
Deep Unsupervised Generative Models
End-to-End Audio Synthesis with DL
Quantitative Results
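The syllabus describes a learned function as a composition of simpler operations, stacked into multiple levels of abstraction. The following is a minimal sketch of that idea, not code from the lecture. The helper names (`affine`, `relu`, `compose`, `random_layer`) are hypothetical, and the weights are drawn at random rather than learned, because training is omitted here.

```python
import math
import random

def affine(weights, bias):
    """Return the simple operation x -> W x + b (W given as a list of rows)."""
    def op(x):
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
                for row, b_i in zip(weights, bias)]
    return op

def relu(x):
    """Element-wise rectifier: another simple operation."""
    return [max(0.0, v) for v in x]

def compose(*ops):
    """Chain operations left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for op in ops:
            x = op(x)
        return x
    return composed

def random_layer(n_in, n_out, rng):
    """Hypothetical helper: an affine layer with random (untrained) weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return affine(weights, [0.0] * n_out)

rng = random.Random(0)
# Three levels: each one re-represents the output of the level below it.
network = compose(
    random_layer(4, 8, rng), relu,
    random_layer(8, 8, rng), relu,
    random_layer(8, 2, rng),
)
output = network([0.5, -1.0, 2.0, 0.0])
print(len(output))  # 2
```

In a real model, a learning algorithm would adjust each layer's weights from data. The compositional structure stays the same: the overall function is just a chain of affine maps and nonlinearities.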