Generative AI for Text to Audio Generation

SAIConference via YouTube

Overview

Explore recent advances in generative AI for text-to-audio generation in this keynote by Professor Wenwu Wang of the University of Surrey. The talk traces how the field moved from traditional methods to deep learning-based models such as AudioLDM, AudioLDM 2, Re-AudioLDM, and WavJourney, and explains how these models map and align text with audio events to create complex audio environments from simple text prompts. Applications covered range from sound synthesis for games, films, virtual reality, and digital media to assisting the visually impaired. The presentation also covers recent breakthroughs in cross-modal generation, key challenges, and future research directions, and includes live demonstrations along with pointers for experimenting with the open-source tools on GitHub and Hugging Face.

Syllabus

Generative AI for Text to Audio Generation | Wenwu Wang, University of Surrey | IntelliSys 2024

Taught by

SAIConference
