Overview
Explore recent advances in generative AI for text-to-audio generation in this keynote presentation by Professor Wenwu Wang of the University of Surrey. The talk traces the development of AI systems that can produce soundscapes from simple text prompts, a capability with uses in filmmaking, game design, virtual reality, and digital media. Learn how the field progressed from traditional methods to deep learning-based models such as AudioLDM, AudioLDM2, Re-AudioLDM, and WavJourney, and how these models map and align text with audio events to build complex audio scenes. Discover real-world applications, from sound synthesis for games and films to assistive tools for visually impaired people. Gain insight into recent breakthroughs in cross-modal generation, the key challenges that remain, and future research directions. The presentation includes live demonstrations and shows how to experiment with these tools on platforms such as GitHub and Hugging Face. Key topics include an overview of deep generative AI for text-to-audio generation, an introduction to the key models, practical applications in sound design, and hands-on experimentation with open-source tools.
Syllabus
Generative AI for Text to Audio Generation | Wenwu Wang, University of Surrey | IntelliSys 2024
Taught by
SAIConference