Generative AI for Text to Audio Generation

Overview

Explore recent advances in generative AI for text-to-audio generation in this keynote presentation by Professor Wenwu Wang from the University of Surrey. Follow the evolution of AI technology that can produce soundscapes from simple text prompts, with impact on industries such as filmmaking, game design, virtual reality, and digital media. Learn about the progression from traditional methods to deep learning-based models such as AudioLDM, AudioLDM2, Re-AudioLDM, and WavJourney, and understand how these models map and align text with audio events to create complex audio environments.

Discover real-world applications ranging from sound synthesis for games and movies to assisting the visually impaired. Gain insights into recent breakthroughs in cross-modal generation, key challenges, and future research directions. Experience live demonstrations and learn how to experiment with these tools on platforms like GitHub and Hugging Face.

Key topics covered include an overview of deep generative AI for text-to-audio generation, an introduction to key models, practical applications in sound design, and hands-on experimentation with open-source tools.