Explore cutting-edge advances in music audio synthesis in this 58-minute lecture by Bryan Pardo of Northwestern University. Delve into the combination of parallel iterative decoding and acoustic token modeling, a significant milestone in neural audio music generation. Discover how this approach enables faster inference than autoregressive methods and why it is well suited to tasks like infilling. Learn about the model's versatile applications through token-based prompting, including guiding generation with selectively masked music token sequences. Examine potential outcomes ranging from high-quality audio compression to variations of original music that preserve style, genre, beat, and instrumentation while introducing novel timbres and rhythms.
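To make the decoding idea concrete, here is a minimal toy sketch of parallel iterative decoding over a masked token sequence. Everything here is an illustrative assumption, not the lecture's actual model: `MASK`, `toy_model`, and the confidence scores are hypothetical stand-ins (a real system would use a trained neural network over acoustic tokens). The sketch shows the two properties the lecture highlights: several masked positions are filled per step rather than one at a time, and unmasked prompt tokens stay fixed, which is what enables infilling.

```python
import random

MASK = -1                 # hypothetical mask token id
VOCAB = list(range(8))    # toy stand-in for an acoustic-token vocabulary

def toy_model(tokens):
    """Stand-in for a trained masked acoustic token model: for each
    masked position, propose a token and a confidence score."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def parallel_iterative_decode(tokens, steps=4):
    """Each step commits the most confident half of the remaining
    masked positions in parallel, instead of one token per step as
    in autoregressive decoding."""
    tokens = list(tokens)
    for _ in range(steps):
        proposals = toy_model(tokens)
        if not proposals:
            break
        n_keep = max(1, len(proposals) // 2)
        ranked = sorted(proposals.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _conf) in ranked[:n_keep]:
            tokens[i] = tok
    # fill any positions still masked after the step budget
    for i, (tok, _conf) in toy_model(tokens).items():
        tokens[i] = tok
    return tokens

# Infilling via selective masking: known tokens stay fixed,
# only the masked span is generated.
prompt = [3, 1, MASK, MASK, MASK, 4, 2]
result = parallel_iterative_decode(prompt)
```

Because each step fills many positions at once, the number of model calls grows with the step budget rather than the sequence length, which is the source of the speedup over autoregressive generation.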
Syllabus
Music Generation via Masked Acoustic Token Modeling
Taught by
Simons Institute