Explore a comprehensive lecture on Diffusion Forcing, a novel training paradigm for sequence generative modeling. Delve into the concept of training a diffusion model to denoise tokens with independent per-token noise levels, combining the strengths of next-token prediction and full-sequence diffusion models. Learn about the method's ability to generate variable-length sequences, guide sampling towards desirable trajectories, and roll out continuous token sequences beyond the training horizon. Discover new sampling and guiding schemes unique to Diffusion Forcing's architecture, leading to improved performance in decision-making and planning tasks. Gain insights into the theoretical foundations of the approach, including its optimization of a variational lower bound on subsequence likelihoods. The lecture covers background information, the core principles of Diffusion Forcing, its application with causal uncertainty, and concludes with a Q&A session.
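The core idea described above — noising each token with its own independently sampled noise level, rather than one shared level for the whole sequence — can be illustrated with a minimal NumPy sketch. This is not the lecture's implementation: the sequence length, token dimension, number of noise levels, and the cosine schedule are all illustrative assumptions, and the causal denoiser network itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 8, 4    # sequence length and token dimension (illustrative choices)
K = 10         # number of discrete noise levels (illustrative choice)

# A clean token sequence standing in for real training data.
x0 = rng.normal(size=(T, d))

# Diffusion Forcing's key departure from full-sequence diffusion:
# sample an INDEPENDENT noise level for each token.
k = rng.integers(0, K, size=T)

# An assumed cosine schedule for the signal-retention coefficients
# (alpha_bar decreases from ~1 toward 0 as the level k grows).
alpha_bar = np.cos(np.linspace(0.0, np.pi / 2, K + 1)[:-1]) ** 2

# Noise each token according to its own level k[t].
eps = rng.normal(size=(T, d))
a = alpha_bar[k][:, None]                      # per-token coefficient, shape (T, 1)
x_noisy = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

# A causal denoiser would then be trained to recover eps (or x0) from
# (x_noisy, k), conditioning each position only on earlier tokens.
print(x_noisy.shape, k.shape)
```

Because every token carries its own level, the same trained model can treat a prefix as fully denoised (observed) while later tokens remain at high noise — which is what enables the variable-length rollout and guided sampling schemes mentioned in the description.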