Overview

Explore the Imputer, a sequence-to-sequence model that strikes a balance between fully autoregressive and fully non-autoregressive models, in this informative video. Learn about its iterative generative approach, which requires only a constant number of generation steps, independent of the number of input or output tokens. Understand how the Imputer can be trained to approximately marginalize over all possible alignments between the input and output sequences, as well as over all possible generation orders, and delve into the tractable dynamic programming training algorithm that yields a lower bound on the log marginal likelihood. Examine the Imputer's strong performance on end-to-end speech recognition, where it outperforms prior non-autoregressive models and achieves results competitive with autoregressive models, including a Word Error Rate (WER) of 11.1 on the LibriSpeech test-other set, surpassing both CTC and seq2seq baselines.
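To make the constant-step decoding concrete: the Imputer predicts an alignment of the same length as the input, partitions it into blocks of size B, and at each step commits one token per block in parallel, so generation always finishes in B steps no matter how long the sequence is. Below is a minimal PyTorch sketch of this block-decoding loop; the `model` callable, its `(features, alignment)` signature, the `MASK` id, and the confidence-based selection rule are illustrative assumptions, not the paper's exact implementation.

```python
import torch

MASK = 0  # hypothetical token id reserved for unfilled alignment slots

def imputer_decode(model, features, block_size=8):
    """Fill an alignment of length T in exactly `block_size` steps."""
    T = features.shape[0]
    alignment = torch.full((T,), MASK, dtype=torch.long)

    for _ in range(block_size):                    # constant number of steps
        logits = model(features, alignment)        # assumed to return [T, V]
        probs, preds = logits.softmax(-1).max(-1)  # per-slot confidence and token
        for start in range(0, T, block_size):
            block = alignment[start:start + block_size]
            masked = (block == MASK).nonzero().flatten() + start
            if masked.numel() == 0:
                continue                           # block already complete
            best = masked[probs[masked].argmax()]  # most confident empty slot
            alignment[best] = preds[best]          # commit one token per block
    return alignment  # collapse blanks/repeats CTC-style to get the output
```

Because every block receives exactly one new token per step, a block of size B is fully populated after B iterations, which is where the sequence-length-independent decoding cost comes from.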
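The dynamic programming side can be illustrated in a similar spirit. The Imputer marginalizes over alignments with a CTC-style forward recursion; the sketch below implements the standard CTC forward algorithm in log space, the kind of dynamic program the Imputer's lower bound builds on (the Imputer's actual recursion additionally conditions on a partially revealed alignment, which is omitted here). The function name and the assumption of a non-empty target are mine.

```python
import torch

def ctc_log_likelihood(log_probs, target, blank=0):
    """Log-probability of `target`, summed over all monotonic alignments.

    log_probs: [T, V] per-frame log-probabilities.
    target:    non-empty list of label ids (no blanks).
    """
    # Interleave blanks: y -> (blank, y1, blank, y2, ..., blank).
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S, T = len(ext), log_probs.shape[0]

    # alpha[s]: log-mass of alignments over the frames so far ending in state s.
    alpha = torch.full((S,), float("-inf"))
    alpha[0] = log_probs[0, ext[0]]  # start with a blank...
    alpha[1] = log_probs[0, ext[1]]  # ...or with the first label

    for t in range(1, T):
        prev = alpha.clone()
        for s in range(S):
            cands = [prev[s]]                     # repeat the same symbol
            if s >= 1:
                cands.append(prev[s - 1])         # advance one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])         # skip over a blank
            alpha[s] = torch.logsumexp(torch.stack(cands), 0) + log_probs[t, ext[s]]

    # Valid alignments end in the final label or the final blank.
    return torch.logsumexp(torch.stack([alpha[-1], alpha[-2]]), 0)

# Example: random 50-frame posteriors over a 30-symbol vocabulary.
log_probs = torch.randn(50, 30).log_softmax(-1)
print(ctc_log_likelihood(log_probs, target=[7, 3, 3, 12]))
```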
Syllabus
Imputer: Sequence Modelling via Imputation and Dynamic Programming
Taught by
Yannic Kilcher