Decoder-Only Transformers, ChatGPT's Specific Transformer, Clearly Explained

Overview

Dive into a 37-minute video tutorial on Decoder-Only Transformers, the specific type of Transformer used in ChatGPT. Learn about word embedding, position encoding, masked self-attention as an autoregressive method, and residual connections. Follow the process step by step: encoding the prompt, generating the next word in the prompt, and then generating the output in two parts. Compare Normal Transformers with Decoder-Only Transformers to see how they differ. Supplementary resources on related concepts such as backpropagation, the SoftMax function, and word embedding are also provided.

Syllabus

Transformers are taking over AI right now, and quite possibly their most famous use is in ChatGPT. ChatGPT uses a specific type of Transformer called a Decoder-Only Transformer, and this StatQuest shows you how they work, one step at a time. And at the end, we talk about the differences between a Normal Transformer and a Decoder-Only Transformer. BAM!
Awesome song and introduction
Word Embedding
Position Encoding
Masked Self-Attention, an Autoregressive method
Residual Connections
Generating the next word in the prompt
Review of encoding and generating the prompt
Generating the output, Part 1
Masked Self-Attention while generating the output
Generating the output, Part 2
Normal Transformers vs Decoder-Only Transformers
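
As a rough illustration of the "Masked Self-Attention, an Autoregressive method" topic above, here is a minimal NumPy sketch. It is not taken from the video: the function names, the tiny dimensions, the random weights, and the scaling by the square root of the key dimension are all assumptions chosen for illustration, and the tutorial's own worked example may differ in its details. The point it tries to show is that each token can only attend to itself and to earlier tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over one sequence.

    x has shape (seq_len, d_model); w_q, w_k, w_v are hypothetical
    weight matrices of shape (d_model, d_model).
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values

    # Similarity of every query with every key (scaling is an assumption).
    scores = q @ k.T / np.sqrt(q.shape[-1])

    # Mask out positions to the right of the diagonal so that token i
    # cannot look at tokens i+1, i+2, ... (the autoregressive constraint).
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    weights = softmax(scores, axis=-1)  # masked entries become exactly 0
    return weights @ v, weights

# Toy inputs: 4 tokens, each already embedded and position-encoded
# into 8 numbers (random stand-ins here).
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, weights = masked_self_attention(x, w_q, w_k, w_v)
```

Because of the mask, changing the last token leaves the outputs for all earlier tokens unchanged, which is what lets a decoder-only model generate text one word at a time.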