Transformer Neural Networks, ChatGPT's Foundation, Clearly Explained
StatQuest with Josh Starmer via YouTube
Syllabus
Awesome song and introduction
Word Embedding
Positional Encoding
Self-Attention
Encoder and Decoder defined
Decoder Word Embedding
Decoder Positional Encoding
Transformers were designed for parallel computing
Decoder Self-Attention
Encoder-Decoder Attention
Decoding numbers into words
Decoding the second token
Extra stuff you can add to a Transformer
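The video explains these pieces visually rather than in code. As a rough sketch of two of the core ideas covered above — sinusoidal positional encoding and single-head scaled dot-product self-attention, as described in the original Transformer paper — the sequence length, model dimension, and random weight matrices below are illustrative assumptions, not values from the video:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: even dims use sine, odd dims cosine."""
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1) token positions
    i = np.arange(d_model)[None, :]            # (1, d_model) embedding dims
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for a single head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                         # each output is a weighted mix of values

# Toy example: 4 tokens, 8-dimensional embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one attended vector per input token
```

Because every token's attention output depends only on matrix products over the whole sequence, all positions can be computed at once — the parallelism the syllabus refers to.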