Overview
Dive into a comprehensive 48-minute video lecture on the foundational concepts of large language models. Explore core machine learning principles and the Transformer architecture, including key components such as input embeddings, masked multi-head attention, positional encoding, skip connections with layer normalization, and feed-forward layers. Gain insight into pretraining dataset composition, including why code is a valuable part of training data. Discover why Transformers work so well, and examine notable models such as BERT, T5, GPT, Chinchilla, LLaMA, and RETRO, along with the significance of scaling laws and instruction tuning in LLM development. Access the accompanying slides and additional resources for a deeper look at this rapidly evolving field.
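
As a quick orientation to the attention and positional-encoding material the lecture covers, here is a minimal NumPy sketch of single-head causally masked self-attention with sinusoidal positional encodings. This is not code from the lecture; the function names, shapes, and random weight initialization are illustrative assumptions, and the lecture slides remain the authoritative reference.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encodings (assumes even d_model)."""
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                   # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)     # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head, causally masked self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                            # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # row-wise softmax
    return weights @ v

# Toy usage (hypothetical sizes): 6 tokens, model width 16.
rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
x = rng.normal(size=(seq_len, d_model)) + sinusoidal_positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
print(masked_self_attention(x, w_q, w_k, w_v).shape)           # (6, 16)
```

A full decoder block, as discussed in the lecture, additionally splits attention across multiple heads and wraps the attention and feed-forward sublayers with skip connections and layer normalization.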
Syllabus
Intro
Foundations of Machine Learning
The Transformer Architecture
Transformer Decoder Overview
Inputs
Input Embedding
Masked Multi-Head Attention
Positional Encoding
Skip Connections and Layer Norm
Feed-forward Layer
Transformer Hyperparameters and Why They Work So Well
Notable LLM: BERT
Notable LLM: T5
Notable LLM: GPT
Notable LLM: Chinchilla and Scaling Laws
Notable LLM: LLaMA
Why include code in LLM training data?
Instruction Tuning
Notable LLM: RETRO
Taught by
The Full Stack