Making LLMs from Transformers - BERT and Encoder-based Models - Part 1
Overview
Learn how to construct Large Language Models using the encoder component of Transformers in this 19-minute technical video. Explore the architectural distinctions between encoder and decoder components, and discover how Google developed BERT through pre-training and fine-tuning. Follow along with detailed explanations of BERT's pre-training objectives, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), as well as various fine-tuning approaches and weight update strategies. Examine BERT-based model variants such as RoBERTa, DistilBERT, and ALBERT, while gaining insight into important LLM benchmarks such as GLUE, SQuAD, and MMLU. Download the accompanying mindmaps and reference materials to deepen your understanding of these fundamental concepts in modern language model development.
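As a quick taste of the Masked Language Modeling objective covered in the video, the sketch below asks a pre-trained BERT to fill in a masked token. This is only an illustration under my own assumptions: it uses the Hugging Face `transformers` library and the publicly available `bert-base-uncased` checkpoint, which are not necessarily the tools shown in the video.

```python
# Minimal MLM inference sketch (assumes `pip install transformers torch`).
from transformers import pipeline

# Load a standard pre-trained BERT checkpoint for the fill-mask task.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT was pre-trained to predict tokens hidden behind the [MASK] placeholder.
for prediction in unmasker("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```

Running this prints BERT's top candidate tokens for the masked position with their probabilities, which is exactly the behavior the MLM pre-training objective optimizes for.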
Syllabus
- Transformers Recap
- Encoder vs Decoder
- Pre-Training and Fine-tuning
- BERT Pre-Training: MLM, NSP
- BERT Fine-tuning
- Weight Update Strategies for Fine-tuning
- BERT-based Models: RoBERTa, DistilBERT, ALBERT
- LLM Benchmarks: GLUE, SQuAD, MMLU, ...
- Summary
Taught by
Donato Capitella