Overview
Learn about the fundamental mechanisms behind transformer architectures in this technical lecture by MIT professor Ankur Moitra, delivered as part of the Simons Institute's Special Year on Large Language Models and Transformers Boot Camp. Dive into the inner workings of transformer models, exploring their key components, architectural design principles, and the mathematical foundations that make them effective for natural language processing tasks. Gain insight into attention mechanisms, positional encodings, and the overall structure that has made transformers the backbone of modern language models.
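To give a flavor of the attention mechanism mentioned above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer layer. This is an illustrative example, not code from the lecture; the shapes and random inputs are chosen arbitrarily.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# toy example: 3 tokens, head dimension 4 (arbitrary illustrative sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
# each output row is a convex combination of the value rows,
# so every row of w sums to 1
```

Each token's output is a weighted average of the value vectors, with weights determined by how well its query matches every key; the sqrt(d_k) scaling keeps the dot products from saturating the softmax as the dimension grows.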
Syllabus
How Do Transformers Work?
Taught by
Simons Institute