LLMOps: LLMs Memory and Compute Optimizations

The Machine Learning Engineer via YouTube

Details

Provider: YouTube
Pricing: Free Video
Languages: English
Duration & workload: 24 minutes
Sessions: On-Demand
Level: Advanced

Overview

Explore FlashAttention and Grouped-Query Attention (GQA), two techniques for making self-attention layers more efficient, and learn how Fully Sharded Data Parallel (FSDP) and Distributed Data Parallel (DDP) are used to train and fine-tune Large Language Models (LLMs) in this 24-minute tutorial. The video covers memory and compute optimizations for LLMs and comes with a PowerPoint presentation and a Jupyter notebook for hands-on implementation.
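
To give a rough sense of why GQA counts as a memory optimization, here is a minimal back-of-the-envelope sketch of key/value cache size during inference. It is not taken from the tutorial or its notebook. The helper name `kv_cache_bytes` and the model shape (32 layers, 32 query heads, head dimension 128, 4096-token context, 16-bit values) are assumptions chosen only for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate the bytes needed to cache keys and values across all layers.

    Hypothetical helper for illustration; bytes_per_value=2 assumes fp16/bf16.
    """
    # The factor 2 accounts for storing both a key and a value tensor per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Assumed model shape, not a specific published model.
# Standard multi-head attention: one key/value head per query head (32).
mha = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)

# GQA: the 32 query heads share 8 key/value heads (groups of 4).
gqa = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=4096)

print(f"MHA KV cache: {mha / 2**30:.2f} GiB")
print(f"GQA KV cache: {gqa / 2**30:.2f} GiB")
print(f"Reduction:    {mha // gqa}x")
```

Under these assumed numbers the cache shrinks from 2.00 GiB to 0.50 GiB per sequence, in direct proportion to the reduction in key/value heads. The estimate ignores weights, activations, and framework overhead.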