Overview
Explore a 32-minute conference talk from the SNIA Storage Developer Conference 2024 examining the crucial role of flash storage and PCIe/NVMe in supporting AI applications across different scales. Dive into how NVMe storage can enhance both training and inference deployments, from large data centers to edge devices. Learn about the specific requirements for enabling NVMe offload in generative AI models through practical examples using the Microsoft DeepSpeed library. Understand the optimization techniques and improvements needed in NVMe storage to achieve better LLM inference metrics. Presented by industry experts from Micron Technology, the talk offers insights into democratizing AI training and inference at scale, the technical requirements for NVMe offload of LLMs, and opportunities for enhancing LLM inference performance through NVMe flash storage solutions.
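For context on the NVMe offload the talk covers, below is a minimal sketch of enabling DeepSpeed's ZeRO-3 (ZeRO-Infinity) offload of parameters and optimizer state to NVMe. The mount point /local_nvme, the batch size, and the AIO tuning values are illustrative assumptions, not details taken from the talk.

```python
# Hedged sketch: DeepSpeed ZeRO-3 with NVMe offload of parameters and
# optimizer state. Paths and tuning values are assumptions for illustration.
import deepspeed
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer state
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",  # assumed local SSD mount point
            "pin_memory": True,
        },
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
        },
    },
    # Asynchronous I/O settings governing how tensors are read from and
    # written to the SSD; values here are illustrative starting points.
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "overlap_events": True,
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```

The "aio" block is where storage behavior surfaces in software: block size and queue depth shape the I/O pattern the NVMe device sees, which connects directly to the talk's theme of tuning flash storage for LLM training and inference.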
Syllabus
SNIA SDC 2024 - What Can Storage Do for AI?
Taught by
SNIAVideo