Deploying and Scaling Large Language Models with NVIDIA NIM on Amazon EKS
AWS Events via YouTube
Overview
Learn to deploy and scale large language models such as Llama 3 and Mistral 7B on Kubernetes in this 19-minute technical video demonstrating NVIDIA Inference Microservices (NIM) on Amazon EKS. Topics include configuring a GPU-enabled EKS cluster, applying Kubernetes scaling strategies, and deploying models efficiently with NVIDIA's NIM Helm chart. The session also covers real-time benchmarking with GenAIPerf and practical guidance on monitoring cost and performance metrics. Designed for ML engineers and cloud architects, the live demonstration walks through best practices for cost-effective LLM deployment in production environments on AWS infrastructure.
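The workflow outlined above can be sketched as a short deployment script. This is a minimal, untested outline, not the exact steps from the video: the cluster name, region, node instance type, chart version, model name, and deployment name are all illustrative assumptions, and the NGC registry URL and chart coordinates should be verified against NVIDIA's current NIM documentation before use.

```shell
# Sketch: provision a GPU-enabled EKS cluster and deploy a NIM via Helm.
# All names, versions, and values below are illustrative assumptions.

# 1. Create an EKS cluster with a GPU node group (eksctl).
eksctl create cluster \
  --name nim-demo \
  --region us-west-2 \
  --node-type g5.2xlarge \
  --nodes 1 --nodes-min 1 --nodes-max 4

# 2. Fetch the NIM LLM Helm chart from NVIDIA's NGC registry
#    (requires an NGC API key in $NGC_API_KEY; chart version assumed).
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.1.2.tgz \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# 3. Create an image-pull secret for NIM containers hosted on nvcr.io.
kubectl create secret docker-registry ngc-registry \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"

# 4. Install the chart with a chosen model (values are placeholders).
helm install my-nim nim-llm-1.1.2.tgz \
  --set model.name=meta/llama3-8b-instruct \
  --set image.repository=nvcr.io/nim/meta/llama3-8b-instruct

# 5. Scale out by adding replicas (or wire up a HorizontalPodAutoscaler);
#    the deployment name depends on the Helm release and chart naming.
kubectl scale deployment my-nim-nim-llm --replicas=2
```

Once the service is up, a tool such as GenAIPerf can drive load against the NIM's OpenAI-compatible endpoint to capture the latency and throughput metrics discussed in the video.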
Syllabus
Why and how to run NVIDIA NIM on Amazon EKS
Taught by
AWS Events