Empower Large Language Models Serving in Production with Cloud Native AI Technologies

Overview

Explore the challenges and solutions for deploying Large Language Models (LLMs) in production environments using cloud native AI technologies. Learn how KServe has been extended to handle OpenAI's streaming requests, accommodating the inference load of LLMs. Discover how Fluid and Vineyard have optimized model loading, reducing Llama-30B's loading time from 10 minutes to under 25 seconds. Understand the importance of cronHPA for timed auto-scaling to balance cost and performance. Gain insights from KServe and Fluid reviewers and maintainers on overcoming production challenges, and learn effective strategies for using cloud native AI in real-world scenarios.

Syllabus

Empower Large Language Models (LLMs) Serving in Production with Cloud Native... Lize Cai & Yang Che

Taught by

Linux Foundation
