

LLMOps: Quantization Models and Inference with ONNX Generative Runtime

The Machine Learning Engineer via YouTube

Overview

Explore LLMOps in this 30-minute video on model quantization and inference with the ONNX generative runtime (the ONNX Runtime GenAI extension). Learn how to install ONNX Runtime with GPU support and run inference with a generative model, specifically a Phi-3-mini-4k quantized to int4. Then walk through converting the original Phi-3-mini-128k into an int4 quantized version using ONNX Runtime. Access the accompanying notebook on GitHub to follow along and gain hands-on experience with this area of data science and machine learning.
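To make the int4 idea concrete before the video: quantizing a model to int4 means mapping each floating-point weight to a 4-bit integer in [-8, 7] plus a shared scale factor, which shrinks storage roughly 8x versus float32 at the cost of some rounding error. The sketch below is a minimal, hypothetical illustration of symmetric per-tensor int4 quantization in NumPy; it is not the ONNX Runtime conversion tool used in the video, and the function names are made up for illustration.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor int4 quantization (illustrative sketch).

    Maps float weights to integers in the signed 4-bit range [-8, 7]
    using a single scale factor derived from the largest magnitude.
    """
    scale = np.max(np.abs(w)) / 7.0          # 7 is the largest positive int4 value
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int4 codes and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a tiny weight vector and inspect the round-trip error.
w = np.array([0.9, -0.35, 0.07, -1.4], dtype=np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)
print(q, scale, np.max(np.abs(w - w_hat)))
```

Real toolchains such as the ONNX Runtime model builder apply the same principle per weight block with calibrated scales, which is why an int4 Phi-3 stays close in quality to the original.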

Syllabus

LLMOps: Quantization models & Inference ONNX Generative Runtime #datascience #machinelearning

Taught by

The Machine Learning Engineer
