LLMOps: OpenVINO Toolkit INT4 Quantization of Llama 3.2 3B and Inference on CPU
The Machine Learning Engineer via YouTube
Overview
Learn how to convert the Llama 3.2 3-billion-parameter model to OpenVINO IR format and quantize its weights to 4-bit integer (INT4) precision. Follow along as model conversion and quantization are demonstrated step by step, then see how to run inference on a CPU with Chain-of-Thought (CoT) prompts using the optimized model. An accompanying Jupyter notebook is provided for hands-on practice with the LLMOps techniques covered in this 26-minute tutorial on data science and machine learning.
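The flow described above can be sketched as follows. This is a minimal sketch, not the video's exact notebook: the Hugging Face model id `meta-llama/Llama-3.2-3B-Instruct` and the `optimum-intel` API names (`OVModelForCausalLM`, `OVWeightQuantizationConfig`) are assumptions based on Hugging Face's Optimum Intel integration of OpenVINO.

```python
"""Sketch: export Llama 3.2 3B to OpenVINO IR with INT4 weights and run a
Chain-of-Thought (CoT) prompt on CPU. Hypothetical names are marked below."""


def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple Chain-of-Thought instruction."""
    return (
        "You are a careful assistant. Think step by step and show your "
        "reasoning before giving the final answer.\n\n"
        f"Question: {question}\n"
        "Let's think step by step."
    )


def run_int4_inference(question: str) -> str:
    """End-to-end sketch; requires optimum-intel, transformers, and access
    to the model weights, so imports are kept local to this function."""
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
    from transformers import AutoTokenizer

    model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed HF model id
    # export=True converts the PyTorch checkpoint to OpenVINO IR on load;
    # bits=4 requests 4-bit integer weight compression.
    model = OVModelForCausalLM.from_pretrained(
        model_id,
        export=True,
        quantization_config=OVWeightQuantizationConfig(bits=4),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer(build_cot_prompt(question), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)


# Call run_int4_inference("...") to execute the full pipeline on CPU.
print(build_cot_prompt("If a train travels 60 km in 45 minutes, what is its speed?"))
```

An equivalent one-shot export can also be done from the command line with `optimum-cli export openvino --weight-format int4`, after which the saved IR directory can be loaded without `export=True`.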
Syllabus
LLMOps: OpenVINO Toolkit INT4 quantization of Llama 3.2 3B, Inference on CPU #datascience #machinelearning
Taught by
The Machine Learning Engineer