LLMOps: Quantizing Llama 3.2 3B to INT4 with the OpenVINO Toolkit and Running Inference on CPU

Overview

Explore a 32-minute video tutorial on LLMOps that uses the OpenVINO Toolkit to quantize the Llama 3.2 3B model to 4-bit integer (INT4) format and run inference on a CPU. Learn how to convert the 3-billion-parameter Llama 3.2 model to OpenVINO IR (Intermediate Representation) format, apply 4-bit integer quantization, and run CPU inference with Chain-of-Thought (CoT) prompts. An accompanying Jupyter notebook is provided for hands-on practice and a closer look at each step. Ideal for data scientists and machine learning practitioners who want to optimize large language models for efficient deployment.
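The overview mentions 4-bit integer quantization without showing what it does to the weights. The following is a minimal, pure-Python sketch of group-wise asymmetric INT4 weight quantization, offered only as an illustration under stated assumptions: the group size, the choice to always include zero in each group's range, and the helper names (`quantize`, `dequantize`, `weight_bytes`) are hypothetical and are not taken from the video, its notebook, or OpenVINO/NNCF, whose weight-compression algorithms are more elaborate.

```python
"""Illustrative group-wise asymmetric INT4 weight quantization (pure Python).

Assumption: this mimics the general idea behind 4-bit weight compression;
it is NOT the OpenVINO/NNCF implementation. All names are hypothetical.
"""
from typing import List, Tuple

INT4_MIN, INT4_MAX = 0, 15  # unsigned 4-bit code range for the asymmetric scheme

Group = Tuple[List[int], float, int]  # (codes, scale, zero_point)


def quantize_group(weights: List[float]) -> Group:
    """Map one group of floats to 4-bit codes plus a float scale and int zero point."""
    # Assumption: extend the range to include 0 so the zero point stays in [0, 15].
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # Guard against an all-zero group, which would otherwise give a zero scale.
    scale = (w_max - w_min) / (INT4_MAX - INT4_MIN) or 1.0
    zero_point = max(INT4_MIN, min(INT4_MAX, round(-w_min / scale)))
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from 4-bit codes."""
    return [(q - zero_point) * scale for q in codes]


def quantize(weights: List[float], group_size: int = 4) -> List[Group]:
    """Split a flat weight vector into groups and quantize each independently."""
    return [quantize_group(weights[i:i + group_size]) for i in range(0, len(weights), group_size)]


def dequantize(groups: List[Group]) -> List[float]:
    """Concatenate the dequantized groups back into a flat vector."""
    out: List[float] = []
    for codes, scale, zero_point in groups:
        out.extend(dequantize_group(codes, scale, zero_point))
    return out


def weight_bytes(n_params: int, bits: int) -> int:
    """Storage for the weight codes alone (ignores scales and zero points)."""
    return n_params * bits // 8


if __name__ == "__main__":
    w = [0.12, -0.40, 0.33, 0.05, -0.91, 0.77, 0.00, 0.25]
    w_hat = dequantize(quantize(w, group_size=4))
    for original, approx in zip(w, w_hat):
        print(f"{original:+.3f} -> {approx:+.3f}")
    # Nominal 3 billion parameters, as in the model name.
    print(weight_bytes(3_000_000_000, 16) / 1e9, "GB at FP16")
    print(weight_bytes(3_000_000_000, 4) / 1e9, "GB at INT4 (codes only)")
```

Each weight's reconstruction error is at most half its group's scale, which is why smaller groups tend to preserve accuracy better at the cost of storing more scales. The last two lines give the rough arithmetic for a nominal 3 billion parameters: FP16 needs 2 bytes per weight and INT4 codes need half a byte, about a 4x reduction before counting per-group metadata.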

Syllabus

LLMOps: OpenVINO Toolkit, quantizing Llama 3.2 3B to INT4 and CPU inference #datascience #machinelearning

Taught by

The Machine Learning Engineer


Related courses

Converting Qwen2-VL 2B Model to OpenVino IR Format for CPU Inference

Converting Alibaba Cloud Qwen2-VL Model to OpenVino IR Format - Spanish Tutorial

LLMOps: Comparison of OpenVINO, ONNX, TensorRT, and PyTorch for Inference

MLOps MLflow: Converting Florence2 to OpenVINO IR for CPU Inference and Logging

LLMOps: Comparison of OpenVino, ONNX, TensorRT, and PyTorch Inference

LLMOps: CPU Inference with Phi3 Vision 128k Instruct - ONNX 4-bit in C#
