MLOps: Comparing Microsoft Phi-3 Mini 128K in GGUF, MLflow, and ONNX Formats
The Machine Learning Engineer via YouTube
Overview
Explore the Microsoft Phi-3 Mini 128K model and compare inference performance across different formats and numeric precisions in this 45-minute video tutorial. Learn how to work with the MLflow, GGUF, and ONNX formats while examining their impact on inference time and output precision. Follow along with the provided notebooks to log the model with MLflow at bfloat16 precision, convert it to GGUF with llama.cpp at float16 precision, and quantize it to int4 for ONNX inference on both CPU and GPU via DirectML. Gain insights into defining input and output parameters, managing artifacts, and moving the model through the various frameworks. Conclude with a comprehensive understanding of the performance differences between these approaches for deploying Phi-3 Mini 128K in machine learning and data science applications.
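As a first reference point, here is a minimal sketch of the MLflow step, assuming the Hugging Face Transformers weights of Phi-3 Mini 128K and the mlflow.transformers flavor; the notebooks in the video may differ in detail. Sketches of the GGUF and ONNX paths follow the syllabus below.

```python
# Minimal sketch: load Phi-3 Mini 128K at bfloat16 precision and log it to
# MLflow with an explicit input/output signature. Not the video's exact code.
import mlflow
import torch
from mlflow.models import infer_signature
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of float32, wide dynamic range
    trust_remote_code=True,
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Define the input and output schema so MLflow can validate serving requests.
signature = infer_signature(
    model_input="What is model quantization?",
    model_output="Quantization stores weights at lower numeric precision.",
)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="phi3-mini-128k-bf16",  # where the run stores the artifacts
        signature=signature,
    )
```

Logging an explicit signature is what lets MLflow check inputs and outputs at serving time, which is the point of the "Defining input and output parameters" and "Defining artifacts" chapters.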
Syllabus
Intro
Phi-3 Mini 128K
Defining input and output parameters
Defining artifacts
Flowing the model
MLflow notebook
MLflow model
ONNX model
ONNX performance
DirectML
Microsoft ONNX
Conclusion
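As referenced above, the GGUF path can be sketched with the llama-cpp-python bindings, assuming the model has already been converted to a float16 GGUF file (the file name here is hypothetical):

```python
# Minimal sketch: run a float16 GGUF build of Phi-3 Mini 128K with llama.cpp.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-mini-128k-f16.gguf",  # hypothetical local conversion
    n_ctx=4096,  # context window for this test; the model itself supports 128K
)
output = llm(
    "Q: What does the GGUF format store? A:",
    max_tokens=128,
    temperature=0.0,  # deterministic output makes precision comparisons easier
)
print(output["choices"][0]["text"])
```

The int4 ONNX runs, on both CPU and GPU via DirectML, go through the onnxruntime-genai package (install onnxruntime-genai for CPU or onnxruntime-genai-directml for GPU). A sketch under the same caveats, with a hypothetical model directory and the API of recent onnxruntime-genai releases:

```python
# Minimal sketch: generate with an int4 ONNX export of Phi-3 Mini 128K and
# time the generation, the basic measurement behind the format comparison.
import time
import onnxruntime_genai as og

model = og.Model("phi-3-mini-128k-onnx-int4")  # hypothetical local directory
tokenizer = og.Tokenizer(model)

prompt = "<|user|>\nWhat is model quantization?<|end|>\n<|assistant|>\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)

start = time.time()
n_tokens = 0
while not generator.is_done():
    generator.generate_next_token()
    n_tokens += 1
elapsed = time.time() - start

print(tokenizer.decode(generator.get_sequence(0)))
print(f"{n_tokens / elapsed:.1f} tokens/sec")
```

Timing tokens per second like this is one straightforward way to put the bfloat16, float16, and int4 variants side by side.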
Taught by
The Machine Learning Engineer