MLOps with MLflow: Evaluating and Quantizing Fine-Tuned LLMs
The Machine Learning Engineer via YouTube
Overview
Learn how to evaluate Large Language Models (LLMs) after fine-tuning using MLflow's evaluate API, convert them to ONNX format, and apply INT8 quantization. Follow along with practical demonstrations that use the MLflow library to assess model performance, reduce storage requirements, and improve inference speed. The complete implementation is available as a Jupyter notebook covering the entire workflow of evaluating a T5-large model on a multi-news summarization task.
Syllabus
MLOps MLFlow: Evaluate LLM after FineTuning and quantize 8 Int #datascience #machinelearning
Taught by
The Machine Learning Engineer