Overview
Explore the latest developments in AI model deployment, fine-tuning techniques, and transformer architectures in this comprehensive video tutorial. Dive into deploying the IDEFICS multimodal model on TGI for inference and understand the nuances of prompt engineering for optimal performance. Learn about representation fine-tuning and the LoRA and QLoRA methods for model optimization. Gain insights into transformer architectures, building models from scratch, and working with quantized versions for efficient inference. Discover the potential of ORPO (preference fine-tuning combined with supervised fine-tuning) and explore strategies for training models on various datasets. Benefit from expert answers to common questions and get a preview of upcoming topics in AI research and development. A minimal, illustrative sketch of querying a TGI endpoint serving IDEFICS is shown below.
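The sketch below is not taken from the video; it is a minimal example, assuming a TGI server is already running locally on port 8080 with an IDEFICS checkpoint loaded. The endpoint URL and image URL are placeholders, and the exact prompt template depends on the IDEFICS variant being served.

import requests

# Hypothetical local TGI endpoint; replace with your own deployment URL.
TGI_URL = "http://localhost:8080/generate"

# TGI's IDEFICS integration accepts images inlined in the prompt via
# markdown image syntax. The image URL below is a placeholder.
prompt = (
    "User: What is shown in this image?"
    "![](https://example.com/image.jpg)\n"
    "Assistant:"
)

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 128, "do_sample": False},
}

response = requests.post(TGI_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["generated_text"])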
Syllabus
Introduction
Latest video on representation fine-tuning and fine-tuning with LoRA
Deploying IDEFICS multimodal model on TGI for inference
Answering questions from the chat
Transformer architectures and building models from scratch
Quantized model versions and inference
LoRA vs QLoRA for fine-tuning
ORPO preference fine-tuning + supervised fine-tuning
Training models from scratch and datasets
Wrap-up and upcoming videos
Taught by
Trelis Research