Overview
Learn to train and serve custom multi-modal models using IDEFICS 2 and LLaVA Llama 3 in this comprehensive tutorial video. Explore the IDEFICS 2 model overview, model loading techniques, and LoRA setup. Evaluate OCR performance and handle multiple image inputs. Dive into the training and fine-tuning process, and review the LLaVA Llama 3 model. Set up a multi-modal inference endpoint and understand VRAM requirements for these advanced models. Discover why IDEFICS 2 is recommended as a foundation for building custom multi-modal applications. Access additional resources, including complete scripts, one-click fine-tuning templates, and community support to enhance your learning experience.
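As context for the multiple-image handling the video covers, here is a minimal sketch of the interleaved image/text message format that chat-style multi-modal models such as IDEFICS 2 consume. The exact schema shown follows the Hugging Face chat-template convention and is an assumption for illustration, not code taken from the video:

```python
# Sketch of an interleaved image/text user turn for a chat-style
# multi-modal model (assumed layout, per the Hugging Face convention:
# a list of {"type": "image"} slots followed by a text entry).

def build_message(prompt: str, n_images: int) -> dict:
    """One user turn with n_images image slots followed by a text prompt."""
    content = [{"type": "image"} for _ in range(n_images)]
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}

# Two images plus one question -> three content entries in the turn.
msg = build_message("What text appears in these receipts?", n_images=2)
print(len(msg["content"]))  # 3
```

In practice this message list would be passed through the model's processor (e.g. `apply_chat_template`) together with the actual image tensors; the dict above only defines where each image is spliced into the prompt.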
Syllabus
Fine-tuning and server setup for multi-modal models
Prerequisites (pre-watching)
IDEFICS 2 Model Overview
Model loading, evaluation and LoRA setup
Evaluating OCR performance
Evaluating multiple image inputs
Training / Fine-tuning
LLaVA Llama 3 Model Review
Multi-modal inference endpoint
VRAM Requirements for multi-modal models
IDEFICS 2 - my recommended model to build on
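The VRAM-requirements chapter above ultimately comes down to arithmetic over parameter count and numeric precision. The sketch below is an illustrative back-of-envelope estimate under stated assumptions (2 GB of overhead for activations and KV cache is a rough placeholder), not figures quoted from the video:

```python
# Back-of-envelope VRAM estimate for serving a multi-modal model.
# The overhead figure is an assumed placeholder, not a measurement.

def inference_vram_gb(n_params_b: float, bytes_per_param: int = 2,
                      overhead_gb: float = 2.0) -> float:
    """Weights at the given precision plus a rough activation/KV-cache overhead."""
    return n_params_b * bytes_per_param + overhead_gb

# IDEFICS 2 has roughly 8B parameters, so in fp16 (2 bytes/param)
# the weights alone take about 16 GB before any overhead.
print(inference_vram_gb(8))                     # fp16 estimate
print(inference_vram_gb(8, bytes_per_param=1))  # 8-bit quantized estimate
```

Quantizing to 8-bit roughly halves the weight footprint, which is why quantized loading is a common way to fit such models on a single consumer GPU.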
Taught by
Trelis Research