Tiny Text and Vision Models - Fine-Tuning and API Setup

Overview

This 44-minute tutorial covers fine-tuning and deploying tiny text and vision models. It walks through the architecture of multi-modal models using Moondream as the example: a vision encoder (SigLIP), an MLP vision projection, and a language model (Phi). It then shows how to apply LoRA adapters to a multi-modal model, with a hands-on fine-tuning notebook demo. The later sections cover deploying a custom API for multi-modal models, serving with vLLM, and training a multi-modal model from scratch. The tutorial closes with a look at multi-modal datasets and a list of further video resources.
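As a rough illustration of the LoRA idea mentioned in the overview, the sketch below adds a low-rank update to one frozen linear layer using only the Python standard library. It is not taken from the tutorial or from Moondream's code: the names `matmul`, `lora_forward`, `base`, `down`, and `up`, as well as the dimensions and the `alpha` value, are assumptions chosen for illustration. The up-projection starts at zero, so the adapted layer initially behaves exactly like the frozen one.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, base, down, up, alpha):
    """Frozen linear layer plus a scaled low-rank update.

    x:    (n, d_in) inputs
    base: (d_in, d_out) frozen weights
    down: (d_in, r) trainable down-projection
    up:   (r, d_out) trainable up-projection, zero at initialization
    """
    rank = len(up)
    scale = alpha / rank
    base_out = matmul(x, base)
    delta = matmul(matmul(x, down), up)
    return [
        [b + scale * d for b, d in zip(base_row, delta_row)]
        for base_row, delta_row in zip(base_out, delta)
    ]


random.seed(0)
d_in, d_out, rank = 8, 6, 2  # hypothetical sizes, far smaller than a real model

base = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
down = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
up = [[0.0] * d_out for _ in range(rank)]
x = [[random.gauss(0, 1) for _ in range(d_in)]]

frozen_params = d_in * d_out                   # weights left untouched
trainable_params = d_in * rank + rank * d_out  # weights the adapter trains

print(frozen_params, trainable_params)
```

Only `down` and `up` would receive gradient updates during fine-tuning, which is why the adapter stays small compared with the layer it modifies. In practice a library such as Hugging Face PEFT would normally handle this; which layers of a multi-modal model to adapt is a choice the tutorial itself discusses.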

Syllabus

Fine-tuning tiny multi-modal models
Moondream server demo
Video Overview
Multi-modal model architecture
Moondream architecture
Moondream vision encoder (SigLIP)
Moondream MLP (vision projection)
Moondream language model (Phi)
Applying LoRA adapters to a multi-modal model
Fine-tuning notebook demo
Deploying a custom API for multi-modal models
vLLM
Training a multi-modal model from scratch
Multi-modal datasets
Video resources

Taught by

Trelis Research

Related Courses

Fine-Tuning LLM Models - Generative AI Course

H2O ai Large Language Models (LLMs) - Level 2

Fine-tuning Multi-modal Video and Text Models

The Best Tiny Language Models - Performance, Fine-tuning, and Function-calling

LoRA Fine-tuning Explained - Choosing Parameters and Optimizations

Parameter Efficient Fine-Tuning with Multiple LoRA Adapters for Large Language Models