Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Fine-tuning Multi-modal LLaVA Vision and Language Models
- 1 Fine-tuning Multi-modal Models
- 2 Overview
- 3 LLaVA vs ChatGPT
- 4 Applications
- 5 Multi-modal model architecture
- 6 Vision Encoder architecture
- 7 LLaVA 1.5 architecture
- 8 LLaVA 1.6 architecture
- 9 IDEFICS architecture
- 10 Data creation
- 11 Dataset creation
- 12 Fine-tuning
- 13 Inference and Evaluation
- 14 Data loading
- 15 LoRA setup
- 16 Recap so far
- 17 Training
- 18 Evaluation post-training
- 19 Technical clarifications
- 20 Summary