Fine-tuning Multi-modal LLaVA Vision and Language Models

Trelis Research via YouTube

Classroom Contents

  1. Fine-tuning Multi-modal Models
  2. Overview
  3. LLaVA vs ChatGPT
  4. Applications
  5. Multi-modal model architecture
  6. Vision Encoder architecture
  7. LLaVA 1.5 architecture
  8. LLaVA 1.6 architecture
  9. IDEFICS architecture
  10. Data creation
  11. Dataset creation
  12. Fine-tuning
  13. Inference and Evaluation
  14. Data loading
  15. LoRA setup
  16. Recap so far
  17. Training
  18. Evaluation post-training
  19. Technical clarifications
  20. Summary
