Fine-tuning Multi-modal Video and Text Models

Trelis Research via YouTube

Classroom Contents

  1. "Video + Text" from "Image + Text" models
  2. Clipping and Querying Videos with an IDEFICS 2 endpoint
  3. Fine-tuning video + text models
  4. Dataset generation for video fine-tuning + pushing to hub
  5. Clipping and querying videos with image splitting in a Jupyter Notebook
  6. Side-note - IDEFICS 2 vision to text adapter architecture
  7. Video clip notebook evaluation - continued
  8. Loading a video dataset for fine-tuning
  9. Recap of video + text model fine-tuning
