Multimodal AI: Understanding Large Language Models with Vision and Audio Capabilities

Shaw Talebi via YouTube

Classroom Contents

  1. Introduction
  2. Multimodal LLMs
  3. Path 1: LLM + Tools
  4. Path 2: LLM + Adapters
  5. Path 3: Unified Models
  6. Example: LLaMA 3.2 for Vision Tasks (Ollama)
  7. What's next?
