

Florence-2: The Best Small Vision Language Model - Capabilities and Demo

Sam Witteveen via YouTube

Overview

Explore the capabilities of Florence-2, a new Vision Language Model (VLM) trained on a dataset of 5 billion labels. The video covers its architecture and its main tasks: detailed image captioning, visual grounding, dense region captioning, and open-vocabulary detection. It then demonstrates the model's performance in Hugging Face Spaces and walks through sample usage in a Colab notebook. Along the way, it shows how Florence-2 combines traditional computer vision tasks with modern LLM-style captioning in a single small model, and what that could mean for visual AI.

Syllabus

Intro
Florence-2 Paper
Florence-2 Architecture
Florence-2 Detailed Image Captioning
Florence-2 Visual Grounding
Florence-2 Dense Region Caption
Florence-2 Open Vocab Detection
Hugging Face Spaces Demo
Colab: Florence-2 Large Sample Usage

Taught by

Sam Witteveen
