OpenAI CLIP - Connecting Text and Images

Overview

Explore a comprehensive video explanation of OpenAI's CLIP (Contrastive Language-Image Pre-training) model, which connects text and images. Delve into the paper "Learning Transferable Visual Models From Natural Language Supervision" and understand how CLIP trains on 400 million image-text pairs collected from the web to produce a versatile model. Learn about the contrastive objective, its large-batch implementation, and how the resulting model can be adapted to zero-shot classification. Examine the model's architecture, training process, performance comparisons, scaling properties, and robustness to data shift. Gain insight into the broader impact of this technology and its potential applications across computer vision tasks.
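As a rough illustration of the contrastive objective mentioned above, the sketch below computes a symmetric cross-entropy loss over a batch of paired image and text embeddings: each image should score highest against its own caption, and each caption against its own image. It is a minimal plain-Python sketch, not the paper's implementation. The function names are hypothetical, the embeddings are toy vectors rather than encoder outputs, and the temperature is fixed at 0.07 for simplicity (the paper learns it during training).

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed in a numerically stable way."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i is paired with text i.

    temperature=0.07 is an illustrative fixed value (assumption for this sketch).
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text: each row should pick out its own column.
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its own row.
    loss_texts = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_images + loss_texts) / 2


# Toy batch: two pairs whose embeddings already line up.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

matched_loss = clip_contrastive_loss(toy_images, matched_texts)
swapped_loss = clip_contrastive_loss(toy_images, swapped_texts)
print(matched_loss < swapped_loss)
```

Every other item in the batch acts as a negative example for a given pair, which is one reason a large batch size matters for this objective.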

Syllabus

- Introduction
- Overview
- Connecting Images & Text
- Building Zero-Shot Classifiers
- CLIP Contrastive Training Objective
- Encoder Choices
- Zero-Shot CLIP vs Linear ResNet-50
- Zero-Shot vs Few-Shot
- Scaling Properties
- Comparison on Different Tasks
- Robustness to Data Shift
- Broader Impact Section
- Conclusion & Comments
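
The "Building Zero-Shot Classifiers" topic in the syllabus can be sketched in a few lines: each class name is turned into a text prompt such as "a photo of a {label}", the prompt is embedded by the text encoder, and an image is assigned to the class whose prompt embedding is most similar to the image embedding. The sketch below assumes the embeddings have already been computed; the vectors are made-up toy values and the function names are hypothetical, not part of any CLIP library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, class_text_embs):
    """Return the best-matching label and all similarity scores.

    class_text_embs maps each label to the embedding of its text prompt,
    e.g. the embedding of "a photo of a dog" for the label "dog".
    """
    scores = {
        label: cosine_similarity(image_emb, text_emb)
        for label, text_emb in class_text_embs.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for encoder outputs (assumed values for illustration only).
class_text_embs = {
    "dog": [1.0, 0.1],
    "cat": [0.1, 1.0],
}
image_emb = [0.9, 0.2]

label, scores = zero_shot_classify(image_emb, class_text_embs)
print(label)
```

No labelled training images are needed for the new classes: adding a class only means adding another prompt embedding to the dictionary.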

Taught by

Yannic Kilcher

