
YouTube

One Model for All the Tasks - BLIP - Author Interview

Yannic Kilcher via YouTube

Overview

Explore an in-depth interview with Junnan Li and Dongxu Li, authors of BLIP and researchers at Salesforce Research, discussing their approach to vision-language pre-training. Delve into the development of BLIP, a versatile model that unifies multiple tasks and objectives in a single pre-training run. Learn about the data bootstrapping technique used to improve dataset quality, the evolution of the BLIP architecture, and its performance across a range of vision-language tasks. Gain insights into the challenges faced during the research, the potential for modular pre-training, and future directions for this line of work. Discover how BLIP addresses key issues in cross-modal pre-training, such as noisy web-sourced captions, and achieves state-of-the-art results in image-text retrieval, image captioning, and visual question answering.
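
For readers unfamiliar with the data bootstrapping idea discussed in the interview (called CapFilt in the BLIP paper), the sketch below illustrates the core loop: a captioner proposes synthetic captions for web images, and a filter discards image-caption pairs that do not match. The `captioner` and `filter_model` objects and their methods are hypothetical stand-ins rather than a real API; this is an illustrative outline of the concept, not the authors' implementation.

```python
# Illustrative sketch of BLIP-style caption bootstrapping (CapFilt).
# `captioner` and `filter_model` are hypothetical stand-ins for the
# fine-tuned captioning and filtering models; their methods are assumed.

def bootstrap_dataset(web_pairs, captioner, filter_model, threshold=0.5):
    """Clean noisy web data into (image, caption) pairs.

    web_pairs    -- iterable of (image, noisy_web_caption) tuples
    captioner    -- proposes a synthetic caption for an image
    filter_model -- scores how well a caption matches an image (0..1)
    """
    cleaned = []
    for image, web_caption in web_pairs:
        synthetic = captioner.generate(image)  # synthetic caption
        for caption in (web_caption, synthetic):
            # Keep only captions the filter judges to match the image.
            if filter_model.match_score(image, caption) >= threshold:
                cleaned.append((image, caption))
    return cleaned
```

The cleaned dataset is then used to pre-train the next model, which is why the interview asks whether this bootstrapping could be repeated multiple times.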

Syllabus

- Intro
- Sponsor: Assembly AI
- Start of Interview
- What's the pitch?
- How did data bootstrapping come into the project?
- How big of a problem is data quality?
- Are the captioning & filtering models biased towards COCO data?
- Could the data bootstrapping be done multiple times?
- What was the evolution of the BLIP architecture?
- Are there additional benefits to adding language modelling?
- Can we imagine a modular future for pre-training?
- Diving into the experimental results
- What did and did not work out during the research?
- How is research life at Salesforce?
- Where do we go from here?

Taught by

Yannic Kilcher
