Image Captioning Python App with ViT and GPT2 Using Hugging Face Models - Applied Deep Learning

1littlecoder via YouTube

Overview

Learn to create an image captioning Python application using Vision Transformer (ViT) and GPT-2 models from Hugging Face. The tutorial walks you through building a Gradio app that generates descriptive captions for images. It uses Sachin's pre-trained model from the Hugging Face Model Hub, which pairs a ViT encoder for image processing with a GPT-2 decoder for text generation. By the end of this 25-minute tutorial, you will have deployed your own image captioning app on Hugging Face and gained practical experience in applied deep learning and natural language processing.

Syllabus

Build Image Captioning Python App with ViT & GPT2 using Hugging Face Models | Applied Deep Learning

Taught by

1littlecoder
