Automatic Image Captioning with Vision Transformer and GPT-2

Eran Feit via YouTube

Overview

Learn how to generate descriptive captions for images with Python and PyTorch in this 16-minute tutorial. The tutorial uses the pre-trained 'nlpconnect/vit-gpt2-image-captioning' model from Hugging Face, which pairs a Vision Transformer (ViT) for image processing with GPT-2 for text generation. It covers installing the environment and the required Python libraries, loading the pre-trained model, processing images with ViT, generating text with GPT-2 in PyTorch, and displaying each caption alongside its image. Links to the tutorial code and additional computer vision resources are provided.
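The steps listed in the overview can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: it assumes the `torch`, `transformers`, `pillow`, and `matplotlib` packages are installed, and the helper names (`load_captioner`, `caption_images`, `main`) and the decoding settings in `GEN_KWARGS` are assumptions chosen for this example. Only the model identifier comes from the overview.

```python
# Assumed setup (not from the listing): pip install torch transformers pillow matplotlib
import sys

MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"  # model named in the overview
GEN_KWARGS = {"max_length": 16, "num_beams": 4}    # assumed decoding settings


def load_captioner(model_id=MODEL_ID):
    """Load the ViT + GPT-2 captioning model, its image processor, and tokenizer."""
    # Heavy imports stay local so the helpers can be imported without them.
    import torch
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    model.eval()
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, processor, tokenizer, device


def caption_images(images, model, processor, tokenizer, device):
    """Return one caption string per already-opened RGB image."""
    # The processor resizes and normalizes the images into a pixel tensor.
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)
    # The decoder generates token ids, which the tokenizer turns back into text.
    output_ids = model.generate(pixel_values, **GEN_KWARGS)
    captions = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return [caption.strip() for caption in captions]


def main(paths):
    """Caption each image file and show it with the caption as the title."""
    from PIL import Image
    import matplotlib.pyplot as plt

    images = [Image.open(path).convert("RGB") for path in paths]
    captions = caption_images(images, *load_captioner())
    for image, caption in zip(images, captions):
        plt.figure()
        plt.imshow(image)
        plt.axis("off")
        plt.title(caption)
    plt.show()


# Runs only when image paths are passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Saved under a hypothetical filename such as `caption.py`, the sketch could be run as `python caption.py photo1.jpg photo2.jpg`; the model weights are downloaded from Hugging Face on first use.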

Syllabus

Automatic Image Captioning with ViT-GPT2

Taught by

Eran Feit
