Building Local LLMs for OCR, Object Detection and Image Parsing Using Mono-InternVL

Machine Learning With Hamza via YouTube

Overview

Learn to run the Mono-InternVL model locally for OCR, object detection, code generation, and document parsing in this 16-minute tutorial video. The video introduces this recently released small Vision Language Model (VLM), which is presented as combining high accuracy with efficient performance. It walks through the model architecture, key features, and step-by-step instructions for local deployment, then demonstrates the model in practice with code examples and points to the official repository, research paper, and Hugging Face model page.

Syllabus

Intro
Model presentation
Run the model locally

Taught by

Machine Learning With Hamza

