Building Local LLMs for OCR, Object Detection and Image Parsing Using Mono-InternVL
Machine Learning With Hamza via YouTube
Overview
Learn to implement and run the Mono-InternVL model locally for OCR, object detection, code generation, and document parsing tasks in this 16-minute tutorial video. Discover how this recently introduced small Vision Language Model (VLM) delivers strong accuracy while remaining efficient enough to run on local hardware. Follow a detailed walkthrough covering the model architecture, key features, and step-by-step instructions for local deployment. Gain hands-on experience through practical demonstrations and code examples, with references to the official repository, research paper, and Hugging Face model implementation.
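The local-deployment workflow described above can be sketched as follows. This is a minimal, illustrative example, not the video's exact code: the Hugging Face repo id `OpenGVLab/Mono-InternVL-2B`, the InternVL-style `model.chat(...)` interface loaded via `trust_remote_code`, and the helper names here are assumptions for illustration.

```python
def build_question(task_prompt: str) -> str:
    """Prepend the <image> placeholder that InternVL-style chat expects
    before the text instruction (assumed convention)."""
    return f"<image>\n{task_prompt}"

def run_ocr(pixel_values, prompt: str = "Extract all text from this image."):
    """Load Mono-InternVL locally and answer an OCR-style query.

    `pixel_values` is a preprocessed image tensor; the repo id and chat
    API below are assumptions based on the InternVL model family.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer  # heavy imports kept local

    model_id = "OpenGVLab/Mono-InternVL-2B"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # lower memory footprint for local runs
        trust_remote_code=True,      # the repo ships custom modeling/chat code
    ).eval()

    generation_config = dict(max_new_tokens=512, do_sample=False)
    return model.chat(tokenizer, pixel_values,
                      build_question(prompt), generation_config)
```

The same `run_ocr` pattern extends to the other tasks in the video (object detection, document parsing) by swapping the prompt, since the chat interface takes a free-form instruction alongside the image.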
Syllabus
Intro
Model presentation
Run the model locally
Taught by
Machine Learning With Hamza