

Let's Talk About Raw Documents - Extracting Structured Data for ML Pipelines

MLOps.community via YouTube

Overview

A 50-minute MLOps Community Meetup talk with Crag Wolfe, Infrastructure Team Lead at Unstructured.io, on processing raw documents in modern ML pipelines and extracting structured data from a variety of file formats. The talk introduces Unstructured.io's open-source libraries and their NLP-focused approach, shows how to rapidly build custom preprocessing APIs, and walks through the SEC Filing Section Pipeline and a sentiment analysis model notebook. It includes a demo, a developer quick start, and discussion of scaling issues, document editing, and directions beyond NLP, along with Crag Wolfe's background in back-end engineering and NLP startups and links for connecting with the MLOps community.

Syllabus

Introduction to Crag Wolfe
Agenda
Unstructured.io introduction
The open-source community
The goal
Rapidly build custom preprocessing API
Staging
Demo
Developer quick start
SEC Filing Section Pipeline
Section 1: Pulling in Raw Documents
Section 2: Reading the Document
Section 3: Custom Partitioning Bricks
Section 4: Cleaning Bricks
Section 5: Staging Bricks
Section 6: Define the Pipeline API
SEC Sentiment Analysis Model notebook
Stage for transformers
Training a summarization model with Unstructured + Argilla + Huggingface
Crag's previous engineering experience
Deciding what to tackle next
Editing documents
Scaling issues
Moving out of NLP
Wrap up
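
The partitioning, cleaning, and staging sections listed in the syllabus suggest a three-stage flow from raw text to model-ready records. The sketch below is only an illustration of that general idea using the Python standard library. It is not the Unstructured.io API and not the code shown in the talk: the `Element` class, the function names, and the title heuristic are all hypothetical and chosen for this example.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Element:
    """One piece of a document: a category label plus its text (hypothetical type)."""
    category: str
    text: str


def partition_text(raw: str) -> list[Element]:
    """Partitioning step (assumed): split raw text into elements on blank lines."""
    elements = []
    for block in re.split(r"\n\s*\n", raw):
        block = block.strip()
        if not block:
            continue
        # Crude heuristic, for illustration only: a short single-line block
        # that does not end in a period is treated as a title.
        is_title = "\n" not in block and len(block) < 60 and not block.endswith(".")
        elements.append(Element("Title" if is_title else "NarrativeText", block))
    return elements


def clean_whitespace(element: Element) -> Element:
    """Cleaning step (assumed): collapse runs of whitespace into single spaces."""
    return Element(element.category, re.sub(r"\s+", " ", element.text).strip())


def stage_as_json(elements: list[Element]) -> str:
    """Staging step (assumed): serialize elements as JSON for a downstream tool."""
    return json.dumps([{"type": e.category, "text": e.text} for e in elements])


def run_pipeline(raw: str) -> str:
    """Chain the three steps: partition, then clean each element, then stage."""
    return stage_as_json([clean_whitespace(e) for e in partition_text(raw)])
```

Calling `run_pipeline` on a short filing excerpt with a heading and a paragraph returns a JSON array with one `Title` record and one `NarrativeText` record. Keeping each stage as a small separate function mirrors the way the syllabus treats partitioning, cleaning, and staging as distinct sections.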

Taught by

MLOps.community
