Continuous Data Pipeline for Real-Time Benchmarking and Data Set Augmentation

Data Council via YouTube

Overview

This 15-minute conference talk from Data Council covers building continuous data pipelines for real-time benchmarking and dataset augmentation. It shows how to generate datasets and compute real-time precision/recall splits in order to detect data shifts, prioritize data collection, and retrain models. The talk explains why curating representative datasets matters for accurate ML systems and why metrics should be monitored after deployment. It also addresses data shifts in unstructured language models and the use of open-source APIs and annotation tools to streamline the process.

Presented by Ivan Aguilar, a data scientist at Teleskope, the talk covers the problem statement, usual approaches, open-source data APIs, a task overview, an annotations overview, and final thoughts on improving ML model performance through effective data management.
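As a rough illustration of the precision/recall splits mentioned above, the sketch below scores predictions separately for each data slice and flags slices whose metrics have dropped relative to a baseline. It is not taken from the talk: the function names, the `(slice, label, prediction)` record layout, the slice names, and the 0.10 drop threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def precision_recall(pairs):
    """Compute (precision, recall) from (label, prediction) boolean pairs."""
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def metrics_by_slice(records):
    """Group (slice, label, prediction) records and score each slice."""
    groups = defaultdict(list)
    for slice_name, label, pred in records:
        groups[slice_name].append((label, pred))
    return {name: precision_recall(pairs) for name, pairs in groups.items()}


def flag_shifted_slices(baseline, current, max_drop=0.10):
    """Return slices whose precision or recall fell more than max_drop."""
    flagged = []
    for name, (p_now, r_now) in current.items():
        if name not in baseline:
            continue  # no reference point for a slice unseen at baseline
        p_base, r_base = baseline[name]
        if p_base - p_now > max_drop or r_base - r_now > max_drop:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical records: (slice, true label, model prediction).
baseline_records = [
    ("email", True, True), ("email", False, False), ("email", True, True),
    ("chat", True, True), ("chat", False, False),
]
current_records = [
    ("email", True, True), ("email", False, False),
    ("chat", True, False), ("chat", True, True), ("chat", False, True),
]

baseline = metrics_by_slice(baseline_records)
current = metrics_by_slice(current_records)
shifted = flag_shifted_slices(baseline, current)
print(shifted)
```

Slices returned by `flag_shifted_slices` would be natural candidates for additional data collection and annotation before retraining. In a continuous pipeline the same comparison would run on each new batch of labeled data.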

Syllabus

Intro
Why is this a problem?
Usual Approaches
Open Source Data APIs
Task Overview
Annotations Overview
Final Thoughts

Taught by

Data Council
