Overview
This comprehensive course is a hands-on guide to developing and maintaining high-quality datasets for visual AI applications. Learners will gain in-depth knowledge and practical skills in: discovering and implementing various labeling approaches, from manual to fully automated methods; assessing and improving annotation quality for object detection tasks, including identifying and correcting common labeling issues; analyzing the impact of bounding box quality on model performance and developing strategies to enhance label consistency; using advanced tools like FiftyOne and CVAT for dataset exploration, error correction, and annotation refinement; addressing complex challenges in computer vision, such as overlapping detections, occlusions, and small object detection; implementing data augmentation techniques to improve model robustness and generalization; and applying concepts like sample hardness and entropy in the context of model training and dataset curation. Through a combination of theoretical knowledge and hands-on exercises, students will learn to create, maintain, and optimize datasets that lead to more accurate and reliable visual AI models.
Syllabus
- Getting Started and the Data-Centric AI Paradigm
- At the end of this module, you will be able to describe the data-centric AI paradigm and its importance in modern deep learning workflows. You will be able to explain the data and model feedback loop in the context of object detection and instance segmentation tasks. You'll be able to apply FiftyOne to evaluate initial model performance for object detection and instance segmentation tasks. You'll be able to interpret common evaluation metrics for object detection and instance segmentation models.
- Image Quality and Its Impact on Model Performance
- After this module, you will be able to analyze dataset statistics to gain a holistic understanding of the data. You will be able to identify and assess various image quality issues that can impact model performance. You will be able to use FiftyOne to detect and visualize image quality problems, outliers, and diversity issues. Finally, you'll be able to develop strategies to address the image quality and diversity issues you identify.
- Label Quality and Its Impact on Model Performance
- After this module, you will be able to assess the quality of annotations for object detection tasks. You'll be able to identify common labeling issues such as mislabeled data, hard samples, and occlusions. You will be able to analyze the impact of bounding box quality on model performance and develop strategies to improve label quality and consistency.
- Putting It All Together
- After this module, you will be able to apply advanced data-centric AI techniques such as data augmentation and active learning. You will be able to implement an end-to-end workflow for iterative model improvement using FiftyOne. You will be able to develop a strategy for maintaining dataset quality over time. Finally, you will be able to synthesize and apply these techniques to improve model performance on a given dataset.
Taught by
Harpreet Sahota