How to Build a Data Pipeline Using Synthetic Data Generation and Testing with Python

Overview

Learn how to develop and test a data pipeline when real data is unavailable in this 31-minute conference talk from PyCon South Africa. The talk covers practical techniques for generating and using synthetic data with Python. These include statistical methods and packages such as Faker and SDV, which create realistic test data for customer profiles, transactions, and time series. It also shows how to use Flyway to load synthetic data into Postgres databases and to manage repeatable deployments. Through code examples and live demonstrations, it discusses the best practices, benefits, and potential challenges of testing with synthetic data. Aimed at intermediate Python developers, the talk covers the skills needed to build and validate robust data pipelines without access to production data.
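The talk's own code is not reproduced here. The following is a minimal, hypothetical sketch of the "statistical methods" approach. It uses only Python's standard library, not Faker or SDV. It generates seeded (reproducible) transactions with log-normally distributed amounts and a daily time series built from a linear trend, weekly seasonality, and Gaussian noise. It then renders the rows as SQL that could be saved as a Flyway repeatable migration, for example a file named like `R__seed_transactions.sql`. The `transactions` table, its columns, the parameter values, and all function names are assumptions for illustration.

```python
import datetime as dt
import math
import random
from dataclasses import dataclass


# Hypothetical schema for illustration: one row per synthetic transaction.
@dataclass
class Transaction:
    txn_id: int
    customer_id: int
    amount: float
    created_at: dt.datetime


def generate_transactions(n_customers: int, n_txns: int, seed: int = 42) -> list[Transaction]:
    """Generate reproducible synthetic transactions spread over a 30-day window."""
    rng = random.Random(seed)  # seeded generator: the same seed always yields the same data
    start = dt.datetime(2023, 1, 1)
    txns = []
    for txn_id in range(1, n_txns + 1):
        # Log-normal amounts are always positive and right-skewed.
        amount = round(rng.lognormvariate(3.0, 0.8), 2)
        # A uniformly random minute within the 30-day window.
        offset = dt.timedelta(minutes=rng.randrange(60 * 24 * 30))
        customer_id = rng.randint(1, n_customers)
        txns.append(Transaction(txn_id, customer_id, amount, start + offset))
    return txns


def daily_series(days: int, base: float = 100.0, trend: float = 0.5,
                 amplitude: float = 10.0, noise_sd: float = 3.0, seed: int = 42) -> list[float]:
    """Synthetic daily metric: linear trend + weekly seasonality + Gaussian noise."""
    rng = random.Random(seed)
    return [
        base + trend * day
        + amplitude * math.sin(2 * math.pi * day / 7)
        + rng.gauss(0.0, noise_sd)
        for day in range(days)
    ]


def to_repeatable_migration(txns: list[Transaction]) -> str:
    """Render rows as SQL for a Flyway repeatable migration script.

    Flyway re-applies a repeatable migration whenever its checksum changes,
    so the script clears the (hypothetical) table first to stay safe to re-run.
    """
    lines = [
        "-- Synthetic seed data; table and column names are hypothetical.",
        "DELETE FROM transactions;",
    ]
    for t in txns:
        lines.append(
            "INSERT INTO transactions (txn_id, customer_id, amount, created_at) "
            f"VALUES ({t.txn_id}, {t.customer_id}, {t.amount:.2f}, "
            f"'{t.created_at.isoformat(sep=' ')}');"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = generate_transactions(n_customers=50, n_txns=5)
    print(to_repeatable_migration(sample))
    print([round(v, 1) for v in daily_series(7)])
```

Seeding the generator is the key design choice. It makes every pipeline test run see identical data, so a failing assertion points to a code change rather than to a different random draw.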

Syllabus

Time: Oct 05 Thu
Duration: 31 minutes

Taught by

PyCon South Africa


Related Courses

Database Schema Packaging and Migration in Python Distribution Packages

Python-Powered DOI Creation Automation for Research Workflows

Craft Complex Mock Data Using Graph-Based Configuration Files

10 Best Python Courses for 2024: Charming the Snake
