

Realistic Synthetic Data Generation at Scale - Modeling Production Data Without Exposure

SNIAVideo via YouTube

Overview

Watch a 33-minute talk from the SNIA Storage Developer Conference (SDC) 2020 on generating realistic synthetic test data at scale. The data mirrors the characteristics of production data without exposing any actual customer information. Learn about Druva's methodology for modeling and generating test datasets that keep authentic patterns and relationships while remaining entirely synthetic. Discover techniques for analyzing production data patterns, building models that capture key variables such as file sizes and directory structures, and generating controlled random data that reflects real-world usage. Explore approaches for modeling directory trees, file size distributions, naming conventions, and the other variables needed to test backup software, antivirus tools, and legal discovery applications. Gain insight into creating versatile, repeatable synthetic datasets that support thorough product testing while protecting sensitive production data. Principal Performance Engineer Mehul Sheth shares practical strategies for synthetic data generation that also apply to other data types, including mailboxes and transactional databases.
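The listing does not spell out the talk's actual models. As a minimal, hypothetical sketch of the general idea only, the Python below shows how "controlled random data" can be made repeatable: a seeded generator builds a directory tree whose file sizes follow a log-normal distribution and whose extensions follow a weighted mix. The log-normal choice, the parameter values, the extension weights, and names such as `generate_tree` are assumptions for illustration. They are not Druva's method, and they are not values fitted to any real dataset.

```python
import random
from dataclasses import dataclass

# Hypothetical extension mix; real weights would come from analyzing production metadata.
EXTENSIONS = [".txt", ".pdf", ".docx", ".jpg"]
EXTENSION_WEIGHTS = [0.4, 0.2, 0.3, 0.1]


@dataclass(frozen=True)
class SyntheticFile:
    path: str
    size_bytes: int


def generate_tree(seed, max_depth=3, max_subdirs=3, max_files=5,
                  size_mu=10.0, size_sigma=2.0):
    """Return a list of synthetic files; the same seed always yields the same tree.

    size_mu and size_sigma are placeholder log-normal parameters (on the log of
    the size in bytes), not values measured from any real dataset.
    """
    rng = random.Random(seed)  # private RNG, so the output depends only on the seed
    files = []

    def walk(prefix, depth):
        # Files in this directory: count, extension, and size all come from rng.
        for i in range(rng.randint(0, max_files)):
            ext = rng.choices(EXTENSIONS, weights=EXTENSION_WEIGHTS)[0]
            size = max(1, int(rng.lognormvariate(size_mu, size_sigma)))
            files.append(SyntheticFile(f"{prefix}/file_{i:04d}{ext}", size))
        # Recurse into subdirectories until the depth limit is reached.
        if depth < max_depth:
            for d in range(rng.randint(0, max_subdirs)):
                walk(f"{prefix}/dir_{d:02d}", depth + 1)

    walk("root", 0)
    return files


if __name__ == "__main__":
    tree = generate_tree(seed=2020)
    print(len(tree), "files,", sum(f.size_bytes for f in tree), "bytes in total")
```

Seeding a private `random.Random` instance instead of the module-level generator keeps each dataset reproducible: rerunning with the same seed rebuilds the identical tree, and changing the seed yields a new tree with the same statistical shape.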

Syllabus

SDC 2020: Realistic Synthetic Data at scale: Influenced by, but not production data

Taught by

SNIAVideo

