Leveraging Apache Spark and Delta Lake for Efficient Data Encryption at Scale
Databricks via YouTube
Overview
This 25-minute Databricks conference talk presents Gecko, a CCPA (California Consumer Privacy Act) compliance ecosystem that the Mars Petcare data engineering team built for Apache Spark and Delta Lake. Gecko automates consumer deletion requests, strengthens the security of PII data, preserves the integrity of non-PII data, and keeps PII data accessible when it is needed. The talk covers row-level encryption of PII tables, the strategy for storing the encryption keys, and how Spark and Delta Lake can be used for large-scale encryption, automated handling of privacy rights requests, and broader platform security. It also discusses using the labeled dataset that Gecko generates to train machine learning models for automatic PII detection, and closes with the benefits and possible future directions of this approach for organizations facing consumer data privacy compliance challenges.
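The talk summary includes no code. As a rough illustration of the general pattern it describes (row-level encryption of PII, with the keys stored apart from the data), the sketch below assumes one key per consumer and handles a deletion request by destroying that consumer's key, a technique commonly called crypto-shredding. Whether Gecko works exactly this way is not stated here. Plain Python dictionaries stand in for Delta tables, and the HMAC-SHA256 counter-mode keystream is a standard-library-only stand-in for a real authenticated cipher such as AES-GCM; it provides confidentiality only, with no integrity check, and is not meant for production. All names (`PiiVault`, `encrypt`, `decrypt`, `forget`, `email_enc`) are hypothetical.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; a stdlib-only stand-in for AES-GCM."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out.extend(hmac.new(key, msg, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Encrypt one PII value; a fresh random nonce is prepended to the ciphertext."""
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def decrypt(key: bytes, blob: bytes) -> str:
    """Reverse encrypt(): split off the nonce, regenerate the keystream, XOR back."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8")


@dataclass
class PiiVault:
    """Hypothetical model: a key table kept apart from a row-level encrypted PII table."""
    keys: dict = field(default_factory=dict)   # consumer_id -> 256-bit key
    rows: list = field(default_factory=list)   # PII encrypted per row, non-PII in clear

    def insert(self, consumer_id: str, email: str, non_pii: dict) -> None:
        # One key per consumer, generated the first time the consumer is seen.
        if consumer_id not in self.keys:
            self.keys[consumer_id] = secrets.token_bytes(32)
        key = self.keys[consumer_id]
        self.rows.append({"consumer_id": consumer_id,
                          "email_enc": encrypt(key, email), **non_pii})

    def read_email(self, row: dict) -> Optional[str]:
        # PII stays readable for as long as the consumer's key exists.
        key = self.keys.get(row["consumer_id"])
        return None if key is None else decrypt(key, row["email_enc"])

    def forget(self, consumer_id: str) -> None:
        # Crypto-shredding: deleting only the key makes this consumer's
        # ciphertexts unrecoverable (assuming no other copy of the key exists),
        # while the non-PII columns in the same rows are left untouched.
        self.keys.pop(consumer_id, None)


if __name__ == "__main__":
    vault = PiiVault()
    vault.insert("c1", "alice@example.com", {"order_total": 42.0})
    vault.insert("c2", "bob@example.com", {"order_total": 17.5})
    print(vault.read_email(vault.rows[0]))  # alice@example.com
    vault.forget("c1")
    print(vault.read_email(vault.rows[0]))  # None
    print(vault.rows[0]["order_total"])     # 42.0
    print(vault.read_email(vault.rows[1]))  # bob@example.com
```

In this model a deletion request only touches the small key table, so the non-PII data in the large table never has to be rewritten, which matches the talk's stated goal of keeping non-PII data intact. In a Spark deployment the key table would presumably be a separate, access-controlled Delta table joined to the PII table on the consumer ID.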
Syllabus
Intro
Agenda
Authors
The Petcare Data Platform
Our Mission
Gecko Ecosystem
Key Generation
Data Encryption
Optimizing Parquet Encryption
Master Table Generation
Gecko Delete
Benefits
Future Work
Taught by
Databricks