Data Privacy Techniques with Apache Spark - Defensive and Offensive Approaches
Databricks via YouTube
Overview
Explore data privacy techniques for protecting personally identifiable information (PII) in this 27-minute talk from Databricks. Compare offensive and defensive approaches, and learn about k-anonymity, quasi-identifiers, and methods such as suppression, perturbation, obfuscation, encryption, tokenization, and watermarking. See elementary code examples for implementing these techniques when third-party products are unavailable. Examine ways to minimize data exfiltration risk, and understand how Databricks Delta can help make datasets privacy-ready. Gain insight into the long-term implications of each privacy method: its effect on statistical usefulness, re-identification risk, data schema, format preservation, and read/write performance.
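To make a few of the terms above concrete, here is a minimal sketch that is not taken from the talk. It uses plain Python rather than Spark so that it runs on its own, and it assumes a keyed hash (HMAC-SHA256) for pseudonymization, a /24 prefix for IP truncation, and 10-year age bins. The function names, the secret key, and the sample records are all hypothetical. The last step measures k-anonymity as the size of the smallest group of rows that share the same quasi-identifier values.

```python
import hashlib
import hmac
import ipaddress
from collections import Counter

# Hypothetical key for illustration only; a real key belongs in a secret manager.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    Without the key, an attacker cannot simply hash a dictionary of
    known e-mail addresses and match the results.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def truncate_ip(ip: str, prefix: int = 24) -> str:
    """Generalise an IPv4 address by zeroing the bits after the prefix."""
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def bin_age(age: int, width: int = 10) -> str:
    """Generalise an exact age into a fixed-width bin such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Made-up sample records.
records = [
    {"email": "a@example.com", "ip": "10.1.2.3", "age": 34},
    {"email": "b@example.com", "ip": "10.1.2.77", "age": 38},
    {"email": "c@example.com", "ip": "10.1.9.5", "age": 52},
    {"email": "d@example.com", "ip": "10.1.9.200", "age": 57},
]

protected = [
    {
        "user": pseudonymize(r["email"]),
        "ip": truncate_ip(r["ip"]),
        "age": bin_age(r["age"]),
    }
    for r in records
]

k_before = k_anonymity(records, ["ip", "age"])    # every raw row is unique
k_after = k_anonymity(protected, ["ip", "age"])   # rows now fall into shared groups
print(k_before, k_after)
```

On this sample, generalising the two quasi-identifiers raises k from 1 to 2, at the cost of coarser IP and age values. That is the kind of trade-off between re-identification risk and statistical usefulness the overview refers to.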
Syllabus
Intro
Data Privacy
Offensive techniques
Technique comparison dimensions
Pseudonymization
Hashing
Making hash cracking a bit more difficult
Credit card numbers
Token Vault with Databricks Delta
Synthetic data
Generalisation
Binning
Truncating: IP addresses
Rounding
Auditing
Remote desktop
Screenshot prevention
Feedback
Taught by
Databricks