Overview
Syllabus
Intro
Abstract
Outline
Privacy and the Decennial Census
2010 Census: Summary of Publications (approximate counts)
We performed a database reconstruction and re-identification attack for all 308,745,538 people in the 2010 Census
The basic idea of differential privacy: Uncertainty (noise) protects privacy
The Census Bureau is using differential privacy for the 2020 Census.
How much noise do we add? That's a policy decision.
We planned to create a Disclosure Avoidance System that dropped into the Census production system.
The Disclosure Avoidance System allows the Census Bureau to enforce global confidentiality protections
Our DP mechanism protects histograms of person types for each Census "block"
Running the block-by-block algorithm with Spark
In 2018 we invented the TopDown Algorithm (TDA)
Key challenges in monitoring Spark
We created our own monitoring framework
Cluster List
Each DAS run is a "mission"
Mission Report
System Load
Free Memory
In Summary
Taught by
Databricks