Using Apache Spark and Differential Privacy for 2020 Census Data Protection

Databricks via YouTube

The Census Bureau is using differential privacy for the 2020 Census.

Classroom Contents

  1. Intro
  2. Abstract
  3. Outline
  4. Privacy and the Decennial Census
  5. 2010 Census: Summary of Publications (approximate counts)
  6. We performed a database reconstruction and re-identification attack for all 308,745,538 people in the 2010 Census
  7. The basic idea of differential privacy: Uncertainty (noise) protects privacy
  8. The Census Bureau is using differential privacy for the 2020 Census.
  9. How much noise do we add? That's a policy decision.
  10. We planned to create a Disclosure Avoidance System that dropped into the Census production system.
  11. The Disclosure Avoidance System allows the Census Bureau to enforce global confidentiality protections
  12. Our DP mechanism protects histograms of person types. Census "block"
  13. Running the block-by-block algorithm with Spark
  14. In 2018 we invented the TopDown Algorithm (TDA)
  15. Key challenges in monitoring Spark
  16. We created our own monitoring framework
  17. Cluster List
  18. Each DAS run is a "mission"
  19. Mission Report
  20. System Load
  21. Free Memory
  22. In Summary
