Overview
Explore Apache Spark's capabilities for analyzing large-scale distributed data in this GOTO Chicago 2018 conference talk. Dive into the world of password security as Kelley Robinson, Developer Evangelist at Twilio, demonstrates how to process and analyze over 500 million leaked passwords using Spark. Learn about Spark's API advancements for Scala, Python, and SQL, and discover techniques for efficient data processing. Gain insights into password trends, popular choices, and security implications. Understand the challenges and benefits of working with Spark, including nested error messages and documentation. Discuss data privacy concerns and practical steps for improving password security. Conclude with audience questions and valuable takeaways for implementing Spark in your own projects.
Syllabus
Introduction
What is Twilio
Agenda
What is Spark
We don't need Spark
Data Science Data Engineering
RDD
GroupByKey
DataSets
State of Password
Have I Been Pwned
The Data
Schema Check
Most Popular Passwords
Length Column
Run Raw SQL
Filtering Passwords
Password Data
Schema Inference
User-Defined Functions
Results
Dog Rights
Benefits of Spark
Challenges
Nested Error Messages
Apache Spark Documentation
Security Implications
Data Privacy
Security
What can you do
Thank you
Conclusion
Closing
Audience Questions
Taught by
GOTO Conferences