Overview
Explore the intersection of security and data science in this 52-minute Black Hat conference talk by Joshua Saxe. Examine the challenges and opportunities of applying data science to cybersecurity, including machine learning, data visualization, and scalable storage technologies. Learn about state-of-the-art data visualization techniques and the three main machine learning tasks (classification, clustering, and regression), and see how these methods apply to attack detection, threat intelligence, and malware analysis, including scalable malware analytics. Address challenges specific to security data science, such as detecting malicious activity within vast amounts of benign data and training machine learning models without access to zero-day attack data. Examine statistical methods designed to generalize to new attacks while minimizing false positives. Investigate security data visualization techniques for logs, malware analysis, and threat intelligence. Understand how machine learning approaches can bridge the semantic gap between low-level security data and the high-level activity of interest. Gain insight into security data science as an emerging field, its potential applications, and effective approaches to its unique challenges.
Syllabus
Intro
Presentation Overview
Some definitions: training data, test data, prediction, generalization
Why clustering matters
Classification
Why regression matters
Visualization design principles
The presence of an adversary
The false positive problem
A machine-learning-based detection failure due to false positives
The need for interpretability
Another potential mitigation
Taught by
Black Hat