Overview
Explore a conference talk on scaling machine learning with Apache Spark. Adi Polak and Holden Karau share practical insights as they discuss Polak's book "Scaling Machine Learning with Spark." Learn how the Apache Spark ecosystem, MLlib, MLflow, TensorFlow, and PyTorch fit together to build end-to-end distributed ML workflows. Discover how to manage the ML lifecycle, preprocess data, engineer features, and train models with MLlib. Learn how to combine Spark with deep learning, work with distributed TensorFlow, and scale machine learning with PyTorch. Understand the challenges and trade-offs of distributed ML, and explore the intersection of ML and data engineering.
Syllabus
Intro
Lead with the tools & resources you have
The Apache Spark ecosystem
Book chapter overview
Exploring the glue spaces in ML & data engineering
Navigating the trade-offs of distributed ML
Challenges of keeping up with Open Source software
Can we expect another book?
Outro
Taught by
GOTO Conferences