Apache Spark for Machine Learning on Large Data Sets

Overview

Learn how to use Apache Spark's distributed computing framework for machine learning in this conference talk from YOW! Night 2017. Discover how MLlib, Spark's machine learning library, simplifies fitting models to massive data sets. Follow along as Data Science Tech Lead Juliet Hougland demonstrates practical techniques for both training models with MLlib and applying pre-trained scikit-learn models across large-scale data collections. Gain practical insight into distributed data processing and machine learning workflows that scale to large data sets.