Overview
Explore Apache Spark's machine learning capabilities beyond linear regression in this 32-minute conference talk from Scala Days Chicago 2017. Dive deep into Spark's scikit-learn-inspired ML pipelines and learn how to integrate custom data preparation and machine learning tools. Discover how to use meta-algorithms such as parameter search, customize Spark models, and build more powerful ML pipelines. Gain insight into how the ML API is structured and how it works, even if you're new to Spark. Topics include Spark Datasets, building transformers, configurable parameters, cross-validation, and publishing ML models. Best suited to those with basic Spark knowledge, but also accessible to beginners seeking a broad understanding of Spark ML functions.
Syllabus
Intro
Spark MLA
Introduction
The Good Stuff
Spark Datasets
Spark ML Adventure
Building a Transformer
Transform Schema
Word Count
Verify Input Schema
Verify Transform Schema
Pipeline Stage
Configurable Parameters
Cross-validation
Example
Input columns
Classifier
Naive classifier
Vector assembler
DataFrame transformations
More code
Basetrade cadets
Lunch
Putting it in an ML pipeline
Publishing your ML model
Spark example repo
Spark books
Buy this book
Data and Comps
Questions
Taught by
Scala Days Conferences