Overview
Explore scaling techniques for XGBoost models with thousands of features in this 51-minute conference talk from Databricks. Dive into an online advertising use case that enables marketers to target users based on demographic information. Learn about the challenges faced, mistakes made, and valuable insights gained during the process of scaling XGBoost model training. Discover common pitfalls to avoid and notable differences between Python and Scala implementations of XGBoost in Spark. Gain practical knowledge from experts Phan Chuong and Eric Yatskowitz as they share their experiences in scaling machine learning models for production environments and supporting marketing decisions with data insights.
Syllabus
Intro
Welcome
Recording
Boulder Denver Group
Databricks Summit 2022
Phan and Eric
Introduction
Agenda
T-Mobile Marketing Solutions
Magenta Marketing Platform
Why don't we just use this data directly?
How are demographic insights used?
Pandas
UDF
Improving XGBoost
Data set
Why XGBoost?
What we did
How did we achieve that?
Parallelizations
Autoscaling
Normal transformation
Pivot vs Vector
RDD
Questions
Taught by
Databricks