Parallelizing Your ETL with Dask on Kubeflow
MLOps World: Machine Learning in Production via YouTube
Overview
This conference talk shows how to parallelize ETL workloads using Dask on Kubeflow. Dask is a Python library for parallel computing, and Kubeflow is an MLOps platform built on Kubernetes. The talk covers how to use Dask's parallelism from both Kubeflow's notebook service and its pipeline workflows, and introduces the new Dask Operator for Kubernetes, which lets users launch Dask clusters from Jupyter sessions and pipeline steps. It explains how distributing work across a Dask cluster makes it possible to process larger-than-memory datasets and improve performance in machine learning pipelines. The speaker walks through installation, works through practical examples, and shows the benefits of combining Dask and Kubeflow for data processing and ML workflows.
Syllabus
Parallelizing Your ETL with Dask on Kubeflow
Taught by
MLOps World: Machine Learning in Production