Overview
Explore distributed multi-GPU computing with Dask, CuPy, and RAPIDS in this EuroPython 2019 conference talk. Discover how recent NumPy community standards and protocols, such as the `__array_function__` protocol (NEP-18) and the CUDA Array Interface, have made it easier for distributed and GPU computing libraries to work together. Learn about GPU-accelerated clustering, the RAPIDS ecosystem for end-to-end GPU-accelerated data science, and the benefits of the Apache Arrow format. Work through practical examples of distributed computing with Dask, including an SVD example and benchmark, and see how RAPIDS helps scale up. The talk also covers the communication challenges of distributed systems and the roadmap toward a 1.0 release.
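The protocol idea can be illustrated without a GPU. The following is a minimal sketch, assuming NumPy 1.17 or later (where NEP-18's `__array_function__` is enabled by default). `LoggedArray` is a hypothetical toy wrapper, not code from the talk: it intercepts NumPy calls such as `np.sum` and `np.linalg.svd` and forwards them to NumPy's own implementation. CuPy and Dask implement the same hook so that these calls run on the GPU or across a cluster instead.

```python
import numpy as np


class LoggedArray:
    """Hypothetical toy array type illustrating NEP-18 dispatch.

    It wraps a NumPy ndarray and records which NumPy API functions
    were dispatched to it. Real libraries (CuPy, Dask) implement the
    same hook but run the computation on the GPU or on a cluster.
    """

    calls = []  # names of NumPy functions routed through the hook

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only handle calls whose array arguments are ndarrays or LoggedArrays.
        if not all(issubclass(t, (LoggedArray, np.ndarray)) for t in types):
            return NotImplemented
        LoggedArray.calls.append(func.__name__)
        # Unwrap LoggedArray arguments and run the regular NumPy implementation.
        unwrapped = [a.data if isinstance(a, LoggedArray) else a for a in args]
        return func(*unwrapped, **kwargs)


x = LoggedArray([[3.0, 0.0], [0.0, 4.0]])
total = np.sum(x)            # routed through LoggedArray.__array_function__
u, s, vh = np.linalg.svd(x)  # the SVD is routed the same way
print(total, s, LoggedArray.calls)
```

Dispatch happens inside NumPy's public functions, so code written against `np.*` can run unchanged on another array type. This is the kind of interoperability that lets Dask wrap CuPy arrays.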
Syllabus
Intro
GPU-Accelerated Clustering Code Example
What is RAPIDS? New GPU-Accelerated Data Science Pipeline
RAPIDS End-to-End GPU-Accelerated Data Science
Learning from Apache Arrow
Data Science Workflow with RAPIDS
Ecosystem Partners
ML Technology Stack
Distributing Dask
Dask SVD Example
NumPy Array Function (NEP-18)
Python CUDA Array Interface
Interoperability for the Win
Challenges: Communication
SVD Benchmark
Scale up with RAPIDS
Road to 1.0
Additional Reading Material
Taught by
EuroPython Conference