Modern Data Science with Vaex - A New Approach to DataFrames and Pipelines

EuroPython Conference via YouTube

Overview

Explore modern data science techniques using Vaex, a DataFrame library for Python, in this 51-minute EuroPython Conference talk. Learn how to process large datasets efficiently on a personal computer by leveraging computational graphs, lazy evaluation, memory-mapped storage, and out-of-core algorithms. Discover methods for cleaning, filtering, grouping, and transforming data, and for visualizing it and analyzing correlations. See how datasets with millions or billions of samples can be handled without relying on distributed computing. Follow along as the speaker demonstrates practical examples using New York City taxi data, covering topics such as expressions, memory mapping, missing values, filtering, categorizing, group operations, density maps, machine learning, and virtual columns. Understand how Vaex optimizes memory and CPU usage, enabling data scientists to work effectively on laptops or workstations with limited RAM but fast SSD storage.
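To make the ideas named above more concrete, here is a minimal sketch of memory-mapped storage, a lazily evaluated "virtual column", and an out-of-core (chunked) aggregation. It uses only the Python standard library and is not Vaex's API or implementation. The helper names (`write_column`, `MappedColumn`, `lazy_mean`, `demo`) and the four sample taxi trips are hypothetical and exist only for illustration.

```python
import mmap
import os
import struct
import tempfile

FLOAT = struct.Struct("<d")  # one little-endian float64 per stored value


def write_column(path, values):
    """Write a column of floats to disk as raw float64 values."""
    with open(path, "wb") as f:
        for v in values:
            f.write(FLOAT.pack(v))


class MappedColumn:
    """A column backed by a memory-mapped file; values are decoded on demand."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.length = len(self._map) // FLOAT.size

    def chunks(self, chunk_size):
        """Yield tuples of at most chunk_size values, in row order."""
        for start in range(0, self.length, chunk_size):
            count = min(chunk_size, self.length - start)
            yield struct.unpack_from(f"<{count}d", self._map, start * FLOAT.size)

    def close(self):
        self._map.close()
        self._file.close()


def lazy_mean(columns, expression, chunk_size=1024, keep=lambda *row: True):
    """Mean of expression(*row) over the rows accepted by keep(*row).

    The expression is only evaluated while streaming through the chunks,
    so its results are never stored; only two running totals are kept.
    """
    total, count = 0.0, 0
    for parts in zip(*(c.chunks(chunk_size) for c in columns)):
        for row in zip(*parts):
            if keep(*row):
                total += expression(*row)
                count += 1
    return total / count if count else float("nan")


def demo():
    """Return (mean speed of all trips, mean speed of trips longer than 1 mile)."""
    with tempfile.TemporaryDirectory() as tmp:
        dist_path = os.path.join(tmp, "distance.f64")
        dur_path = os.path.join(tmp, "duration.f64")
        write_column(dist_path, [1.0, 2.0, 3.0, 4.0])  # made-up distances (miles)
        write_column(dur_path, [0.5, 0.5, 1.0, 2.0])   # made-up durations (hours)

        distance, duration = MappedColumn(dist_path), MappedColumn(dur_path)
        try:
            # "Virtual column": just a formula, computed per row when needed.
            def speed(d, t):
                return d / t

            cols = [distance, duration]
            mean_all = lazy_mean(cols, speed, chunk_size=2)
            mean_long = lazy_mean(cols, speed, chunk_size=2,
                                  keep=lambda d, t: d > 1.0)
        finally:
            distance.close()
            duration.close()
    return mean_all, mean_long


if __name__ == "__main__":
    print(demo())  # (2.75, 3.0)
```

The point of the sketch is that neither the derived speed values nor the filtered rows are ever materialized: both are applied while streaming over chunks of the mapped files, so memory use is bounded by the chunk size rather than by the size of the dataset.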

Syllabus

Introduction
Dataset options
Who is Jovan
Demo
Expressions
Data Science Example
Memory Map
Missing Values
Number of Passengers
Trip Distances
New York
New York City
Filter
Trip duration
Categorizing
Group by Standard
Density Maps
Machine Learning
Memory
PCA
PCA on a subsample
Payment type
String operations
Memory usage
LightGBM
Predict method
Wrappers
Virtual columns
Testing the notebook
Conclusion
Questions

Taught by

EuroPython Conference
