Nimble: A New File Format for Large Datasets - Enhancing Data Warehouse Efficiency

Overview

Learn about Nimble, a file format developed and open-sourced by Meta, in this technical conference talk. Discover how Nimble addresses limitations of established columnar formats such as Apache ORC and Apache Parquet, particularly for the wide tables common in machine learning training data preparation. Explore how it improves efficiency through better parallel decoding on SIMD hardware and GPUs, along with its flexible and extensible encoding support. Gain insight into Meta's training data preparation workflows, see how Presto Native integrates with Nimble, and learn about its current implementation status at Meta. The talk also covers ongoing development efforts and the future roadmap, with a focus on collaboration opportunities in analytics file formats.

Syllabus

Nimble, a new file format for large datasets

Taught by

Presto Foundation
