Fixing Small Files Performance Issues in Apache Spark Using DataFlint

Big Data Demystified via YouTube

Overview

Learn how to optimize Apache Spark performance by addressing small-files issues in data lakes in this 26-minute technical lecture. Explore how file management interacts with the storage layer in big data environments, with a focus on best practices for file sizing. Discover how Apache Spark processes files at the task level, and learn techniques for identifying and resolving small-files problems using the open-source DataFlint library. Gain practical insights into handling small files when working with modern storage formats like Delta Lake and Iceberg. The lecture is delivered by Meni Shmueli, founder of DataFlint and an experienced big data specialist who has helped numerous companies improve the performance of their data operations and their development efficiency.

Syllabus

Fixing small files performance issues in Apache Spark, using DataFlint [English]

Taught by

Big Data Demystified
