Best Practices for Building and Deploying Data Pipelines in Apache Spark

Databricks via YouTube

Overview

Explore best practices for building and deploying data pipelines in Apache Spark in this 41-minute conference talk by Vicky Avison from Databricks. Learn about key considerations such as performance, idempotency, reproducibility, and tackling the small file problem when constructing data pipelines. Discover a common Data Engineering toolkit that separates production concerns from business logic, enabling non-Data-Engineers to define pipelines efficiently. Examine Waimak, an open-source library for Apache Spark, which streamlines the transition from prototype to production. Gain insights into new approaches and best practices for deploying data pipelines, an often overlooked aspect of Data Engineering. Understand the composition of data teams, challenges in pipeline development, and strategies for leveraging team skills effectively. Explore tools, frameworks, and design principles for creating a robust Data Engineering framework, along with simplified methods for data ingestion, business logic development, environment management, and deployments.

Syllabus

Intro
Cox Automotive
KPMG Lighthouse
What is this talk about?
What do we mean by 'Data Pipeline'?
Who is in a data team?
What do we need to think about when building a pipeline?
What about the business logic?
What about deployments?
What are the main challenges?
How were we dealing with the main challenges?
Could we make better use of the skills in the team?
What tools and frameworks would we need to provide?
How would we design a Data Engineering framework?
How would we like to manage deployments?
Simpler data ingestion
Simpler business logic development
Simpler environment management
Simpler deployments

Taught by

Databricks
