Overview
Explore resilient predictive data pipelines in this 45-minute conference talk from GOTO Chicago 2016. Dive into the world of big data architecture with Siddharth "Sid" Anand, Data Architect at Agari Inc. Learn how major tech companies build custom data pipelines to meet strict requirements on security, fault-tolerance, cost control, scalability, and uptime. Discover key concepts such as data products, serving pipelines, and the blast radius problem. Gain insights into timeliness, dead letter queues, and the SNS + SQS design pattern. Explore Apache Airflow for authoring DAGs, performance insights, and alerting. Examine near-real-time architecture, schema registry, AWS Lambda, and elastic stream processing. Perfect for data engineers and architects looking to enhance their knowledge of resilient and scalable data pipeline design.
Syllabus
Introduction
About Me
Motivation
Data Products
Serving + Data Pipelines
The Blast Radius Problem
Timeliness
SQS - Dead Letter Queue
SNS + SQS Design Pattern
What Does Agari Do?
Apache Airflow - Authoring DAGs
Apache Airflow - Perf. Insights
Apache Airflow - Alerting
NRT Architecture
Schema Registry
What is AWS Lambda?
Elastic Stream Processing
Open Source Plans
Questions? @r39132
Taught by
GOTO Conferences