Overview
Explore cloud native data pipelines in this 52-minute conference talk from GOTO Chicago 2017. Dive into the world of big data architecture as Sid Anand, Data Architect at Agari, shares insights on building resilient and cost-effective data pipelines. Learn about desirable qualities of data pipelines, message scoring use cases, and architectural components. Discover strategies for tackling cost and operability challenges using tools like Apache Airflow. Gain knowledge of stream processing architecture and the benefits of Avro for data serialization. Understand how to implement queue-based auto-scaling groups (ASGs) and leverage schema registries for efficient data management in cloud environments.
Syllabus
Introduction
About Me
Agari : What We Do
Cloud Native Data Pipelines
Desirable Qualities of a Resilient Data Pipeline
Use-Case : Message Scoring
Architectural Components
Tackling Cost
ASG : Queue-based
ASG - Build & Deploy
Tackling Operability : Requirements
Apache Airflow - Authoring DAGs
Apache Airflow - Perf. Insights
Apache Airflow - Alerting
Stream Processing Architecture
What is Avro?
Why is Avro Useful?
Avro Schema Example
Avro Schema Data File Example
Avro Schema Streaming Example
Avro Schema Registry
Taught by
GOTO Conferences