Data Contracts for Long-Term Schema Management in Apache Kafka

Overview

Explore effective strategies for long-term schema management using data contracts in this 57-minute podcast episode featuring Abraham Leal, Customer Success Technical Architect at Confluent. Learn why organizations need well-defined guidelines and standards for data quality, enforcement, and transfer. Discover the benefits of associating Apache Kafka® data with data contracts (schemas), and how Schema Registry makes those contracts easier to evolve. Gain insights into using GitOps automation to manage data pipelines, topic versioning, and data quality assurance. Understand why Protobuf and Avro are preferred over XML or JSON for schema evolution and cost savings. Delve into concepts such as schema references, compatibility types, topic versioning tradeoffs, and upcasters/downcasters. Get recommendations on tools that make data easier to discover, and learn why it pays to adopt Schema Registry from the start.

Syllabus

- Intro
- What is a data contract?
- What are the problems with using JSON blobs?
- What are the advantages of using Avro and Protobuf formats?
- What are schema references?
- What support is available for changing the data format?
- What are forward, backward, and full compatibility?
- What should you do if you have two different formats?
- What are the tradeoffs of doing topic versioning?
- What are upcasters and downcasters?
- Are there any recommended tools for making data discoverability easier?
- It's a wrap!