What you'll learn:
- Learn and design the data lakehouse paradigm for an e-commerce company
- A hands-on lab environment is provided with this course
- Implement and deploy a medallion architecture using Prophecy running on Databricks
- Understand Apache Spark and its best practices with real-life use cases
- Share and extend Pipeline components with data practitioners and analysts
- Deploy Pipelines to production with CI/CD best practices
- Utilize version control and change management in data engineering
- Deploy data quality checks and unit tests
This course is designed to help data engineers and analysts build and deploy a cloud data lakehouse architecture using Prophecy's Data Transformation Copilot. It is created with the intention of helping you embark on your data engineering journey with Spark and Prophecy.
We will start by staging the ingested data from application platforms like Salesforce, operational databases with CDC transactional data, and machine generated data like logs and metrics. We’re going to clean and normalize the ingested tables to prepare a complete, clean, and efficient data model. From that data model, we’re going to build four projects creating consumption applications for different real-world use-cases. With each of the projects, you’re going to learn something new:
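To give a flavor of the staging-and-cleaning step described above, here is a minimal sketch in plain Python (the course builds the real version as Spark Pipelines in Prophecy; the record shape and field names here are hypothetical):

```python
# Sketch of a "clean and normalize" step over CDC-style raw records:
# reject incomplete rows, normalize emails, and keep only the latest
# version of each record per id.

def clean_customers(raw_records):
    cleaned = {}
    for rec in raw_records:
        if not rec.get("id") or not rec.get("email"):
            continue  # reject incomplete rows
        rec = {**rec, "email": rec["email"].strip().lower()}
        prev = cleaned.get(rec["id"])
        # CDC-style dedup: keep the most recently updated version
        if prev is None or rec["updated_at"] > prev["updated_at"]:
            cleaned[rec["id"]] = rec
    return list(cleaned.values())

raw = [
    {"id": 1, "email": " Ada@Example.com ", "updated_at": 1},
    {"id": 1, "email": "ada@example.com", "updated_at": 2},
    {"id": 2, "email": None, "updated_at": 1},  # incomplete, dropped
]
print(clean_customers(raw))
```

The same shape of logic, expressed visually as Gems in Prophecy, compiles down to Spark code running on Databricks.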
We will build a spreadsheet export for your finance department, where we will explore data modeling and transformation concepts. Since the finance department really cares about data quality, we will also learn how to set up unit and integration tests to maintain it.
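As a taste of the kind of unit test this project covers, here is a plain-Python sketch (the transformation and its field names are hypothetical; in the course these tests are wired into the Pipeline itself):

```python
# A hypothetical order-to-invoice transformation and a unit test for
# it, illustrating how transformations can be checked in isolation.

def to_invoice_row(order):
    """Turn an order record into a finance-ready invoice row."""
    return {
        "order_id": order["id"],
        "gross": round(order["amount"] * (1 + order["tax_rate"]), 2),
        "currency": order.get("currency", "USD"),
    }

def test_to_invoice_row():
    row = to_invoice_row({"id": "A-1", "amount": 100.0, "tax_rate": 0.07})
    assert row["gross"] == 107.0
    assert row["currency"] == "USD"  # default currency applied

test_to_invoice_row()
```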
We will create an alerting system for your operational support team to ensure customer success, where we’re going to learn about orchestration best practices.
We will build a sales data upload that can be ingested back into Salesforce, where we will explore advanced extensibility concepts that allow us to create and follow standardized practices.
We will build a dashboard directly on Databricks for your product team to monitor live usage. Here we will learn a lot about observability and data quality.
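Data-quality checks of the kind surfaced on such a dashboard can be as simple as threshold rules over a table. A minimal sketch, with hypothetical column names and threshold:

```python
# Flag a column whose null rate exceeds a threshold -- a basic
# data-quality signal an observability dashboard might display.

def null_rate(rows, column):
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

def check_null_rate(rows, column, threshold=0.05):
    rate = null_rate(rows, column)
    return {"column": column, "null_rate": rate, "passed": rate <= threshold}

events = [{"user_id": "u1"}, {"user_id": None},
          {"user_id": "u3"}, {"user_id": "u4"}]
print(check_null_rate(events, "user_id"))
```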
The best part? All of the code that we will be building is completely open-source and accessible. You will be able to apply everything you learn here in your real projects.
Our entire team of best-in-class data engineers and architects, with years of experience from companies like Salesforce, Databricks, and Instagram, is going to walk you through building out these use cases, step by step.