Overview
The DeepLearning.AI Data Engineering Professional Certificate is a comprehensive online program for data engineers and practitioners looking to start or grow their careers.
Organizations of all sizes and across all industries are capturing and generating data at an ever-increasing pace. Within these organizations, every team, from executives, sales and marketing, finance and operations, product and engineering, to customer service, can derive insights and value from organizational data. Whether the end use case is data science, machine learning, or analytics, data engineering is what allows raw data to be converted to value for the business. This is why the role of data engineer is one of the highest-demand jobs in tech today.
Throughout this program, you'll learn the foundations of data engineering while gaining hands-on experience designing and implementing data architectures using AWS and open-source tools.
Taught by industry expert Joe Reis, co-author of Fundamentals of Data Engineering, this certificate equips you with the skills and knowledge to excel in a high-demand field. It focuses on ingesting, processing, transforming, storing, and serving data to stakeholders to drive organizational and business objectives. The practical labs were developed in partnership with AWS and Factored.AI to provide you with an authentic experience building data systems on the cloud.
With this certificate, you will have the tools to further your data engineering career.
Syllabus
Course 1: Introduction to Data Engineering
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will be introduced to the data engineering lifecycle, from data ...
Course 2: Source Systems, Data Ingestion, and Pipelines
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will explore various types of source systems, learn how they ...
Course 3: Data Storage and Queries
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you will learn about the raw ingredients and processes that are used to ...
Course 4: Data Modeling, Transformation, and Serving
- Offered by DeepLearning.AI and Amazon Web Services. In this course, you’ll model, transform, and serve data for both analytics and machine ...
Courses
Course 4: Data Modeling, Transformation, and Serving
- In this course, you’ll model, transform, and serve data for both analytics and machine learning use cases. You’ll explore various data modeling techniques for batch analytics, including normalization, star schema, data vault, and one big table, and you’ll use dbt to transform a dataset based on a star schema and one big table. You’ll also compare the Inmon vs Kimball data modeling approaches for data warehouses. You’ll model and transform a tabular dataset for machine learning purposes. You’ll also model and transform unstructured image and textual data. You’ll explore distributed processing frameworks such as Hadoop MapReduce and Spark, and perform stream processing. You’ll identify different ways of serving data for analytics and machine learning, including using views and materialized views, and you’ll describe how a semantic layer built on top of your data model can support the business. In the last week of this course, you’ll complete a capstone project where you’ll build an end-to-end data pipeline that encompasses all of the stages of the data engineering lifecycle to serve data that provides business value.
Course 3: Data Storage and Queries
- In this course, you will learn about the raw ingredients and processes that are used to physically store data on disk and in memory. You’ll explore different storage systems, including object, block, and file storage, as well as databases, that are built on top of these raw ingredients. You’ll also get a chance to use the Cypher language to query a Neo4j graph database, and perform vector similarity search, a key feature behind generative AI and large language models. You will explore the evolution of data storage abstractions, from data warehouses to data lakes and data lakehouses, while comparing the advantages and drawbacks of each architectural paradigm. With hands-on practice, you will design a simple data lake using AWS Glue, and build a data lakehouse using AWS Lake Formation and Apache Iceberg. In the last week of this course, you’ll see how queries work behind the scenes, practice writing more advanced SQL queries, compare query performance in row-oriented vs column-oriented storage, and perform streaming queries using Apache Flink.
Course 1: Introduction to Data Engineering
- In this course, you will be introduced to the data engineering lifecycle, from data generation in source systems, to ingestion, transformation, storage, and serving data to downstream stakeholders. You’ll study the key undercurrents that affect all stages of the lifecycle, and start developing a framework for how to think like a data engineer. To gain hands-on practice, you’ll gather stakeholder needs, translate those needs into system requirements, and choose tools and technologies to build systems that provide business value. By the end of this course you’ll be spinning up batch and streaming data pipelines to serve product recommendations on the AWS cloud!
Course 2: Source Systems, Data Ingestion, and Pipelines
- In this course, you will explore various types of source systems, learn how they generate and update data, and troubleshoot common issues you might encounter when trying to connect to these systems in the real world. You’ll dive into the details of common ingestion patterns and implement batch and streaming pipelines. You’ll automate and orchestrate your data pipelines using infrastructure-as-code and pipelines-as-code tools. You’ll also explore AWS and open-source tools for monitoring your data systems and data quality.
Taught by
Joe Reis