Google Cloud Data Engineer Learning Path
Google Cloud via edX Professional Certificate
Overview
A Data Engineer designs and builds systems that collect and transform the data used to inform business decisions. This learning path guides you through a curated collection of on-demand courses, labs, and skill badges that provide you with real-world, hands-on experience using Google Cloud technologies essential to the Data Engineer role.
Syllabus
Course 1: Preparing for the Google Cloud Professional Data Engineer Exam
This course is intended for the following participants: 1. Cloud professionals interested in taking the Data Engineer certification exam. 2. Data engineering professionals interested in taking the Data Engineer certification exam.
Course 2: Google Cloud Big Data and Machine Learning Fundamentals
Data Analysts, Data Engineers, Data Scientists, and ML Engineers who are getting started with Google Cloud.
Course 3: Modernizing Data Lakes and Data Warehouses with Google Cloud
This course is intended for developers who are responsible for querying datasets, visualizing query results, and creating reports. Specific job roles include: Data Engineers, Data Analysts, Database Administrators, and Big Data Architects.
Course 4: Building Batch Data Pipelines on Google Cloud
Developers responsible for designing pipelines and architectures for data processing.
Course 5: Building Resilient Streaming Analytics Systems on Google Cloud
This class is intended for data analysts, data scientists, and programmers who want to build highly available, resilient, high-throughput, real-time streaming analytics systems on Google Cloud.
Course 6: Smart Analytics, Machine Learning, and AI on Google Cloud
This course covers several ways machine learning can be included in data pipelines on Google Cloud depending on the level of customization required.
Course 7: Serverless Data Processing with Dataflow: Foundations
This course is part 1 of a 3-course series on Serverless Data Processing with Dataflow.
Course 8: Serverless Data Processing with Dataflow: Develop Pipelines
In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK.
Course 9: Serverless Data Processing with Dataflow: Operations
In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model.
Courses
Course 1: Preparing for the Google Cloud Professional Data Engineer Exam
The purpose of this course is to help those who are qualified develop confidence to attempt the exam, and to help those not yet qualified to develop their own plan for preparation.
Course 2: Google Cloud Big Data and Machine Learning Fundamentals
This course introduces the Google Cloud big data and machine learning products and services that support the data-to-AI lifecycle. It explores the processes, challenges, and benefits of building a big data pipeline and machine learning models with Vertex AI on Google Cloud.
Course 3: Modernizing Data Lakes and Data Warehouses with Google Cloud
The two key components of any data pipeline are data lakes and warehouses. This course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud in technical detail. It also describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and why data engineering should be done in a cloud environment.
This is the first course of the Data Engineering on Google Cloud series. After completing this course, enroll in the Building Batch Data Pipelines on Google Cloud course.
Course 4: Building Batch Data Pipelines on Google Cloud
Data pipelines typically fall under one of the Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL) paradigms. This course describes which paradigm should be used, and when, for batch data. Furthermore, this course covers several technologies on Google Cloud for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.
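To make the batch pattern concrete, here is a minimal Apache Beam (Python SDK) sketch of the kind of pipeline the course works toward: read files from Cloud Storage, transform the records, and load them into BigQuery. The bucket, project, dataset, and schema names are illustrative placeholders rather than course materials, and the runner can be switched to DataflowRunner to execute on Dataflow.

# Minimal batch pipeline sketch: read CSV lines from Cloud Storage,
# parse them, and load the results into BigQuery.
# 'my-bucket', 'my-project', and 'mydataset.orders' are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_line(line):
    # Turn one CSV line like "A123,19.99" into a row dict for BigQuery.
    order_id, amount = line.split(",")
    return {"order_id": order_id, "amount": float(amount)}

options = PipelineOptions(
    runner="DirectRunner",               # switch to "DataflowRunner" to run on Dataflow
    temp_location="gs://my-bucket/tmp",  # BigQuery file loads need a temp location
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/orders.csv")
        | "Parse" >> beam.Map(parse_line)
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:mydataset.orders",
            schema="order_id:STRING,amount:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )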
Course 5: Building Resilient Streaming Analytics Systems on Google Cloud
Processing streaming data is becoming increasingly popular as streaming enables businesses to get real-time metrics on business operations. This course covers how to build streaming data pipelines on Google Cloud. Pub/Sub is described for handling incoming streaming data. The course also covers how to apply aggregations and transformations to streaming data using Dataflow, and how to store processed records in BigQuery or Cloud Bigtable for analysis. Learners will get hands-on experience building streaming data pipeline components on Google Cloud using Qwiklabs.
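As a rough illustration of that streaming pattern, the following Beam (Python SDK) sketch reads messages from a Pub/Sub subscription, counts events over one-minute windows, and streams the results into BigQuery. The subscription and table names are hypothetical placeholders.

# Minimal streaming pipeline sketch: Pub/Sub -> 1-minute windows -> BigQuery.
# The subscription and table names are placeholders.
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # unbounded Pub/Sub reads require streaming mode

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events-sub")
        | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
        | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 1-minute event-time windows
        | "PairWithOne" >> beam.Map(lambda event: (event, 1))
        | "CountPerEvent" >> beam.CombinePerKey(sum)
        | "ToRow" >> beam.Map(lambda kv: {"event": kv[0], "event_count": kv[1]})
        | "WriteBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:mydataset.event_counts",
            schema="event:STRING,event_count:INTEGER")
    )

The same code can run on Dataflow by supplying the Dataflow runner and worker options; only the submission options change, not the pipeline logic.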
Course 6: Smart Analytics, Machine Learning, and AI on Google Cloud
Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. This course covers several ways machine learning can be included in data pipelines on Google Cloud, depending on the level of customization required. For little to no customization, this course covers AutoML. For more tailored machine learning capabilities, this course introduces Notebooks and BigQuery machine learning (BigQuery ML). The course also covers how to productionalize machine learning solutions using Vertex AI. Learners will get hands-on experience building machine learning models on Google Cloud using Qwiklabs.
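For a flavor of the BigQuery ML option mentioned here, the sketch below uses the google-cloud-bigquery Python client to train and evaluate a simple logistic regression model with SQL that runs entirely inside BigQuery. The project, dataset, table, and column names are made up for illustration.

# Sketch: train and evaluate a BigQuery ML model from Python.
# Project, dataset, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# CREATE MODEL trains a logistic regression classifier entirely inside BigQuery.
train_sql = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `mydataset.customers`
"""
client.query(train_sql).result()  # block until training finishes

# ML.EVALUATE returns metrics such as precision, recall, and ROC AUC.
evaluate_sql = "SELECT * FROM ML.EVALUATE(MODEL `mydataset.churn_model`)"
for row in client.query(evaluate_sql).result():
    print(dict(row))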
Course 7: Serverless Data Processing with Dataflow: Foundations
This course is part 1 of a 3-course series on Serverless Data Processing with Dataflow. In this first course, we start with a refresher of what Apache Beam is and its relationship with Dataflow. Next, we talk about the Apache Beam vision and the benefits of the Beam Portability framework. The Beam Portability framework achieves the vision that a developer can use their favorite programming language with their preferred execution backend. We then show you how Dataflow allows you to separate compute and storage while saving money, and how Identity and Access Management (IAM) tools interact with your Dataflow pipelines. Lastly, we look at how to implement the right security model for your use case on Dataflow.
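To show how a Beam pipeline is handed to the Dataflow service in practice, here is a small, assumed example of the pipeline options typically supplied at submission time, including the worker service account through which IAM governs what the pipeline may access. All project, region, bucket, and account values are placeholders.

# Sketch: options typically used to submit a Beam pipeline to the Dataflow service.
# Project, region, bucket, and service-account values are placeholders.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",                    # execute on the managed Dataflow service
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",         # temp and staging files live in Cloud Storage,
    staging_location="gs://my-bucket/staging",  # separate from the compute workers
    service_account_email="dataflow-worker@my-project.iam.gserviceaccount.com",  # IAM identity used by workers
)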
Course 8: Serverless Data Processing with Dataflow: Develop Pipelines
In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs. We move on to reviewing best practices that help maximize your pipeline performance. Towards the end of the course, we introduce SQL and DataFrames to represent your business logic in Beam, and show how to iteratively develop pipelines using Beam notebooks.
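As one concrete example of the windowing and trigger concepts this course covers, the sketch below (Beam Python SDK) applies five-minute fixed windows with early firings ahead of the watermark and accumulating panes to a keyed stream. The element shape, key, and durations are assumptions chosen for illustration.

# Sketch: 5-minute fixed windows with early firings before the watermark and
# accumulating panes, applied to a keyed stream of (user_id, score) pairs.
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

def windowed_user_scores(events):
    # events: a PCollection of (user_id, score) tuples carrying event timestamps.
    return (
        events
        | "Window" >> beam.WindowInto(
            window.FixedWindows(300),                    # 5-minute event-time windows
            trigger=trigger.AfterWatermark(
                early=trigger.AfterProcessingTime(60)),  # emit early results every minute
            accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
            allowed_lateness=600)                        # accept data up to 10 minutes late
        | "SumPerUser" >> beam.CombinePerKey(sum)
    )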
Course 9: Serverless Data Processing with Dataflow: Operations
In the last installment of the Dataflow course series, we will introduce the components of the Dataflow operational model. We will examine tools and techniques for troubleshooting and optimizing pipeline performance. We will then review testing, deployment, and reliability best practices for Dataflow pipelines. We will conclude with a review of Templates, which make it easy to scale Dataflow pipelines to organizations with hundreds of users. These lessons will help ensure that your data platform is stable and resilient to unanticipated circumstances.
Taught by
Google Cloud Training