LinkedIn Learning

Data Science on Google Cloud Platform: Building Data Pipelines

via LinkedIn Learning

Overview

Learn how to design and build big data pipelines on Google Cloud Platform.

Syllabus

Introduction
  • What goes into a data pipeline?
  • Data science modules covered
1. GCP Data Pipeline Products
  • GCP data pipeline options
  • Cloud Dataproc
  • Cloud Dataflow
  • Cloud Pub/Sub
2. Apache Beam
  • What is Apache Beam?
  • Beam pipelines
  • PCollections
  • Transforms
  • Pipeline I/O
  • Runners
3. Setting Up Dataflow
  • Setting up GCP for Dataflow
  • Setting up Python
  • Creating a simple pipeline
  • Executing in Dataflow
4. Data Processing with Beam and Dataflow
  • Reading text files
  • ParDo
  • GroupBy
  • Map
  • Combine
  • Writing data to text files
  • Other capabilities
5. Cloud Pub/Sub
  • What is Pub/Sub?
  • Topics and messages
  • Publishers
  • Subscribers
  • Create a topic
  • Create a subscription
  • Publish and receive
  • Python SDK
6. Streaming with Dataflow
  • Streaming with Dataflow
  • Windowing with Dataflow
  • Streaming and windowing example
Conclusion
  • Next steps

Taught by

Kumaran Ponnambalam

Reviews

4.5 rating at LinkedIn Learning based on 126 ratings
