Learn how to make Apache Spark work with other big data technologies, and put together an end-to-end project that solves a real-world business problem.
Syllabus
Introduction
- Driving big data engineering with Apache Spark
- Course prerequisites
- Setting up the exercise files
- What is data engineering?
- Data engineering vs. data analytics vs. data science
- Data engineering functions
- Batch vs. real-time processing
- Data engineering with Spark
- Spark architecture review
- Parallel processing with Spark
- Spark execution plan
- Stateful stream processing
- Spark analytics and ML
- Batch processing use case: Problem statement
- Batch processing use case: Design
- Setting up the local DB
- Uploading stock to a central store
- Aggregating stock across warehouses
- Real-time use case: Problem
- Real-time use case: Design
- Generating a visits data stream
- Building a website analytics job
- Executing the real-time pipeline
- Batch vs. real-time options
- Scaling extraction and loading operations
- Scaling processing operations
- Building resiliency
- Project exercise requirements
- Solution design
- Extracting long last actions
- Building a scorecard
- More about Apache Spark
Taught by
Ben Sullins