Showcase your skills in this Data Engineering project! In this course, you will apply a variety of data engineering skills and techniques that you learned in the previous courses of the IBM Data Engineering Professional Certificate.
You will demonstrate your knowledge of Data Engineering by assuming the role of a Junior Data Engineer who has recently joined an organization and is presented with a real-world use case that requires architecting and implementing a data analytics platform.
In this Capstone project, you will complete numerous hands-on labs. You will create and query data repositories using relational and NoSQL databases such as MySQL and MongoDB. You will also design and populate a data warehouse using PostgreSQL and IBM Db2 and write queries to perform Cube and Rollup operations.
You will generate reports from the data in the data warehouse and build a dashboard using Cognos Analytics. You will also show your proficiency in Extract, Transform, and Load (ETL) processes by creating data pipelines that move data between different repositories. You will perform big data analytics using Apache Spark to make predictions with the help of a machine learning model.
This course is the final course in the IBM Data Engineering Professional Certificate. It is recommended that you complete all the previous courses in this Professional Certificate before starting this course.
Syllabus
- Data Platform Architecture and OLTP Database
- In this module, you will design a data platform that uses MySQL as its OLTP database, and you will use MySQL to store the OLTP data.
- Querying Data in NoSQL Databases
- In this module, you will design a data platform that uses MongoDB as a NoSQL database. You will use MongoDB to store the e-commerce catalog data.
- Build a Data Warehouse
- In this module, you will design and implement a data warehouse, and then generate reports from the data it holds.
- Data Analytics
- In this module, you will assume the role of a data engineer at an e-commerce company. Your company has finished setting up a data warehouse, and you are now responsible for designing a reporting dashboard that reflects the key metrics of the business.
- ETL & Data Pipelines
- In this module, you will use the given Python script to perform various ETL operations that move data from RDBMS to NoSQL, from NoSQL to RDBMS, and from both RDBMS and NoSQL to the data warehouse. You will write a pipeline that analyzes the web server log file, extracts the required lines and fields, and then transforms and loads the data.
- Big Data Analytics with Spark
- In this module, you will use the data from a web server to analyze search terms. You will then load a pretrained sales forecasting model and use it to forecast sales for a future year.
- Final Submission and Peer Review
- In this final module, you will submit screenshots from the hands-on labs for your peers to review. Once you have completed your submission, you will review and grade the submission of one of your peers.
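To give a feel for the kind of log-processing pipeline described in the ETL & Data Pipelines module, here is a minimal Python sketch. It is not the course's script. The sample log lines, the Common Log Format pattern, the `access_log` table, and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for illustration.

```python
import re
import sqlite3
from datetime import datetime

# Hypothetical sample lines in Common Log Format (not taken from the course).
SAMPLE_LOG = [
    '198.51.100.7 - - [10/Oct/2023:13:55:36 +0000] "GET /search?q=laptop HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "POST /cart HTTP/1.1" 302 512',
    'this line is malformed and should be skipped',
    '198.51.100.7 - - [10/Oct/2023:13:57:12 +0000] "GET /missing HTTP/1.1" 404 128',
]

# Assumed layout: ip, identity, user, [timestamp], "method path protocol", status, size.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$'
)


def extract(lines):
    """Yield (ip, timestamp, method, path, status) for lines that match the pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            ip, ts, method, path, status, _size = match.groups()
            yield ip, ts, method, path, status


def transform(records):
    """Keep GET requests; convert the timestamp to ISO 8601 and the status to int."""
    for ip, ts, method, path, status in records:
        if method != "GET":
            continue
        iso_ts = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").isoformat()
        yield ip, iso_ts, path, int(status)


def load(rows, conn):
    """Insert the transformed rows into a table (SQLite stands in for the warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_log "
        "(ip TEXT, ts TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_LOG)), conn)
loaded = conn.execute(
    "SELECT ip, ts, path, status FROM access_log ORDER BY ts"
).fetchall()
for row in loaded:
    print(row)
```

Each stage is a generator, so lines stream through extract and transform one at a time instead of being held in memory, which is one reasonable design choice for large log files.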
Taught by
Rav Ahuja and Ramesh Sannareddy