Overview
Professionals with SQL, ETL, Enterprise Data Warehousing (EDW), Business Intelligence (BI) and Data Analysis skills are in great demand. This Specialization is designed to provide career-relevant knowledge and skills for anyone wanting to pursue a job role in domains such as Data Engineering, Data Management, BI or Data Analytics. The program consists of four online courses. In the first course you learn the basics of SQL and how to query relational databases with this powerful language. Next you learn to use essential Linux commands and create basic shell scripts. You continue your journey by learning to build and automate ETL, ELT and data pipelines using Bash scripts, Apache Airflow and Apache Kafka. In the final course you learn about Data Lakes and Data Marts and work with Data Warehouses. You also create interactive reports and dashboards to derive insights from data in your warehouse. Note that this specialization places significant emphasis on hands-on practice with real tools used by data professionals. Every course has numerous hands-on labs as well as a course project. While you will benefit from some prior programming experience, it is not strictly necessary. The only prerequisites for this specialization are basic computer and data literacy, and a passion for self-directed online learning.
Syllabus
Course 1: Hands-on Introduction to Linux Commands and Shell Scripting
- Offered by IBM. This course provides a practical understanding of common Linux / UNIX shell commands. In this beginner friendly course, you ...
Course 2: Databases and SQL for Data Science with Python
- Offered by IBM. Working knowledge of SQL (or Structured Query Language) is a must for data professionals like Data Scientists, Data Analysts ...
Course 3: ETL and Data Pipelines with Shell, Airflow and Kafka
- Offered by IBM. Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, ...
Course 4: Getting Started with Data Warehousing and BI Analytics
- Offered by IBM. Kickstart your Data Warehousing and Business Intelligence (BI) Analytics journey with this self-paced course. You will learn ...
Courses
- Databases and SQL for Data Science with Python
Working knowledge of SQL (or Structured Query Language) is a must for data professionals like Data Scientists, Data Analysts and Data Engineers. Much of the world's data resides in databases. SQL is a powerful language used for communicating with and extracting data from databases. In this course you will learn SQL inside out, from the very basics of SELECT statements to advanced concepts like JOINs.
You will:
  - write foundational SQL statements like SELECT, INSERT, UPDATE, and DELETE
  - filter result sets, use WHERE, COUNT, DISTINCT, and LIMIT clauses
  - differentiate between DML & DDL
  - CREATE, ALTER, DROP and load tables
  - use string patterns and ranges; ORDER and GROUP result sets, and built-in database functions
  - build sub-queries and query data from multiple tables
  - access databases as a data scientist using Jupyter notebooks with SQL and Python
  - work with advanced concepts like Stored Procedures, Views, ACID Transactions, Inner & Outer JOINs through hands-on labs and projects
You will practice building SQL queries, work with real databases on the Cloud, and use real data science tools. In the final project you’ll analyze multiple real-world datasets to demonstrate your skills.
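As a rough illustration of the statement types named in this course description, the sketch below runs a few of them through Python's built-in sqlite3 module against an in-memory database. The table and column names (authors, books) and the sample rows are hypothetical and are not taken from the course labs, which use cloud-hosted databases rather than SQLite.

```python
import sqlite3


def run_sql_demo():
    """Run a few DDL and DML statements and return two query results."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # DDL: define two related tables (hypothetical schema).
    cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    cur.execute(
        "CREATE TABLE books ("
        "id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT, price REAL)"
    )

    # DML: INSERT rows using parameter placeholders.
    cur.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Linus")],
    )
    cur.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            (1, 1, "SQL Basics", 20.0),
            (2, 1, "Joins in Depth", 35.0),
            (3, 2, "Shell Notes", 15.0),
        ],
    )

    # DML: UPDATE one row, then DELETE rows matching a WHERE filter.
    cur.execute("UPDATE books SET price = 18.0 WHERE title = 'Shell Notes'")
    cur.execute("DELETE FROM books WHERE price > 30")

    # COUNT with DISTINCT over the remaining rows.
    distinct_authors = cur.execute(
        "SELECT COUNT(DISTINCT author_id) FROM books"
    ).fetchone()[0]

    # LEFT OUTER JOIN keeps authors with no books; GROUP BY and ORDER BY
    # shape the result set.
    rows = cur.execute(
        """
        SELECT a.name, COUNT(b.id)
        FROM authors a
        LEFT OUTER JOIN books b ON b.author_id = a.id
        GROUP BY a.name
        ORDER BY a.name
        """
    ).fetchall()

    conn.close()
    return distinct_authors, rows
```

Calling run_sql_demo() returns the number of distinct authors that still have books after the DELETE, plus one (name, book count) row per author, including authors whose count is zero because of the outer join.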
- Getting Started with Data Warehousing and BI Analytics
Kickstart your Data Warehousing and Business Intelligence (BI) Analytics journey with this self-paced course. You will learn how to design, deploy, load, manage, and query data warehouses and data marts. You will also work with BI tools to analyze data in these repositories. You will begin this course by understanding different kinds of analytics repositories including data marts, data warehouses, data lakes, data lakehouses, and data reservoirs, and their functions and uses. They are designed to enable rapid business decision making through accurate and flexible reporting and data analysis. A data warehouse is one of the most fundamental business intelligence tools in use today, and one that successful Data Engineers must understand. In this course, you will learn to design, model and implement data warehouses and explore data-warehousing architectures such as Star and Snowflake schemas. You will also learn how to populate data warehouses using ETL and ELT processes, verify data, query data and use Cubes, Rollups, and materialized views/tables. You will become familiar with different BI tools used by experts in the industry such as IBM Cognos Analytics, Tableau, and Microsoft Power BI. You will also use a BI tool to create data visualizations and build interactive dashboards to gain insights from data. The hands-on labs in this course will enable you to apply what you learn and gain practical knowledge of Data Warehousing and BI Analytics. You will work with repositories like MySQL, PostgreSQL, and IBM Db2. You will also use BI tools like Cognos Analytics. At the end of this course, you will complete a project to demonstrate the skills you acquired in each module.
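To make the Star schema and Rollup ideas a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_date) and the sample data are hypothetical. SQLite has no ROLLUP clause, so the grand-total row is emulated with UNION ALL; the course labs use MySQL, PostgreSQL, and IBM Db2, where the syntax may differ.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds measures plus a foreign key per dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO dim_date VALUES (1, 2022), (2, 2023);
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 2, 5.0);
        """
    )


def sales_by_category_with_total(conn):
    """Return per-category totals plus an 'ALL' grand-total row."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category
        UNION ALL
        SELECT 'ALL', SUM(amount) FROM fact_sales
        """
    ).fetchall()
```

Joining the fact table to a single dimension and grouping by one of its attributes is the basic query shape a star schema is designed for; the extra 'ALL' row plays the role of the subtotal a rollup would add.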
- ETL and Data Pipelines with Shell, Airflow and Kafka
Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other, contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting/calling application. In this course, you will learn about the different tools and techniques that are used with ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will experience how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools used for extracting the data, merging extracted data either logically or physically, and loading data into data repositories. You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline some of the multiple methods for loading data into the destination system, verifying data quality, monitoring load failures, and using recovery mechanisms in case of failure. By the end of this course, you will also know how to use Apache Airflow to build data pipelines and understand the advantages of this approach. You will also learn how to use Apache Kafka to build streaming pipelines and learn about Kafka's core components, which include brokers, topics, partitions, replications, producers, and consumers. Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.
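The difference in ordering between ETL and ELT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the clean helper, and the table names warehouse_scores and lake_raw are all hypothetical, and an in-memory SQLite database stands in for both the warehouse and the lake.

```python
import sqlite3

# Hypothetical raw extract: untrimmed names and scores stored as text.
RAW_ROWS = [("  alice ", "42"), ("BOB", "17"), ("carol", "n/a")]


def clean(name, score):
    """Normalize one row; return None when the score cannot be parsed."""
    try:
        return name.strip().title(), int(score)
    except ValueError:
        return None


def etl(conn):
    """ETL: transform first, then load only the cleaned rows."""
    cleaned = [row for row in (clean(n, s) for n, s in RAW_ROWS) if row is not None]
    conn.execute("CREATE TABLE warehouse_scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO warehouse_scores VALUES (?, ?)", cleaned)
    return conn.execute(
        "SELECT name, score FROM warehouse_scores ORDER BY name"
    ).fetchall()


def elt(conn):
    """ELT: load the raw rows unchanged, then transform when they are read."""
    conn.execute("CREATE TABLE lake_raw (name TEXT, score TEXT)")
    conn.executemany("INSERT INTO lake_raw VALUES (?, ?)", RAW_ROWS)
    raw = conn.execute("SELECT name, score FROM lake_raw").fetchall()
    # The transformation happens on demand, in the calling application.
    return sorted(row for row in (clean(n, s) for n, s in raw) if row is not None)
```

Both functions hand the caller the same cleaned rows. The difference is what gets stored: etl keeps only the transformed data, while elt keeps every raw row, including the unparseable one, so a different transformation can be applied later.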
- Hands-on Introduction to Linux Commands and Shell Scripting
This course provides a practical understanding of common Linux / UNIX shell commands. In this beginner friendly course, you will learn about the Linux basics, Shell commands, and Bash shell scripting. You will begin this course with an introduction to Linux and explore the Linux architecture. You will interact with the Linux Terminal, execute commands, navigate directories, edit files, as well as install and update software. Next, you’ll become familiar with commonly used Linux commands. You will work with general purpose commands like id, date, uname, ps, top, echo, man; directory management commands such as pwd, cd, mkdir, rmdir, find, df; file management commands like cat, wget, more, head, tail, cp, mv, touch, tar, zip, unzip; access control command chmod; text processing commands - wc, grep, tr; as well as networking commands - hostname, ping, ifconfig and curl. You will then move on to learning the basics of shell scripting to automate a variety of tasks. You’ll create simple to more advanced shell scripts that involve Metacharacters, Quoting, Variables, Command substitution, I/O Redirection, Pipes & Filters, and Command line arguments. You will also schedule cron jobs using crontab. The course includes both video-based lectures as well as hands-on labs to practice and apply what you learn. You will have no-charge access to a virtual Linux server that you can access through your web browser, so you don't need to download and install anything to complete the labs. You’ll end this course with a final project as well as a final exam. In the final project you will demonstrate your knowledge of course concepts by performing your own Extract, Transform, and Load (ETL) process and create a scheduled backup script. 
This course is ideal for data engineers, data scientists, software developers, and cloud practitioners who want to get familiar with frequently used commands on Linux, macOS and other Unix-like operating systems, and to get started with creating shell scripts.
Taught by
Hima Vasudevan, Jeff Grossman, Ramesh Sannareddy, Rav Ahuja, Sabrina Spillner, Sam Prokopchuk and Yan Luo