AI Skills for Engineers: Data Engineering and Data Pipelines

Overview

Artificial Intelligence and Machine Learning have become central techniques for most services and products, ranging from web-based systems to medical procedures, self-driving cars – even intelligent coffee makers.

Alongside algorithms, data is central to AI applications. Without solid data management, AI projects typically underperform or even fail. Unfortunately, the relevance and complexity of handling data are frequently underestimated.

That’s why we developed this course, which covers foundational questions like “Why is data important to AI?” and “What data does AI need?” as well as more application-oriented topics and skills, such as how to extract, load and query data using an SQL pipeline.

In the second part of the course, you will learn basic data engineering skills, including how to set up your Python notebook environment, explore data with advanced pandas functions, and create simple and clear data visualizations.

This introductory course is targeted at learners who have little experience in data management, particularly Python-based data management, and who want to develop Python-based AI applications in the future. The course covers a brief introduction to data management for AI, relational data management (e.g., SQL), and practical data handling skills in Python, pandas, and Jupyter.

This gives you a foundation for future AI and Machine Learning development with Python.

Syllabus

Week 1:

We ask why we should care about data management for Artificial Intelligence and Machine Learning (ML) systems.
We examine which data are needed in the ML lifecycle and what properties that data should have.
We discuss the effort and time needed for data management activities, and look at possible data sources.

Week 2:

The key concepts of data management, such as databases, data models and data schemas, are introduced.
The Relational Data Model is explained and contrasted with the Single-Table Model (like CSV and Excel) and Document Models.

Week 3:

We show how to extract data from existing relational databases using SQL queries and how to convert the query results into CSV files for further processing with pandas in Python notebooks.
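As a rough illustration of the Week 3 workflow, the sketch below runs a SQL query and writes the result to a CSV file. It is not taken from the course materials: it assumes an in-memory SQLite database as a stand-in for an existing relational database, and the table `sensors`, its columns, and the helper `export_query_to_csv` are hypothetical names chosen for the example.

```python
import csv
import os
import sqlite3
import tempfile


def export_query_to_csv(conn, query, csv_path):
    """Run a SQL query and write the result, with a header row, to a CSV file.

    Hypothetical helper for illustration; returns the number of data rows written.
    """
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(rows)


# Assumption: a small in-memory SQLite database stands in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensors (id INTEGER PRIMARY KEY, site TEXT, reading REAL)"
)
conn.executemany(
    "INSERT INTO sensors (site, reading) VALUES (?, ?)",
    [("north", 20.5), ("south", 18.0), ("north", 22.5)],
)
conn.commit()

# Extract: aggregate in SQL so the CSV only holds what is needed downstream.
query = (
    "SELECT site, AVG(reading) AS mean_reading "
    "FROM sensors GROUP BY site ORDER BY site"
)

csv_path = os.path.join(tempfile.mkdtemp(), "mean_readings.csv")
n_rows = export_query_to_csv(conn, query, csv_path)
conn.close()
```

In a notebook, the resulting file could then be loaded for further processing with `pandas.read_csv(csv_path)`.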

Week 4:

The different ways of setting up and running Python notebooks are covered, including cloud-based notebooks and local notebooks.
We will take you step by step through the process of setting up your conda environment and installing the Jupyter and pandas libraries.
You will learn how to run notebooks in VS Code.

Week 5:

Become a pandas expert.
Explore the essential functionalities of pandas and, most importantly, write elegant and efficient Python pandas code to process and engineer tabular data.
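To give a flavor of the kind of tabular processing Week 5 refers to, here is a minimal sketch of a chained pandas pipeline. The data, the column names (`site`, `reading_c`), and the derived columns are all made up for the example and are not part of the course materials.

```python
import pandas as pd

# Hypothetical sample data: temperature readings per site, one value missing.
df = pd.DataFrame(
    {
        "site": ["north", "south", "north", "south"],
        "reading_c": [20.5, 18.0, 22.5, None],
    }
)

summary = (
    df.dropna(subset=["reading_c"])  # drop rows without a reading
    .assign(reading_f=lambda d: d["reading_c"] * 9 / 5 + 32)  # derive Fahrenheit
    .groupby("site", as_index=False)  # keep "site" as a regular column
    .agg(
        mean_c=("reading_c", "mean"),
        max_f=("reading_f", "max"),
        n=("reading_c", "size"),
    )
)

print(summary)
```

Chaining the steps keeps each transformation on its own line and avoids intermediate variables, which is one common way to keep pandas code readable.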

Week 6:

You will learn how to make simple and clear scientific figures in Python using the Seaborn library.
Use the core functions provided by Seaborn to make beautiful statistical plots.
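As one possible example of a simple statistical plot with Seaborn, the sketch below draws a box plot from a small data frame and saves it to a file. The data and column names are hypothetical, the choice of a box plot is ours rather than the course's, and the `Agg` backend is selected only so the code also runs without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "site": ["north", "north", "north", "south", "south", "south"],
        "reading_c": [20.5, 22.5, 21.0, 18.0, 17.5, 19.0],
    }
)

sns.set_theme(style="whitegrid")  # requires seaborn 0.11 or later

fig, ax = plt.subplots(figsize=(5, 3.5))
sns.boxplot(data=df, x="site", y="reading_c", ax=ax)

# Explicit labels with units keep the figure readable on its own.
ax.set_xlabel("Site")
ax.set_ylabel("Reading (°C)")
ax.set_title("Readings per site")

fig.tight_layout()
out_path = os.path.join(tempfile.mkdtemp(), "readings_per_site.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```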

Taught by

Christoph Lofi and Junzi Sun

Reviews

4.0 rating, based on 1 Class Central review

Azuremis @azuremis

Superb course covering the basics of data engineering. The instructor is informative and engaging, good content. I audited this course and found the last assessments most useful.
