In Data Engineering for Data Scientists, you will practice building ETL, NLP, and machine learning pipelines. This practice prepares you for the project with our industry partner, Figure 8.
Overview
Syllabus
- Introduction to Data Engineering
  - You will get an introduction to the Data Engineering for Data Scientists course and project. The lessons cover ETL pipelines, natural language processing pipelines, and machine learning pipelines.
- ETL Pipelines
  - ETL stands for extract, transform, and load. This is the most common type of data pipeline, and you will practice each step in this lesson.
- NLP Pipelines
  - To complete the project at the end of the course, you will need some natural language processing skills. Here you will practice engineering machine learning features from text data.
- Machine Learning Pipelines
  - You'll use the scikit-learn package to code a machine learning pipeline. With these skills, you can ingest data, create features, and train a machine learning algorithm in just one step.
- Project: Disaster Response Pipeline
  - You’ll build a machine learning pipeline to categorize emergency messages based on the needs communicated by the sender.
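
To give a sense of what "create features and train in just one step" can look like, here is a minimal sketch using scikit-learn's `Pipeline`. It is not taken from the course materials: the toy messages, the two category labels, and the choice of `TfidfVectorizer` with `LogisticRegression` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data, invented for this illustration only.
messages = [
    "We need clean drinking water",
    "No water left in the village",
    "My brother is injured and needs a doctor",
    "Please send medicine and bandages",
]
labels = ["water", "water", "medical", "medical"]

# Chain feature extraction and classification into a single estimator.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),   # raw text -> TF-IDF feature matrix
    ("clf", LogisticRegression()),  # feature matrix -> predicted category
])

# One fit call runs both steps on the raw text.
pipeline.fit(messages, labels)

# New messages pass through the same steps at prediction time.
predictions = pipeline.predict(["We have run out of water", "Send a doctor"])
print(predictions)
```

Because the vectorizer and the classifier live in one object, the same transformations are applied at training and prediction time without extra bookkeeping.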
Taught by
Juno Lee, Andrew Paster, and Arpan Chakraborty