Python and Pandas for Data Engineering

Overview

In this first course of the Python, Bash and SQL Essentials for Data Engineering Specialization, you will learn how to set up a version-controlled Python working environment that can use third-party libraries. You will learn to use Python and the powerful Pandas library for data analysis and manipulation. You will also be introduced to Vim and Visual Studio Code, two popular tools for writing software. This course is valuable for beginning and intermediate students who want to start transforming and manipulating data as data engineers.

Syllabus

Getting Started with Python

In this module, you will learn how to set up an isolated Python environment with third-party libraries and apply that knowledge by setting up a virtual environment that includes Pandas and Jupyter.
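
As a rough illustration of what an isolated environment is, the sketch below uses the standard library's venv module to create one in a temporary directory. The helper name create_env and the ".venv" directory name are assumptions made for this example, not part of the course material, and pip is left out so the sketch runs quickly and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path to its config file.

    Hypothetical helper for illustration only.
    """
    # with_pip=False keeps this sketch fast and offline; in practice you would
    # usually pass with_pip=True so that packages can be installed afterwards.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


# Build the environment in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / ".venv")
    cfg_exists = cfg.exists()

print(cfg_exists)

# With pip available, a typical next step on a POSIX shell would be:
#   .venv/bin/python -m pip install pandas jupyter
```

The pyvenv.cfg file marks the directory as a virtual environment, which is why the sketch checks for it.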

Essential Python

In this module, you will learn how to create and use Python sequences, dictionaries, sets, list comprehensions, and generators. You will also learn how to apply them by manipulating client data in a Jupyter notebook.
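
The structures named above can be sketched in a few lines. The client records here are invented purely for illustration, and the field names (name, city, balance) are assumptions rather than the data used in the course.

```python
# A list (sequence) of dictionaries standing in for client records.
# All values are made up for this example.
clients = [
    {"name": "Ada", "city": "London", "balance": 120.0},
    {"name": "Grace", "city": "New York", "balance": 75.5},
    {"name": "Linus", "city": "London", "balance": 0.0},
]

# List comprehension: collect one field from every record, preserving order.
names = [client["name"] for client in clients]

# Set comprehension: the distinct cities, with duplicates removed.
cities = {client["city"] for client in clients}

# Dictionary comprehension: look up a balance by client name.
balance_by_name = {client["name"]: client["balance"] for client in clients}


def positive_balances(records):
    """Generator: lazily yield (name, balance) for clients with a positive balance."""
    for record in records:
        if record["balance"] > 0:
            yield record["name"], record["balance"]


# The generator produces values only as sum() asks for them.
total = sum(balance for _, balance in positive_balances(clients))

print(names)
print(sorted(cities))
print(total)
```

A generator is a reasonable choice when the records are too numerous to hold in memory at once, since it yields one item at a time instead of building a full list.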

Data in Python: Pandas and Alternatives

In this module, you will learn how to load data into a Pandas DataFrame and write statements to select columns and rows from it. You will also apply comparison and Boolean operators to select data.
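
A minimal sketch of these operations follows, assuming Pandas is installed. The CSV text and its column names are invented for the example; a real workflow would typically pass a file path to read_csv instead of an in-memory string.

```python
import io

import pandas as pd

# Made-up CSV content, read from memory so the example is self-contained.
csv_text = """name,city,balance
Ada,London,120.0
Grace,New York,75.5
Linus,London,0.0
"""

# Load the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Select columns by passing a list of column labels.
names_and_balances = df[["name", "balance"]]

# Select rows by position with iloc (the first two rows here).
first_two = df.iloc[0:2]

# Comparison operators build Boolean masks; & combines them element-wise.
# The parentheses are required because & binds more tightly than == and >.
mask = (df["city"] == "London") & (df["balance"] > 0)
london_with_balance = df[mask]

print(london_with_balance)
```

Note that Pandas uses & and | for element-wise Boolean logic on columns; Python's and and or keywords do not work on a whole Series.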

Python Development Environments

In this module, you will learn the basics of some popular development environments and apply them by writing code in Vim and Visual Studio Code. You will also learn how to check your code into a Git repository.