Overview
Practice fundamental skills using Python for data engineering in this hands-on, interactive course with coding challenges in CoderPad.
Syllabus
Introduction
- Welcome to the course
- What you should know
- CoderPad tour
- Introduction to Python and data engineering
- Setting up your Python environment
- Explore a Google Colab worksheet
- Variables and data types
- Operators and expressions
- Control structures
- Functions
- Modules and packages
- String manipulation
- Error handling
- Solution: String Manipulation
- Collection overview
- Python collections: Tuples
- Python collections: Lists
- Python collections: Sets
- Python collections: Dictionaries
- Solution: Analyze list
- File I/O overview
- Working with CSV files
- Working with JSON files
- Solution: Read/Write text to file
- Introduction to pandas
- Read files as DataFrames
- Data cleaning and preprocessing
- Data manipulation and aggregation
- Data visualization
- Write DataFrames as files
- Solution: Play with pandas
- Introduction to NumPy
- Array creation and attributes
- Array operations
- Indexing and slicing
- Linear algebra and statistics
- Write DataFrames as files
- Solution: NumPy Array Operation
- Understanding classes and objects
- Implementation: Classes and objects in Python
- Understand OOP features: Abstraction, inheritance, and more
- Solution: Accessing Object attributes
- Tips to write efficient Python code
- What is ETL in the data engineering world?
- What is Hadoop?
- Understand PySpark for data engineering
- Importance of visualization tools in DE
- On-prem vs. cloud data engineering
- Capstone project: Retail sales analysis
- Solution: Capstone project
- Next steps
Taught by
Deepak Goyal