Overview
Discover how to create scalable, reproducible, and deployable data science code in this EuroPython Conference talk. Learn to bridge the gap between rapid iteration and engineering reliability by leveraging three powerful open-source tools: Kedro, Apache Airflow, and Great Expectations. Explore techniques for developing modular, maintainable data science code with Kedro, orchestrating workflows with Apache Airflow and Astronomer, and ensuring data quality with Great Expectations. Gain insights into overcoming the challenges of Jupyter notebooks, implementing declarative data management, and managing parameters and configuration. Delve into MLOps, the AI lifecycle for IT production, examine different strategies for taking advantage of Kedro's extensibility, and understand the benefits of developing with Kedro while orchestrating with Airflow. By the end of this 30-minute talk, you will know how to build consensus among data scientists, data engineers, and machine learning engineers, enabling efficient collaboration on data science projects.
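To make the Kedro portion concrete, here is a minimal sketch of the style the talk advocates: small pure functions wired into a pipeline, with data locations and parameters declared in configuration rather than hard-coded. The dataset names (companies, model_input, model) and the model_options parameter group are illustrative assumptions, not taken from the talk, and exact APIs vary across Kedro versions.

```python
import pandas as pd
from kedro.pipeline import node, pipeline


def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
    # A pure function: trivially unit-testable outside any notebook.
    companies["is_active"] = companies["status"] == "active"
    return companies


def train_model(model_input: pd.DataFrame, model_options: dict) -> dict:
    # "params:model_options" below resolves to conf/base/parameters.yml,
    # so tuning values live in config files, not in code.
    return {"features": model_options["features"], "rows": len(model_input)}


# Inputs and outputs are names, not paths: Kedro resolves them against
# conf/base/catalog.yml (declarative data management), so swapping a CSV
# for a database table requires no code change.
data_science = pipeline(
    [
        node(preprocess_companies, inputs="companies", outputs="model_input"),
        node(
            train_model,
            inputs=["model_input", "params:model_options"],
            outputs="model",
        ),
    ]
)
```

A second sketch after the syllabus shows how a pipeline like this can be handed to Airflow and guarded with Great Expectations.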
Syllabus
Intro
The Scenario
Different perspectives
The challenges of Jupyter notebooks
Challenges with Data Management in Jupyter Notebook
Declarative Data Management in Kedro
Parameters & Configuration Management
Tradeoffs
Challenges with managing code in Jupyter Notebook
Development experience in Jupyter Notebook
Development experience in Kedro
MLOps: The AI Lifecycle for IT Production
Taking advantage of Kedro's extensibility
Different strategies
Why develop with Kedro and orchestrate with Airflow?
Beyond a single project
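To illustrate the "develop with Kedro, orchestrate with Airflow" idea and the Great Expectations checks covered in the talk, here is a hedged sketch of an Airflow DAG that validates raw data and then runs the pipeline from the earlier sketch. The project path, data file, and column names are hypothetical; Great Expectations' classic pandas API is shown (newer releases use a DataContext-based workflow), and in practice the kedro-airflow plugin (`kedro airflow create`) can generate a DAG like this automatically.

```python
from datetime import datetime
from pathlib import Path

import great_expectations as ge
import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

PROJECT_PATH = Path("/opt/airflow/my_kedro_project")  # hypothetical location


def validate_raw_data() -> None:
    # Fail the DAG early if the raw extract violates basic expectations.
    df = ge.from_pandas(pd.read_csv(PROJECT_PATH / "data/01_raw/companies.csv"))
    checks = [
        df.expect_column_values_to_not_be_null("id"),
        df.expect_column_values_to_be_between("employees", min_value=0, max_value=1_000_000),
    ]
    failed = [check for check in checks if not check.success]
    if failed:
        raise ValueError(f"{len(failed)} data quality check(s) failed")


def run_kedro_pipeline() -> None:
    # Reuse the pipeline exactly as developed locally; Airflow only schedules it.
    bootstrap_project(PROJECT_PATH)
    with KedroSession.create(project_path=PROJECT_PATH) as session:
        session.run(pipeline_name="data_science")


with DAG(
    dag_id="kedro_data_science",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # trigger manually
    catchup=False,
) as dag:
    validate = PythonOperator(task_id="validate_raw_data", python_callable=validate_raw_data)
    train = PythonOperator(task_id="run_kedro_pipeline", python_callable=run_kedro_pipeline)
    validate >> train
```

Keeping validation as a separate upstream task surfaces data issues in the Airflow UI before any compute is spent on training, which is one way to build the consensus between data scientists and engineers that the talk describes.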
Taught by
EuroPython Conference