In this advanced course, you will gain practical expertise in scaling data engineering systems using cutting-edge tools and techniques. This course is designed for data scientists, data engineers, and anyone with a foundational understanding of data handling who wants to advance their skills to handle larger, more complex datasets efficiently.
Throughout the course, you'll learn to apply technologies such as Celery with RabbitMQ for scalable data consumption, Apache Airflow for workflow management, and vector and graph databases for data management at scale.
The course culminates in hands-on projects that offer real-world experience, where you'll put your acquired skills to the test by solving data engineering challenges. You will learn not only to create scalable data systems but also to analyze their performance and make the adjustments needed for optimal results.
This invaluable experience in advanced data engineering techniques will prepare you for the demanding tasks of handling massive datasets, streamlining complex workflows, and optimizing data operations for businesses of any scale.
Syllabus
- Queues and Databases: RabbitMQ and MySQL
- In this module, you will learn about databases and queues. You will learn the purpose and components of RabbitMQ, including its use of queues and its integration with Celery. Through hands-on exercises, you will gain experience connecting Celery to RabbitMQ within a Flask application and implementing task patterns such as fire-and-forget and result retrieval. The module also covers core MySQL skills such as working with the command-line interface, manipulating databases, and integrating with Python web apps. By the end, you will have a foundational understanding of RabbitMQ, Celery, and MySQL that allows you to start building modern, asynchronous applications backed by a database.
- Optimizing Workflow Management at Scale with Apache Airflow
- Achieving Scalability with Vector, Graph, and Key/Value Databases
- In this module, you will explore vector and graph databases, powerful tools for managing and extracting insights from large, complex datasets. As data volumes continue to grow, scalability is crucial. You'll learn how vector and graph databases can efficiently store data while maintaining relationships, enabling more advanced analytics. Through real-world examples, you'll see how these databases support scalability for machine learning, fraud detection, social networks, and more.
- Real-world Advanced Data Engineering Projects
- In this final module, you will work on advanced real-world data engineering projects, applying everything you've learned. You'll encounter complex data challenges and devise solutions using the tools and techniques covered in earlier modules. This is an opportunity to bring together the data engineering concepts from throughout the course and implement them end to end.
Taught by
Noah Gift and Alfredo Deza