Overview
Learn how to use data engineering to leverage big data for business strategy, data analysis, or machine learning and AI. By completing this course series, you'll empower yourself with the knowledge and proficiency required to build efficient data pipelines, manage cutting-edge platforms like Hadoop, Spark, Snowflake, Databricks, and Kubernetes, and tell stories with data through visualization. You will delve into foundational big data concepts, distributed computing with Spark, Snowflake’s architecture, Databricks’ machine learning capabilities, Python techniques for data visualization, and critical methodologies like DataOps.
This course series is designed for software engineers, developers, researchers, and data scientists who want to strengthen their specialization in data science or machine learning, as well as for professionals interested in pursuing a career as a data-focused software engineer, data scientist, or data engineer working in cloud, machine learning, business intelligence, or another field.
Syllabus
Course 1: Spark, Hadoop, and Snowflake for Data Engineering
- Offered by Duke University. Enroll for free.
Course 2: Virtualization, Docker, and Kubernetes for Data Engineering
- Offered by Duke University. Throughout this course, you'll explore virtualization, containerization, and Kubernetes, mastering the very ... Enroll for free.
Course 3: Data Visualization with Python
- Offered by Duke University. In today's data-driven world, the ability to create compelling visualizations and tell impactful stories with ... Enroll for free.
Courses
-
In today's data-driven world, the ability to create compelling visualizations and tell impactful stories with data is a crucial skill. This comprehensive course will guide you through the process of visualization using coding tools with Python, spreadsheets, and BI (Business Intelligence) tooling. Whether you are a data analyst, a business professional, or an aspiring data storyteller, this course will provide you with the knowledge and best practices to excel in the art of visual storytelling. Throughout the course, a consistent dataset will be used for exercises, enabling you to focus on mastering the visualization tools rather than getting caught up in the intricacies of the data. The emphasis is on practical application, allowing you to learn and practice the tools in a real-world context. To fully leverage the Python sections of this course, prior experience programming in Python is recommended. Additionally, a solid understanding of high-school level math is expected. Familiarity with the Pandas library will also be beneficial. By the end of this course, you will possess the necessary skills to become a proficient data storyteller and visual communicator. With the ability to create compelling visualizations and leverage the appropriate tools, you will be well-equipped to navigate the world of data and make informed decisions that drive meaningful impact.
-
Gain the skills for building efficient and scalable data pipelines. Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize and manage them. Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks, while honing your Python data science skills with PySpark. Finally, discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks. This course is designed for learners who want to pursue or advance their career in data science or data engineering, or for software developers or engineers who want to grow their data management skill set. In addition to the technologies, you will learn methodologies that sharpen your project management and workflow skills for data engineering, including Kaizen, DevOps, and DataOps best practices. With quizzes to test your knowledge throughout, this comprehensive course will help guide your learning journey to become a proficient data engineer, ready to tackle the challenges of today's data-driven world.
-
Throughout this course, you'll explore virtualization, containerization, and Kubernetes, mastering the very tools that power data engineering in the industry. Each week presents a new set of tools and platforms that are indispensable in data engineering. From mastering Docker and Kubernetes to exploring advanced topics such as AI-driven coding with GitHub Copilot, efficient container image management with Azure Container Registry and Amazon Elastic Container Registry, and Site Reliability Engineering (SRE) practices, you'll go beyond the basics and acquire the expertise needed to thrive in the dynamic and data-driven landscape of advanced data engineering. Whether you're a current student looking to expand your skills or a working professional aiming to take your expertise to the next level, this course is tailored to equip you with the advanced knowledge and hands-on experience necessary for success.
Taught by
Kennedy Behrman, Matt Harrison and Noah Gift