Databricks is a cloud-based data engineering platform used to process and transform large amounts of data and to explore that data through machine learning models. It combines data warehouses and data lakes into a lakehouse architecture.
Data governance is a broad approach that comprises the principles, practices, and tools used to manage an organization's data assets throughout their lifecycle. A data governance strategy allows organizations to make data easily available while protecting it from unauthorized access and ensuring compliance with regulatory requirements.
This course provides 4 hours of training videos segmented into modules. Lab demonstrations make the course concepts easy to understand. To test learners' understanding, every module includes assessments in the form of quizzes and in-video questions. A mandatory graded quiz is also provided at the end of every module.
Candidates should have hands-on experience with the Databricks platform and basic knowledge of AWS services. This course is tailored for professionals seeking to establish a strong foundation in data governance, fraud detection, and prevention strategies. By the end of this course, you will be able to:
- Understand the benefits and features of Databricks on AWS.
- Demonstrate Data Cleansing Pipelines in Databricks.
- Analyze Data Access Control Models and Data Privacy Regulations.
- Explain Data Lineage and Data Versioning in Databricks Pipelines.
Syllabus
- Introduction to Data Governance with Databricks
- Welcome to Week 1 of the Data Governance with Databricks course. This week, you will be introduced to Databricks on AWS. You will also learn about the benefits and features of Databricks and its integration with AWS.
- Data Classification Techniques and Data Quality Management
- This week, we will learn Data Classification Techniques, including Data Lineage and Impact Analysis, as well as Metadata Management and Data Catalogs. We will also learn about Data Profiling and Quality Assessment and Data Cleansing Techniques, and we will implement Data Cleansing Pipelines in Databricks.
- Data Privacy and Security
- This week, we will learn about role-based access control (RBAC), Data Access Control Models, and Data Security Policies in Databricks. We will also learn how to implement RBAC in Databricks and how to apply Data Security Best Practices.
- Data Governance in Data Pipelines
- This week, we will learn about Data Governance in Data Pipelines, including Data Lineage in Data Pipelines, ETL/ELT processes, Data Versioning, and Change Data Capture. We will also learn about Data Governance Best Practices and Tools, including Continuous Improvement of Data Governance Processes and how to implement and apply best practices in Data Governance.
Taught by
Whizlabs Instructor