The course "Distributed Query Optimization and Security" provides a comprehensive exploration of query optimization and data security in distributed databases. Students will gain in-depth knowledge of how to secure data access through views and dynamic authorization techniques, which are essential for maintaining the integrity and confidentiality of distributed systems. Learners will also master distributed query processing, understanding how to evaluate, optimize, and implement efficient query plans. The course uniquely blends advanced database security techniques with practical applications of large-scale data systems such as Hadoop, MapReduce, and HDFS.
By completing this course, learners will be equipped with the skills to optimize complex queries, enhance database security, and handle large datasets effectively. With hands-on experience in MapReduce and HDFS, learners will develop the ability to create scalable, optimized, and secure distributed database systems. This course is ideal for professionals seeking to advance their expertise in database management and distributed systems, with a focus on both performance optimization and data protection.
Syllabus
- Course Introduction
  - This module delves into advanced topics in query processing and data security within distributed databases. Students will learn about semantic data control, including the use of views and authorization techniques to secure data access. The module also covers distributed query processing and optimization methods. By understanding these techniques, learners will develop the skills to create optimized, secure distributed databases capable of handling complex data queries. This module also introduces another paradigm of large-scale data systems, supported by Hadoop, MapReduce, and HDFS.
- Semantic Data Control
  - This module provides an in-depth exploration of database security, focusing on defining and implementing views for secure data access. Students will learn how to apply dynamic authorization techniques through cascading grant and revoke policies, and how to specify the semantic integrity rules essential for maintaining security and consistency within distributed database systems.
- Distributed Query Processing
  - This module delves into the critical aspects of query optimization in databases: the motivation behind optimizing queries, evaluating the cost-effectiveness of various query plans, and the steps involved in the optimization process. Students will learn to calculate optimization costs and to decompose SQL queries into efficient, optimized query trees, improving overall database performance.
- Query Optimization and an Introduction to Hadoop, MapReduce and HDFS
  - This module delves into distributed query optimization techniques, introducing key methodologies that use cost models and heuristics to produce near-optimal query plans in distributed systems. Learners will gain practical skills in query optimization, including the semi-join algorithm. Learners will also gain experience with another popular approach to large-scale data processing: MapReduce, supported by the Hadoop Distributed File System (HDFS), for efficiently processing large datasets. Through hands-on exercises, students will practice writing HDFS code for data storage and manipulation, with a focus on data compression and decompression, and will apply advanced MapReduce patterns for efficient data processing.
Taught by
David Silberberg