7 Things I Wish I'd Known Before Building a 60TB Data Warehouse on a Dedicated Pool

Overview

Learn what it takes to build a large-scale data warehouse on an Azure dedicated SQL pool in this 41-minute SQLBits conference talk. Steve Powell shares lessons from constructing a 60TB data warehouse, covering Azure Data Factory (ADF), data management, and automation. Topics include the limitations of backups, hidden features of dedicated pools, and strategies for balancing throughput against concurrency. The talk also covers ELT frameworks, data loading patterns, HEAP management, and why retry mechanisms are unavoidable. Further sections explain how to tune concurrency, maintain indexes and statistics, and handle the challenges of migrating from Netezza. The lessons apply to Azure Synapse Analytics, database engine management, and project delivery in big data environments.

Syllabus

What to expect
Project background
Azure Solution
ELT Framework
Don't move the ETL
Pause/Resume / Scale
Backup/Restore (granular)
Fun with HEAPS
Rebuild your HEAPS
Retry is unavoidable
Retry cos of random errors like these
04: Random flakiness is inherent
05: Idempotent (in Azure Data Factory...)
05: ADF Singleton Pattern
Concurrency is almost everything
Concurrency (one size fits all is bad)
Concurrency (use params in ADF)
Index and Stats Maintenance
Netezza has hidden columns
XX: And the rest