- Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
- Differentiate between Apache Spark and Spark pools
- Differentiate between Azure Databricks and Spark pools
- Differentiate between HDInsight and Spark pools
- Differentiate between Spark pools and SQL pools
- Understand the use cases for data engineering with Apache Spark in Azure Synapse Analytics
- Create a Spark pool in Azure Synapse Analytics
- Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
- Understand the use cases for Spark Notebooks
- Create a Spark Notebook in Azure Synapse Analytics
- Understand the supported languages in Spark Notebooks
- Develop Spark Notebooks
- Run Spark Notebooks
- Load data in Spark Notebooks
- Save Spark Notebooks
- Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
- Understand DataFrames in Spark Pools in Azure Synapse Analytics
- Load data into a Spark DataFrame
- Create a Spark table
- Write data to and read data from a storage account
- Load a streaming DataFrame into Apache Spark
- Flatten nested structures and explode arrays with Apache Spark
- Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
- Describe the integration methods between SQL and Spark Pools in Azure Synapse Analytics
- Understand the use cases for SQL and Spark Pools integration
- Authenticate in Azure Synapse Analytics
- Transfer data between SQL and Spark Pools in Azure Synapse Analytics
- Authenticate between Spark and SQL Pool in Azure Synapse Analytics
- Integrate SQL and Spark Pools in Azure Synapse Analytics
- Externalize the use of Spark Pools within Azure Synapse workspace
- Transfer data outside the Synapse workspace using SQL Authentication
- Transfer data outside the Synapse workspace using the PySpark Connector
- Transform data in Apache Spark and write back to SQL Pool in Azure Synapse Analytics
- Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
- Monitor Spark Pools in Azure Synapse Analytics
- Understand Resource Utilization of Spark Pools in Azure Synapse Analytics
- Monitor Query activity of Spark Pools in Azure Synapse Analytics
- Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
- Optimize Apache Spark jobs in Azure Synapse Analytics
- Automate scaling of Apache Spark pools in Azure Synapse Analytics
After completing this module, you will be able to: