

Performance Analysis of Apache Spark and Presto in Cloud Environments

Databricks via YouTube

Overview

This 37-minute conference talk presents a performance analysis of Apache Spark and Presto in cloud environments. It examines the performance and cost of both big data analytics systems running on Amazon EMR, with a special focus on Apache Spark's performance on the Databricks Unified Analytics Platform. The talk covers TPC-DS benchmark results, SQL performance comparisons, and the advantages and disadvantages of each solution. Its quantitative data and analysis can inform decisions about deploying data analytics at scale, help you avoid common pitfalls, and guide optimization of cloud-based big data infrastructure.

Syllabus

Intro
About BSC
TPC-DS Benchmark Work
Context and motivation
Systems Under Test (SUTs)
Hardware configuration
Software configuration System Runtime 5.5
Benchmark execution time (base)
Cost-Based Optimizer (CBO) stats
Benchmark execution time (stats)
Speedup with table and column stats
Additional configuration for Presto
TPC-DS Power Test - Query 72
Dynamic data partitioning
Benchmark execution time (partitioning + stats)
Speedup with partitioning and stats
TPC Benchmark total execution time
TPC Benchmark DS metric
System costs
TPC Benchmark DS cost
TPC-DS price-performance
Usability and developer productivity
Conclusions

Taught by

Databricks
