Classroom Contents
History and Evolution of Data Lake Architecture - Post Lambda Architecture
- 1 Intro
- 2 Hadoop enabled enterprises to store and process huge volumes of data with distributed computing on commodity hardware.
- 3 However, writing MapReduce applications directly was troublesome. Many technologies were born to increase application development productivity. They abstracted MapReduce-based distributed computing.
- 4 Data processing with low latency
- 5 HBase added the ability to handle small-sized data to the Hadoop ecosystem.
- 6 SQL on Hadoop: After distributed computing became popular, various SQL on Hadoop technologies were developed for users
- 7 The column-oriented format became known as a technology for DWH systems, and the Hadoop ecosystem also uses these kinds of formats.
- 8 Traditional requirements for the storage layer: The traditional requirements for Hadoop will continue to apply. Scalability
- 9 Use case example that requires real-time analytics: By analyzing the latest activity and accumulated history, it is possible to link useful information to users and store in real time. 1. Accumulate da…
- 10 What are the problems with "Real-time Analytics" architecture? Batch- and stream-focused architecture makes it difficult to meet real-time and diverse analytical requirements. Batch architecture
- 11 What are the problems with Lambda Architecture? Lambda Architecture, which integrates batch and stream processing, makes it difficult to ensure integrity and increases the costs associated with pipeline comp…
- 12 Overview of Delta Lake: Storage for transaction management and version control
- 13 Apache Hudi vs Apache Iceberg and Delta Lake: Each product has devised its own reading method while achieving high-speed writing with a simple method. Apache Hudi
- 14 Apache Iceberg and Delta Lake handle management information in different file structures. Apache Iceberg
- 15 Consideration of trade-offs: Recent storage layer software has taken various approaches toward balancing the trade-offs