History and Evolution of Data Lake Architecture - Post Lambda Architecture

Linux Foundation via YouTube

Classroom Contents

  1. Intro
  2. Hadoop enabled enterprises to store and process huge volumes of data with distributed computing on commodity hardware.
  3. However, writing MapReduce applications directly was troublesome. Many technologies were created to increase application development productivity. They abstracted MapReduce-based distributed computing.
  4. Data processing with low latency
  5. HBase added the ability to handle small-sized data to the Hadoop ecosystem.
  6. SQL on Hadoop: After distributed computing became popular, various SQL-on-Hadoop engines were developed.
  7. The column-oriented format became known as a technology for DWH systems; the Hadoop ecosystem also uses these kinds of formats.
  8. Traditional requirements for the storage layer: The traditional requirements for Hadoop will continue to apply. Scalability
  9. Use case example that requires real-time analytics: By analyzing the latest activity and accumulated history, it is possible to link useful information to users and store in real time. 1. Accumulate da…
  10. What are the problems with the "Real-time Analytics" architecture? Batch- and stream-focused architectures make it difficult to meet real-time and diverse analytical requirements. Batch architecture
  11. What are the problems with Lambda Architecture? Lambda Architecture, which integrates batch and stream processing, makes it difficult to ensure integrity and increases the costs associated with pipeline comp…
  12. Overview of Delta Lake: Storage for transaction management and version control
  13. Apache Hudi vs Apache Iceberg and Delta Lake: Each product has devised its own reading method while achieving high-speed writes with a simple method. Apache Hudi
  14. Apache Iceberg and Delta Lake handle management information in different file structures. Apache Iceberg
  15. Consideration of trade-offs: Each recent storage layer software has taken its own approach to balancing the trade-offs.
