Tools and Approaches for Migrating Big Datasets to the Cloud

Devoxx via YouTube

Overview

Explore tools and strategies for migrating large-scale datasets to cloud platforms in this 47-minute Devoxx conference talk. Delve into the experiences of the Hotels.com big data platform team as they tackle the challenges of moving extensive data sets and pipelines from on-premises clusters to cloud-based solutions. Discover two open-source tools developed to overcome unexpected obstacles: Circus Train, a dataset replication tool for copying Hive tables between clusters and clouds, and Waggle Dance, a federated Hive query service enabling data querying across multiple Hive metastores. Learn about the unique features of these tools, their advantages over existing solutions, and how they've been successfully implemented to build a petabyte-scale platform now utilized by other Expedia brands. Gain insights into real-world problems and solutions encountered in a large, organically grown corporation, moving beyond idealized architectures to practical applications in big data migration.

Syllabus

Introduction
Agenda
Company structure
Data processing
Migrating jobs first
It's going to take years
Dataset replication
Finding an open source solution
Naming your project
Configuration
Distributed Copy
Hive Diff
Other features
Bridging multiple clusters
Waggle Dance
Hive CLI example
Priori pattern
Cloud architecture

Taught by

Devoxx
