MSRL - Distributed Reinforcement Learning with Dataflow Fragments

Overview

Explore a conference talk from USENIX ATC '23 that introduces MSRL, a novel distributed reinforcement learning (RL) training system. Discover how MSRL utilizes the concept of fragmented dataflow graphs (FDGs) to execute RL algorithms flexibly on GPU clusters. Learn about the challenges in current RL systems and how MSRL addresses them by decoupling algorithm definition from distributed execution strategies. Understand the benefits of FDGs in handling diverse RL algorithms, allowing fragments to execute on different devices through various low-level dataflow implementations. Gain insights into how MSRL's distribution policy enables efficient mapping of fragments to devices without altering the RL algorithm implementation. Examine the experimental results demonstrating MSRL's ability to expose trade-offs between execution strategies while outperforming existing RL systems with fixed strategies.
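
The separation described above, between what the algorithm computes and where each fragment runs, can be illustrated with a toy sketch. This is not MSRL's API: `Fragment`, `execute`, the two policy dictionaries, and the device labels are all hypothetical names made up for illustration, and the "devices" are only strings, so nothing is actually dispatched to a GPU.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Fragment:
    """One piece of an algorithm's dataflow graph (hypothetical type)."""
    name: str
    run: Callable[[dict], dict]


def collect(state: dict) -> dict:
    # Toy "actor" fragment: derive rewards from the current parameter.
    return {**state, "rewards": [state["theta"] * x for x in (1.0, 2.0, 3.0)]}


def learn(state: dict) -> dict:
    # Toy "learner" fragment: nudge the parameter by the mean reward.
    mean = sum(state["rewards"]) / len(state["rewards"])
    return {**state, "theta": state["theta"] + 0.1 * mean}


# Algorithm definition: an ordered list of fragments, with no device information.
ALGORITHM: List[Fragment] = [Fragment("actor", collect), Fragment("learner", learn)]

# Two distribution policies (assumed format): fragment name -> device label.
SINGLE_GPU: Dict[str, str] = {"actor": "gpu:0", "learner": "gpu:0"}
SPLIT: Dict[str, str] = {"actor": "cpu:0", "learner": "gpu:0"}


def execute(
    algorithm: List[Fragment], policy: Dict[str, str], state: dict
) -> Tuple[dict, List[Tuple[str, str]]]:
    """Run each fragment in order and record where the policy placed it."""
    placements = []
    for frag in algorithm:
        device = policy[frag.name]
        placements.append((frag.name, device))
        # A real system would dispatch the fragment to `device`;
        # this sketch simply runs it locally.
        state = frag.run(state)
    return state, placements


if __name__ == "__main__":
    for policy in (SINGLE_GPU, SPLIT):
        final_state, placements = execute(ALGORITHM, policy, {"theta": 1.0})
        print(placements, final_state["theta"])
```

Swapping `SINGLE_GPU` for `SPLIT` changes only the recorded placements; the fragment code and the computed result stay the same, which is the property the talk description attributes to distribution policies.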

Syllabus

USENIX ATC '23 - MSRL: Distributed Reinforcement Learning with Dataflow Fragments

Taught by

USENIX

Related Courses

AWARE - Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems

Cyclosa - Redundancy-Free Graph Pattern Mining via Set Dataflow

Distributed Transactions at Scale in Amazon DynamoDB

TiDedup - A New Distributed Deduplication Architecture for Ceph

Distributed Trust - Is “Blockchain” the Answer?

Bridging the Gap between QoE and QoS in Congestion Control - A Large-scale Mobile Web Service Perspective
