Explore the nanoPU, a groundbreaking NIC-CPU co-design aimed at accelerating datacenter applications that issue large numbers of small Remote Procedure Calls (RPCs) with microsecond-scale processing times. Delve into its innovative fast path, which bypasses the cache and memory hierarchy and places incoming messages directly into the CPU register file. Discover the programmable hardware support for low-latency transport, congestion control, and efficient RPC load balancing across cores, along with a hardware-accelerated thread scheduler that makes sub-nanosecond decisions, improving CPU utilization and reducing RPC tail response times.

Examine the FPGA prototype, built by modifying an open-source RISC-V CPU, and its evaluation through cycle-accurate simulations on AWS FPGAs. Compare the nanoPU's wire-to-wire RPC response time of just 69ns with that of commercial NICs, and see how the design significantly improves RPC tail response time and the load the system can sustain. Finally, investigate implementations and evaluations of applications including MICA, Raft, and Set Algebra for document retrieval, and learn how the nanoPU serves as a high-performance, programmable alternative to one-sided RDMA operations.