Explore a USENIX ATC '23 conference talk on TiDedup, a cluster-level deduplication architecture for Ceph. The talk identifies three shortcomings in Ceph's existing deduplication design: excessive metadata consumption, the limitations of its serialized tiering mechanism, and an inefficient reference count mechanism. TiDedup addresses them with three schemes: selective cluster-level crawling, an event-driven tiering mechanism with content-defined chunking, and a reference correction method using shared reference back pointers. Learn how TiDedup was integrated into the Ceph mainline and review its reported results: up to 34% data reduction on real-world workloads, a 50% improvement in foreground I/O throughput during deduplication, and a reduction of more than 50% in reference correction scan time. The talk is presented by experts from Samsung Electronics, IBM, Ceph Foundation, and Seoul National University.
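For readers new to the terms above, here is a minimal Python sketch of content-defined chunking combined with fingerprint-based deduplication. It is not TiDedup's or Ceph's implementation. The function names (`cdc_chunks`, `dedup_store`), the rolling-sum boundary test, and every parameter value are assumptions chosen for illustration, and the `referrers` map is only a simplified stand-in for the idea of tracking which objects reference a chunk.

```python
import hashlib


def cdc_chunks(data, window=16, mask=0x3F, min_size=32, max_size=1024):
    """Split bytes at content-defined boundaries (illustrative only).

    A boundary is declared when the sum of the last `window` bytes matches
    a bit pattern, so boundaries depend on content rather than fixed offsets.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        # Keep the sum limited to the last `window` bytes of the current chunk.
        if i - start >= window:
            rolling -= data[i - window]
        size = i - start + 1
        at_boundary = size >= min_size and (rolling & mask) == mask
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_store(objects):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint."""
    chunk_store = {}  # fingerprint -> chunk bytes
    referrers = {}    # fingerprint -> names of objects using the chunk
    recipes = {}      # object name -> ordered list of fingerprints
    for name, data in objects.items():
        recipe = []
        for chunk in cdc_chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(fp, chunk)
            referrers.setdefault(fp, set()).add(name)
            recipe.append(fp)
        recipes[name] = recipe
    return chunk_store, referrers, recipes
```

Storing two objects with identical content through `dedup_store` keeps only one copy of each chunk, and either object can be rebuilt by concatenating the chunks listed in its recipe.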