Can Applications Recover from fsync Failures?

USENIX via YouTube

Overview

Explore the intricacies of fsync failures and their impact on file systems and data-intensive applications in this USENIX ATC '20 conference talk. Delve into a comprehensive analysis of how ext4, XFS, and Btrfs file systems react to fsync failures, uncovering commonalities and differences in their behavior. Examine the failure-handling strategies employed by popular applications like PostgreSQL, LMDB, LevelDB, SQLite, and Redis, and discover why these approaches fall short in preventing catastrophic outcomes such as data loss and corruption. Learn about the implications of these findings for designing file systems and applications that aim to provide robust durability guarantees. Gain insights into the challenges of achieving true data durability and the potential directions for improvement in this critical area of computer science.

Syllabus

Intro
How does data reach the disk?
fsync is really important
It's hard to get durability correct • Applications find it difficult
fsync can fail • Durability gets harder to get right
Why care about fsync failures? "About a year ago the PostgreSQL community discovered that fsync (on Linux and some BSD systems) may not work the way we always thought it is [sic], with possibly disastrous consequences for data durability/consistency (which is something the PostgreSQL community really values)."
Our work • Systematically understand fsync failures
File System Results
Application Results
Outline
File System Methodology: Fault Injection
File System Methodology: Workloads • Common write patterns in applications • Reduced to simplest form
File System Result #1: Clean Pages • Dirty page is marked clean after fsync failure on all three file systems
File System Result #2: Page Content • File systems do not handle fsync errors uniformly • Page content depends on file system
File System Result #3: In-memory state • In-memory data structures are not entirely reverted
Applications • Five widely used applications
Applications Results: Overview • Ext4 Ordered Mode
Crash/Restart • Simple strategies fail • Crash/restart is incorrect: it recovers wrong data from the page cache • Example: PostgreSQL
Applications Results #1: False Failures • False Failures: Indicate failure but actually succeed
Late Error Reporting • All applications susceptible to data loss on ext4 data mode
Btrfs winning?
Applications Results Summary • Simple strategies fail • Applications have moved away from retries
Challenges and Directions
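
The syllabus points about clean pages and retries can be illustrated with a minimal sketch. It assumes a POSIX-like system and uses only the Python standard library. The names `write_durably` and `DurabilityError`, and the injectable `fsync` parameter, are hypothetical and chosen for illustration; they do not come from the talk. The idea shown is that a failed fsync is reported to the caller as a lost write instead of being retried, because the dirty page may already be marked clean and a second fsync could then succeed without persisting anything.

```python
import os


class DurabilityError(Exception):
    """Hypothetical error type: fsync failed, so the write must be treated as lost."""


def write_durably(path, data, fsync=os.fsync):
    """Write `data` to `path` and call fsync exactly once.

    `fsync` is injectable (an assumption made for testing); by default it is os.fsync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # No retry here: after a failure the page may already be marked
            # clean, so a later fsync returning success would prove nothing.
            raise DurabilityError(f"fsync failed for {path}") from exc
    finally:
        os.close(fd)
```

A caller that catches `DurabilityError` would, under the same assumptions, rebuild its state from a source it knows reached the disk (for example an earlier checkpoint) and avoid trusting what it reads back from the affected file, since that content may come from the page cache and not from disk.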

Taught by

USENIX
