Syllabus
Intro
Executive Summary
Data Reduction in Storage Systems
Post-Deduplication Delta Compression: Combines three different data-reduction approaches (deduplication, delta compression, and lossless compression)
Overview of Post-Deduplication Delta Compression
Lossless Compression
Key Challenge: Reference Search: How to find a good reference block for an incoming data block across a wide range of stored data at low cost
Limitations of Existing Techniques: Significantly lower data-reduction ratios than the optimal reference search
DeepSketch: Key Idea: Use the learning-to-hash method for sketch generation, a promising machine learning (ML)-based approach for the
DeepSketch: Challenges: Lack of semantic information
Data Clustering for DeepSketch: Existing clustering algorithms are unsuitable for DeepSketch
Post-Processing for Training Data Set: Non-uniform distribution of data blocks across the clusters
Evaluation Methodology: Compared data-reduction techniques
Overall Data-Reduction Benefits
Performance Overhead
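The pipeline these talk sections walk through can be sketched in Python: deduplication first, then a sketch-based reference search, then delta compression against the chosen reference, with plain lossless compression as the fallback. This is a minimal illustration under stated assumptions, not DeepSketch itself. The `sketch` function uses fixed random hyperplanes over a byte histogram (SimHash-style locality-sensitive hashing) as a non-learned stand-in for DeepSketch's learned hash. zlib with a preset dictionary (`zdict`) stands in for a dedicated delta encoder. The names `Store`, `sketch`, `hamming`, `decode_delta`, `SKETCH_BITS`, and `max_hamming` are hypothetical.

```python
import hashlib
import random
import zlib

SKETCH_BITS = 64  # hypothetical sketch width

# Fixed random hyperplanes (SimHash-style LSH). This is a non-learned stand-in
# for a learned hash model; it is NOT DeepSketch's trained network.
_rng = random.Random(42)
_PLANES = [[_rng.gauss(0.0, 1.0) for _ in range(256)] for _ in range(SKETCH_BITS)]


def sketch(block: bytes) -> int:
    """Hash a block to a SKETCH_BITS-bit code; similar blocks tend to get nearby codes."""
    hist = [0] * 256
    for b in block:
        hist[b] += 1
    mean = len(block) / 256
    centered = [h - mean for h in hist]  # center so near-uniform blocks don't all collide
    code = 0
    for i, plane in enumerate(_PLANES):
        if sum(w * h for w, h in zip(plane, centered)) >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


class Store:
    """Toy post-deduplication delta-compression pipeline (illustrative only)."""

    def __init__(self, max_hamming: int = 8):
        self.blocks = []    # original contents, kept so later blocks can use them as references
        self.by_fp = {}     # SHA-256 fingerprint -> block id (deduplication index)
        self.sketches = []  # (sketch, block id) pairs searched for a reference
        self.max_hamming = max_hamming

    def _find_reference(self, s: int):
        # Linear scan for the closest sketch within max_hamming bits;
        # a real system would use an index instead of scanning everything.
        best_id, best_dist = None, self.max_hamming + 1
        for other, block_id in self.sketches:
            d = hamming(s, other)
            if d < best_dist:
                best_id, best_dist = block_id, d
        return best_id

    def write(self, block: bytes):
        fp = hashlib.sha256(block).digest()
        if fp in self.by_fp:                       # 1) deduplication: exact duplicate
            return ("dedup", self.by_fp[fp])
        s = sketch(block)
        ref_id = self._find_reference(s)           # 2) reference search by sketch distance
        block_id = len(self.blocks)
        self.blocks.append(block)
        self.by_fp[fp] = block_id
        self.sketches.append((s, block_id))
        if ref_id is not None:                     # 3) delta compression against the reference
            c = zlib.compressobj(zdict=self.blocks[ref_id])
            return ("delta", ref_id, c.compress(block) + c.flush())
        return ("lossless", zlib.compress(block))  # 4) no reference: plain lossless compression


def decode_delta(reference: bytes, payload: bytes) -> bytes:
    """Rebuild a delta-compressed block from its reference block."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(payload) + d.flush()


if __name__ == "__main__":
    rng = random.Random(7)
    base = bytes(rng.randrange(256) for _ in range(4096))
    similar = base[:100] + b"XYZ" + base[103:]  # near-duplicate of base
    store = Store()
    print(store.write(base)[0])      # first write: no reference exists yet
    print(store.write(base)[0])      # exact duplicate, caught by deduplication
    print(store.write(similar)[:2])  # expected to pick block 0 as its reference
```

The quality of the whole scheme hinges on `sketch`: if similar blocks do not land within `max_hamming` bits of each other, the search misses a good reference and the block falls back to lossless compression. In a learning-to-hash design, `sketch` would be replaced by a trained model that produces the binary code, while the Hamming-distance search could stay as shown.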
Taught by
USENIX