mirror of
https://github.com/facebook/rocksdb
synced 2026-05-24 09:29:21 +00:00
Summary: Add read-triggered compaction, a new feature that reduces read amplification by compacting SST files that receive high read traffic. When an SST file's read frequency (`num_reads_sampled / file_size`) exceeds a configurable threshold, it is marked for compaction to a lower level. The feature introduces two new options: a CF option `read_triggered_compaction_threshold` (default 0, disabled) and a DB option `max_periodic_compaction_trigger_seconds` (default 43200s) that controls how often the background thread re-evaluates compaction scores on quiet databases. Both options are dynamically changeable. Lowering `max_periodic_compaction_trigger_seconds` does add some overhead, but generally is minimal, so running this every couple of minutes in a production environment seems fairly reasonable. ## Key changes - **New CF option `read_triggered_compaction_threshold`** (`advanced_options.h`): When positive, files with `reads_per_byte > threshold` are marked for compaction. Files at the last non-empty level are skipped (bottommost compaction handles those separately). Marked files are sorted by hotness (reads_per_byte descending). - **New DB option `max_periodic_compaction_trigger_seconds`** (`options.h`): Replaces the hardcoded 12-hour ceiling in `ComputeTriggerCompactionPeriod()`. Essential for read-triggered compaction on quiet DBs since there are no writes to trigger score re-evaluation. - **Leveled compaction picker** (`compaction_picker_level.cc`): Adds read-triggered as the lowest-priority compaction reason in `SetupInitialFiles()`, using the existing `PickFileToCompact` helper. - **Universal compaction picker** (`compaction_picker_universal.cc`): Adds `PickReadTriggeredCompaction` as lowest priority. Refactors shared "find output level + compute overlapping inputs + create Compaction" logic from both `PickDeleteTriggeredCompaction` and `PickReadTriggeredCompaction` into `BuildCompactionToNextLevel`, handling both single-level and multi-level universal cases. - **Periodic trigger integration** (`db_impl.cc`): `TriggerPeriodicCompaction` now also fires for CFs with `read_triggered_compaction_threshold > 0`, even without time-based compaction configured. - **Stress test & db_bench support**: Both `db_stress` and `db_bench` support the new options. `db_crashtest.py` randomly enables read-triggered compaction and sets a short periodic trigger interval when enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/14426 Test Plan: **Unit tests**: - `compaction_picker_test` — 7 new tests: `ReadTriggeredCompactionDisabled`, `ReadTriggeredCompactionBelowThreshold`, `ReadTriggeredCompactionAboveThreshold`, `NeedsCompactionReadTriggered`, `ReadTriggeredPicksFile`, `UniversalReadTriggeredCompaction`, `ReadTriggeredSkipsLastLevel`, `UniversalReadTriggeredNoPickWhenNotMarked` - `db_compaction_test` — `ReadTriggeredCompaction` integration test verifying end-to-end behavior with sync points - Stress test coverage **Stress test**: ``` make V=1 -j "CRASH_TEST_EXT_ARGS=--duration=600 --max_key=2500000 --max_compaction_trigger_wakeup_seconds=10 --read_triggered_compaction_threshold=0.0001 --interval=600" blackbox_crash_test ``` - confirmed read triggered compactions from LOGS **Benchmark** (`db_bench`): Setup: 5M keys (100B values, 16B keys), leveled compaction, 5 levels, 4MB target file size. DB fully compacted, then 2M overlapping keys written without compaction to create L0/L1 overlap (82 files, ~294MB). LSM shape change during readrandom with read-triggered compaction: ``` BEFORE: L0=9 files (15MB), L1=4 (16MB), L2=20 (69MB), L3=49 (194MB) — 82 files, 294MB AFTER: L3=66 files (223MB) ``` | Benchmark | Config | avg ops/s | % change | |-----------|--------|-----------|----------| | readrandom (8 threads, 5M reads) | baseline (threshold=0) | 1,086,965 | — | | readrandom (8 threads, 5M reads) | threshold=0.000001, trigger=5s | 1,453,697 | **+33.7%** | Reviewed By: xingbowang Differential Revision: D97838716 Pulled By: joshkang97 fbshipit-source-id: a21fcb270c7fadd4f78d98b9c821982f220dd3f0 |
||
|---|---|---|
| .. | ||
| benchmark/src/main/java/org/rocksdb/benchmark | ||
| crossbuild | ||
| jmh | ||
| rocksjni | ||
| samples/src/main/java | ||
| src | ||
| CMakeLists.txt | ||
| GetPutBenchmarks.md | ||
| HISTORY-JAVA.md | ||
| jdb_bench.sh | ||
| Makefile | ||
| pmd-rules.xml | ||
| pom.xml.template | ||
| RELEASE.md | ||
| spotbugs-exclude.xml | ||
| understanding_options.md | ||