rocksdb/java
Josh Kang 5db0603613 Read-triggered compactions (#14426)
Summary:
Add read-triggered compaction, a new feature that reduces read amplification by compacting SST files that receive high read traffic. When an SST file's read frequency (`num_reads_sampled / file_size`) exceeds a configurable threshold, it is marked for compaction to a lower level.

The feature introduces two new options: a CF option `read_triggered_compaction_threshold` (default 0, disabled) and a DB option `max_periodic_compaction_trigger_seconds` (default 43200s) that controls how often the background thread re-evaluates compaction scores on quiet databases. Both options are dynamically changeable.

Lowering `max_periodic_compaction_trigger_seconds` does add some overhead, but generally is minimal, so running this every couple of minutes in a production environment seems fairly reasonable.

## Key changes

- **New CF option `read_triggered_compaction_threshold`** (`advanced_options.h`): When positive, files with `reads_per_byte > threshold` are marked for compaction. Files at the last non-empty level are skipped (bottommost compaction handles those separately). Marked files are sorted by hotness (reads_per_byte descending).
- **New DB option `max_periodic_compaction_trigger_seconds`** (`options.h`): Replaces the hardcoded 12-hour ceiling in `ComputeTriggerCompactionPeriod()`. Essential for read-triggered compaction on quiet DBs since there are no writes to trigger score re-evaluation.
- **Leveled compaction picker** (`compaction_picker_level.cc`): Adds read-triggered as the lowest-priority compaction reason in `SetupInitialFiles()`, using the existing `PickFileToCompact` helper.
- **Universal compaction picker** (`compaction_picker_universal.cc`): Adds `PickReadTriggeredCompaction` as lowest priority. Refactors shared "find output level + compute overlapping inputs + create Compaction" logic from both `PickDeleteTriggeredCompaction` and `PickReadTriggeredCompaction` into `BuildCompactionToNextLevel`, handling both single-level and multi-level universal cases.
- **Periodic trigger integration** (`db_impl.cc`): `TriggerPeriodicCompaction` now also fires for CFs with `read_triggered_compaction_threshold > 0`, even without time-based compaction configured.
- **Stress test & db_bench support**: Both `db_stress` and `db_bench` support the new options. `db_crashtest.py` randomly enables read-triggered compaction and sets a short periodic trigger interval when enabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14426

Test Plan:
**Unit tests**:
- `compaction_picker_test` — 7 new tests: `ReadTriggeredCompactionDisabled`, `ReadTriggeredCompactionBelowThreshold`, `ReadTriggeredCompactionAboveThreshold`, `NeedsCompactionReadTriggered`, `ReadTriggeredPicksFile`, `UniversalReadTriggeredCompaction`, `ReadTriggeredSkipsLastLevel`, `UniversalReadTriggeredNoPickWhenNotMarked`
- `db_compaction_test` — `ReadTriggeredCompaction` integration test verifying end-to-end behavior with sync points
- Stress test coverage

**Stress test**:
```
make V=1 -j "CRASH_TEST_EXT_ARGS=--duration=600 --max_key=2500000 --max_compaction_trigger_wakeup_seconds=10
  --read_triggered_compaction_threshold=0.0001 --interval=600" blackbox_crash_test
```
- confirmed read triggered compactions from LOGS

**Benchmark** (`db_bench`):

Setup: 5M keys (100B values, 16B keys), leveled compaction, 5 levels, 4MB target file size. DB fully compacted, then 2M overlapping keys written without compaction to create L0/L1 overlap (82 files, ~294MB).

LSM shape change during readrandom with read-triggered compaction:
```
BEFORE: L0=9 files (15MB), L1=4 (16MB), L2=20 (69MB), L3=49 (194MB) — 82 files, 294MB
AFTER:  L3=66 files (223MB)
```

| Benchmark | Config | avg ops/s | % change |
|-----------|--------|-----------|----------|
| readrandom (8 threads, 5M reads) | baseline (threshold=0) | 1,086,965 | — |
| readrandom (8 threads, 5M reads) | threshold=0.000001, trigger=5s | 1,453,697 | **+33.7%** |

Reviewed By: xingbowang

Differential Revision: D97838716

Pulled By: joshkang97

fbshipit-source-id: a21fcb270c7fadd4f78d98b9c821982f220dd3f0
2026-03-27 14:43:52 -07:00
..
benchmark/src/main/java/org/rocksdb/benchmark Remove deprecated Options::access_hint_on_compaction_start (#11654) 2024-02-05 13:35:19 -08:00
crossbuild Pass build parallelism flag to Docker builds (#12392) 2024-02-28 12:51:00 -08:00
jmh JNI get_helper code sharing / multiGet() use efficient batch C++ support (#12344) 2024-03-12 12:42:08 -07:00
rocksjni Read-triggered compactions (#14426) 2026-03-27 14:43:52 -07:00
samples/src/main/java Remove compressed block cache (#11117) 2023-01-24 17:09:19 -08:00
src Read-triggered compactions (#14426) 2026-03-27 14:43:52 -07:00
CMakeLists.txt Add interpolation search as an alternative to binary search (#14247) 2026-02-13 17:15:10 -08:00
GetPutBenchmarks.md Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019) 2023-12-11 11:03:17 -08:00
HISTORY-JAVA.md Update JAVA-HISTORY.md for v3.13 2015-08-04 18:12:58 -07:00
jdb_bench.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
Makefile Add native logger support to RocksJava (#12213) 2024-01-17 17:51:36 -08:00
pmd-rules.xml Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019) 2023-12-11 11:03:17 -08:00
pom.xml.template Perform java static checks in CI (#11221) 2023-10-17 10:04:35 -07:00
RELEASE.md Add shared library for musl-libc (#3143) 2019-11-26 18:24:09 -08:00
spotbugs-exclude.xml Perform java static checks in CI (#11221) 2023-10-17 10:04:35 -07:00
understanding_options.md New-style blob option bindings, Java option getter and improve/fix option parsing (#8999) 2021-10-19 09:21:52 -07:00