fleet/tools/redis-stress
Carlo b67ab72349
Improvements for Redis stress testing tool (#44411)
Refresh `tools/redis-stress` with cluster-aware modes.
2026-04-29 15:52:59 -04:00
race.go Improvements for Redis stress testing tool (#44411) 2026-04-29 15:52:59 -04:00
README.md Improvements for Redis stress testing tool (#44411) 2026-04-29 15:52:59 -04:00
stress.go Improvements for Redis stress testing tool (#44411) 2026-04-29 15:52:59 -04:00
stress_test.go Improvements for Redis stress testing tool (#44411) 2026-04-29 15:52:59 -04:00
write.go Improvements for Redis stress testing tool (#44411) 2026-04-29 15:52:59 -04:00

redis-stress

Cluster-aware Redis stress tool with two modes.

Both modes use Fleet's own redis.NewPool (server/datastore/redis), so cluster topology, redirection handling, and connection routing match what the real Fleet server does in production.

Modes

write — steady SET-only load

Fill a Redis instance (standalone or cluster) at a configurable rate. Each worker writes keys on its own ticker; useful for "occupy the cluster with ongoing writes while I observe something else" or seeding a dataset.

This mode is the cluster-aware successor to the original tools/redis-stress tool. The old subcommand-less invocation (redis-stress -addr=X -wait=10m) still works — when the first arg starts with -, the dispatcher routes to write. The legacy flags -wait, -debug, and -index-start are kept for backward compatibility.
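
The legacy routing rule described above can be sketched roughly as follows. This is an illustration only, not the tool's actual dispatcher: the helper name pickMode is hypothetical, and returning an error for an empty argument list or an unknown mode is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// pickMode is a hypothetical helper. It returns the mode to run and the
// arguments that should be handed to that mode's flag parser.
func pickMode(args []string) (mode string, rest []string, err error) {
	if len(args) == 0 {
		// Assumption: the real tool may behave differently with no arguments.
		return "", nil, fmt.Errorf("usage: redis-stress <write|race> [flags]")
	}
	// Legacy invocation: no subcommand and the first argument is a flag,
	// so all arguments go to write mode unchanged.
	if strings.HasPrefix(args[0], "-") {
		return "write", args, nil
	}
	switch args[0] {
	case "write", "race":
		return args[0], args[1:], nil
	}
	return "", nil, fmt.Errorf("unknown mode %q", args[0])
}

func main() {
	mode, rest, err := pickMode([]string{"-addr=127.0.0.1:6379", "-wait=10m"})
	fmt.Println(mode, rest, err)
}
```

With this shape, `redis-stress -addr=X -wait=10m` and `redis-stress write -addr=X -wait=10m` end up in the same code path with the same flag arguments.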

go run ./tools/redis-stress write \
  -addr 127.0.0.1:7001 \
  -workers 5 \
  -rate 100 \
  -duration 1m

| Flag | Default | Purpose |
| --- | --- | --- |
| -addr | 127.0.0.1:6379 | Redis address (cluster startup node OK; cluster auto-detected) |
| -workers | 1 | Concurrent SET workers |
| -rate | 1 | SETs per worker per second (fractional OK) |
| -duration | 10m | Total run time |
| -key-prefix | stress_write_ | Key prefix |
| -key-ttl | 10m | Per-key expiration |
| -index-start | 0 | Starting value of each worker's per-key counter (legacy) |
| -debug | false | Log every SET (legacy) |
| -wait | | Alias for -duration (legacy) |
| -cluster-follow-redirects | true | ClusterFollowRedirections (cluster only) |
| -cluster-read-from-replica | false | ClusterReadFromReplica (cluster only) |

race — SET-then-GET race detection

Each worker repeatedly does, on fresh pool connections:

conn1 := pool.Get(); conn1.Do("SET", k, v, "PX", ttl); conn1.Close()
conn2 := pool.Get(); conn2.Do("GET", k);                conn2.Close()

and counts any GET that returns nil immediately after a successful SET on the same key. This mirrors how Fleet's server/service/redis_key_value.RedisKeyValue does its Set / Get — fresh connection per call.
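
The per-worker loop can be sketched as below. To keep the example runnable without a Redis server, the pool is replaced by a hypothetical store interface backed by an always-consistent in-memory map; the key layout (prefix, worker, iteration) is also an assumption and not taken from race.go.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for "one fresh pool connection per call".
type store interface {
	Set(key, val string) error
	Get(key string) (val string, found bool, err error)
}

// memStore is an always-consistent fake, so it should never show the race.
type memStore struct {
	mu sync.Mutex
	m  map[string]string
}

func (s *memStore) Set(key, val string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
	return nil
}

func (s *memStore) Get(key string) (string, bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok, nil
}

// raceWorker runs SET-then-GET iterations and returns how many GETs came
// back empty right after a successful SET on the same key.
func raceWorker(s store, prefix string, worker, iterations int) int {
	nilAfterSet := 0
	for i := 0; i < iterations; i++ {
		key := fmt.Sprintf("%s%d_%d", prefix, worker, i) // assumed key layout
		if err := s.Set(key, "v"); err != nil {
			continue // a failed SET is an error, not a race
		}
		if _, found, err := s.Get(key); err == nil && !found {
			nilAfterSet++
		}
	}
	return nilAfterSet
}

func main() {
	s := &memStore{m: map[string]string{}}
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			n := raceWorker(s, "stress_race_", w, 100)
			mu.Lock()
			total += n
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	fmt.Println("nil-after-set:", total)
}
```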

# default — cluster mode, reads through ConfigureDoer (i.e., not explicitly
# routed to replicas; same path Fleet uses today)
go run ./tools/redis-stress race -addr 127.0.0.1:7001 -workers 50 -iterations 2000

# explicit replica reads — useful for testing what a deployment with an
# external read-from-replica router (proxy, ElastiCache reader endpoint,
# Redis-Enterprise R/W split) would expose
go run ./tools/redis-stress race -addr 127.0.0.1:7001 \
  -cluster-read-from-replica \
  -explicit-readonly

| Flag | Default | Purpose |
| --- | --- | --- |
| -addr | 127.0.0.1:7001 | Redis cluster startup node |
| -workers | 50 | Concurrent SET-then-GET workers |
| -iterations | 1000 | Iterations per worker |
| -ttl | 4m | PX expiration on SET |
| -key-prefix | stress_race_ | Key prefix |
| -explicit-readonly | false | Wrap the GET conn with redis.ReadOnlyConn so it's routed to a replica when the pool has ClusterReadFromReplica=true. Without this flag, both SET and GET go to primary. |
| -cluster-follow-redirects | true | ClusterFollowRedirections |
| -cluster-read-from-replica | true | ClusterReadFromReplica |

Output

================ summary ================
elapsed:           11.2s
sets:              100000 (errors 0)
gets:              100000 (errors 0)
nil-after-set:     0  ← the bug
stale-after-set:   0
ops/sec:           17858.2

nil-after-set is the metric to watch. Any non-zero value means the cluster returned nil for a GET on a key that the same worker had just SET and received an OK acknowledgement for. The tool exits with status 1 when this happens.

stale-after-set should be zero in normal operation (the key namespace is worker-and-iteration-specific) and is included as a defense against unexpected key collisions.
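
One way to picture how the two counters relate is the sketch below. It assumes that stale-after-set means "the GET returned a value, but not the one just written"; that reading, along with the type and function names, is an illustration rather than the tool's actual code.

```go
package main

import "fmt"

type outcome int

const (
	ok            outcome = iota // GET returned the value that was just SET
	nilAfterSet                  // GET returned nil
	staleAfterSet                // GET returned some other value (assumed meaning)
)

// classify decides which counter a single SET-then-GET result belongs to.
func classify(found bool, got, want string) outcome {
	switch {
	case !found:
		return nilAfterSet
	case got != want:
		return staleAfterSet
	default:
		return ok
	}
}

// summary mirrors the counters printed in the output block above.
type summary struct {
	sets, gets, nilAfterSet, staleAfterSet int
}

func (s *summary) record(o outcome) {
	s.sets++
	s.gets++
	switch o {
	case nilAfterSet:
		s.nilAfterSet++
	case staleAfterSet:
		s.staleAfterSet++
	}
}

// exitCode follows the documented rule: status 1 on any nil-after-set.
func (s summary) exitCode() int {
	if s.nilAfterSet > 0 {
		return 1
	}
	return 0
}

func main() {
	var s summary
	s.record(classify(true, "v1", "v1"))
	s.record(classify(false, "", "v2"))
	s.record(classify(true, "old", "v3"))
	// A real tool would pass exitCode() to os.Exit; here it is only printed.
	fmt.Printf("%+v exit=%d\n", s, s.exitCode())
}
```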

Bringing up a local Redis cluster

The repo includes a 6-node Redis Cluster compose file. From the repo root:

docker compose -f docker-compose.yml -f docker-compose-redis-cluster.yml up -d \
  redis-cluster-1 redis-cluster-2 redis-cluster-3 \
  redis-cluster-4 redis-cluster-5 redis-cluster-6 \
  redis-cluster-setup

Verify the cluster came up:

docker exec fleet-redis-cluster-1-1 redis-cli -p 7001 cluster info | grep cluster_state
# cluster_state:ok
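
If the same readiness check is needed from Go (for example in a test helper that waits for the cluster), the CLUSTER INFO reply can be parsed as plain `key:value` lines. A minimal sketch, with the hypothetical helper name clusterState and a hard-coded sample reply instead of a live connection:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// clusterState extracts the cluster_state value from the text returned by
// the CLUSTER INFO command. The second result is false if the field is absent.
func clusterState(info string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(info))
	for sc.Scan() {
		k, v, found := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if found && k == "cluster_state" {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Sample reply fragment; a real caller would obtain this from Redis.
	sample := "cluster_state:ok\r\ncluster_slots_assigned:16384\r\n"
	state, found := clusterState(sample)
	fmt.Println(state, found)
}
```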

On macOS, host-to-cluster networking requires docker-mac-net-connect:

brew install chipmk/tap/docker-mac-net-connect
sudo brew services start chipmk/tap/docker-mac-net-connect

Without it, only port-forwarded nodes are reachable and any cluster redirect times out.

Forcing a replica to lag

To verify the race detector works mechanically (or to simulate a customer deployment where reads can fall behind writes), pause one of the replica nodes mid-test:

docker pause fleet-redis-cluster-4-1
go run ./tools/redis-stress race \
  -addr 127.0.0.1:7001 -explicit-readonly -workers 20 -iterations 500
docker unpause fleet-redis-cluster-4-1

With -explicit-readonly and a paused replica, you should see non-zero nil-after-set events for keys whose slot's replica was the paused one.
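
The mechanism behind such nil-after-set events can be illustrated with a toy model: a write is acknowledged by the primary before the replica has applied it, so a read routed to the replica misses the key. The sketch below models only that lag with two in-memory maps; it does not reproduce how a paused container behaves on the network, and every name in it is hypothetical.

```go
package main

import "fmt"

// toyCluster models one primary and one replica with asynchronous replication.
type toyCluster struct {
	primary map[string]string
	replica map[string]string
	backlog []string // keys acknowledged by the primary but not yet replicated
	paused  bool     // true while the replica is not applying writes
}

func newToyCluster() *toyCluster {
	return &toyCluster{primary: map[string]string{}, replica: map[string]string{}}
}

// set acknowledges the write on the primary and replicates only if the
// replica is currently keeping up.
func (c *toyCluster) set(key, val string) {
	c.primary[key] = val
	c.backlog = append(c.backlog, key)
	if !c.paused {
		c.catchUp()
	}
}

// catchUp applies every pending write to the replica.
func (c *toyCluster) catchUp() {
	for _, k := range c.backlog {
		c.replica[k] = c.primary[k]
	}
	c.backlog = c.backlog[:0]
}

func (c *toyCluster) get(key string, fromReplica bool) (string, bool) {
	src := c.primary
	if fromReplica {
		src = c.replica
	}
	v, ok := src[key]
	return v, ok
}

// countNil performs n SET-then-GET pairs and counts GETs that miss.
func countNil(c *toyCluster, prefix string, n int, fromReplica bool) int {
	misses := 0
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("%s%d", prefix, i)
		c.set(key, "v")
		if _, ok := c.get(key, fromReplica); !ok {
			misses++
		}
	}
	return misses
}

func main() {
	c := newToyCluster()
	c.paused = true
	fmt.Println("lagging replica, replica reads:", countNil(c, "a_", 5, true))
	fmt.Println("lagging replica, primary reads:", countNil(c, "b_", 5, false))
	c.paused = false
	fmt.Println("healthy replica, replica reads:", countNil(c, "c_", 5, true))
}
```

In this model only the combination of a lagging replica and replica-routed reads produces misses, which matches why -explicit-readonly is needed for the experiment above.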