ring/dev-team/skills/dev-systemplane-migration/SKILL.md
Fred Amaral 9cb5a72737
fix(codereview): align reviewer references and harden lib-commons/multi-tenant agents
Propagates the 10-reviewer peer list across agent frontmatter, Position/Critical prose, shared-patterns, skill dispatchers, gate validators, and docs — resolving drift left behind when multi-tenant-reviewer and lib-commons-reviewer were added to the pool. Also fixes broken shared-pattern paths in lib-commons-reviewer and adds substantive blocker criteria to multi-tenant-reviewer plus codebase-context severity heuristic (Lerian third-rail vs external recommendation) to lib-commons-reviewer.

X-Lerian-Ref: 0x1
2026-04-18 20:18:16 -03:00

121 KiB

name description trigger skip_when NOT_skip_when prerequisites sequence input_schema output_schema
ring:dev-systemplane-migration Gate-based systemplane migration orchestrator. Migrates Lerian Go services from .env/YAML-based configuration to the systemplane — a database-backed, hot-reloadable runtime configuration and settings management plane with full audit history, optimistic concurrency, change feeds, component-granular bundle rebuilds, and atomic infrastructure replacement. Requires lib-commons v4.3.0+. - User requests systemplane migration or adoption - Task mentions runtime configuration, hot-reload, or config management - Service needs database-backed configuration with audit trail - BundleFactory or Reconciler development is required - Project is not a Go service - Service does not use lib-commons v4 - Task is unrelated to configuration management or systemplane - Service already has systemplane code (verify compliance, do not skip) - "It looks like systemplane is already set up" (existence ≠ compliance) - Go project - lib-commons/v4 dependency (v4.3.0+ required; upgrade first if older) - PostgreSQL or MongoDB backend available
after
ring:dev-cycle
type properties
object
execution_mode detected_backend detected_dependencies service_has_workers existing_systemplane
type enum description
string
FULL
SCOPED
FULL = fresh migration. SCOPED = compliance audit of existing.
type enum
string
postgres
mongodb
type items description
array
type
string
Infrastructure detected: postgres, mongodb, redis, rabbitmq, s3
type
boolean
type
boolean
format required_sections
markdown
name pattern required
Migration Summary ^## Migration Summary true
name pattern required
Gates Completed ^## Gates Completed true
name pattern required
Compliance Status ^## Compliance Status true
name pattern required
Files Created ^## Files Created true

Systemplane Migration Orchestrator

<cannot_skip>

CRITICAL: This Skill ORCHESTRATES. Agents IMPLEMENT.

Who Responsibility
This Skill Detect stack, determine gates, pass context to agent, verify outputs, enforce order
ring:backend-engineer-golang Implement systemplane code following the patterns in this document
ring:codebase-explorer Analyze the codebase for configuration patterns (Gate 1)
ring:visualize Generate implementation preview HTML (Gate 1.5)
10 reviewers Review at Gate 8

CANNOT change scope: the skill defines WHAT to implement. The agent implements HOW.

FORBIDDEN: Orchestrator MUST NOT use Edit, Write, or Bash tools to modify source code files. All code changes MUST go through Task(subagent_type="ring:backend-engineer-golang"). The orchestrator only verifies outputs (grep, go build, go test) — MUST NOT write implementation code.

</cannot_skip>


Architecture Overview

The systemplane replaces traditional env-var-only / YAML-based configuration with a three-tier architecture:

┌─ TIER 1: Bootstrap-Only ──────────────────────────────────────────┐
│  Env vars read ONCE at startup. Immutable for process lifetime.   │
│  Examples: SERVER_ADDRESS, AUTH_ENABLED, OTEL_* (telemetry)       │
│  Stored in: BootstrapOnlyConfig struct (frozen at init)           │
└───────────────────────────────────────────────────────────────────┘

┌─ TIER 2: Runtime-Managed (Hot-Reload) ────────────────────────────┐
│  Stored in database (PostgreSQL or MongoDB).                      │
│  Changed via PATCH /v1/system/configs or /v1/system/settings.     │
│  Propagation: ChangeFeed → Supervisor → Snapshot → Bundle/Reconcile│
│  Examples: rate limits, worker intervals, DB pool sizes, CORS     │
└───────────────────────────────────────────────────────────────────┘

┌─ TIER 3: Live-Read (Zero-Cost Per-Request) ───────────────────────┐
│  Read directly from Supervisor.Snapshot() on every request.       │
│  No rebuild, no reconciler, no locking.                           │
│  Examples: rate_limit.max, health_check_timeout_sec               │
└───────────────────────────────────────────────────────────────────┘

Data Flow: Startup → Hot-Reload → Shutdown

STARTUP:
  ENV VARS → defaultConfig() → loadConfigFromEnv() → *Config → ConfigManager
  InitSystemplane():
    1. ExtractBootstrapOnlyConfig(cfg)         → BootstrapOnlyConfig (immutable)
    2. LoadSystemplaneBackendConfig(cfg)        → BootstrapConfig (PG/Mongo DSN)
    3. builtin.NewBackendFromConfig(ctx, cfg)   → Store + History + ChangeFeed
    4. registry.New() + Register{Service}Keys() → Registry (100+ key definitions)
    5. configureBackendWithRegistry()            → pass secret keys + apply behaviors
    6. NewSnapshotBuilder(registry, store)       → SnapshotBuilder
    7. New{Service}BundleFactory(bootstrapCfg)   → BundleFactory (full + incremental)
    8. seedStoreForInitialReload()               → Env overrides → Store
    9. buildReconcilers()                        → [HTTP, Publisher, Worker] (phased)
   10. NewSupervisor → Reload("initial-bootstrap") → First bundle
   11. NewManager(ManagerConfig{                  → HTTP API handler
           ConfigWriteValidator: productionGuards,
           StateSync: configManagerSync,
       })
  StartChangeFeed() → DebouncedFeed(200ms) → goroutine: subscribe(store changes)
  MountSystemplaneAPI() → 9 endpoints on /v1/system/*

HOT-RELOAD (on API PATCH or ChangeFeed signal):
  Signal → Supervisor.Reload(ctx, reason) →
    1. SnapshotBuilder.BuildFull() → new Snapshot (defaults + store overrides)
    2. BundleFactory.BuildIncremental(snap, prev, prevSnap) →
       a) Diff changed keys via keyComponentMap (postgres/redis/rabbitmq/s3/http/logger)
       b) Rebuild ONLY changed components
       c) Reuse unchanged components from previous bundle (pointer transfer)
       d) Falls back to full Build() when all components are affected
    3. Reconcilers run IN PHASE ORDER:
       a) PhaseStateSync:    (reserved — no current reconcilers)
       b) PhaseValidation:   HTTPPolicy, Publisher
       c) PhaseSideEffect:   Worker → WorkerManager.ApplyConfig()
    4. On success: atomic swap — snapshot.Store() + bundle.Store()
    5. AdoptResourcesFrom(previous) → nil-out transferred pointers
    6. Observer callback → bundleState.Update() + ConfigManager.UpdateFromSystemplane()
       + SwappableLogger.Swap() + SwapRuntimePublishers()
    7. previous.Close() → only tears down REPLACED components

SHUTDOWN:
  1. ConfigManager.Stop()         (prevent mutations)
  2. cancelChangeFeed()           (stop reload triggers)
  3. Supervisor.Stop()            (stop supervisory loop + close bundle)
  4. Backend.Close()              (close store connection)
  5. WorkerManager.Stop()         (stop all workers)

The Three Configuration Authorities

Phase Authority Scope Mutability
Bootstrap Env vars → defaultConfig() + loadConfigFromEnv() Server address, TLS, auth, telemetry Immutable after startup
Runtime Systemplane Store + Supervisor Rate limits, workers, timeouts, DB pools, CORS Hot-reloadable via API
Legacy bridge ConfigManager.Get() Backward-compat for existing code Updated by StateSync callback + observer

Single source of truth: {service}KeyDefs() is THE canonical source of all default values. The defaultConfig() function derives its values from KeyDefs via defaultSnapshotFromKeyDefs()configFromSnapshot(). No manual sync required between defaults, key definitions, or struct tags.

Component-granular awareness: Every key's Component field (e.g., "postgres", "redis", "rabbitmq", "s3", "http", "logger", or ComponentNone) enables the IncrementalBundleFactory to rebuild only the affected infrastructure component when that key changes, instead of tearing down and rebuilding everything.


Canonical Import Paths

The systemplane is a shared library in lib-commons (since v4.3.0). All services import it via:

import (
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/ports"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/registry"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/service"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/bootstrap"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/bootstrap/builtin"
    "github.com/LerianStudio/lib-commons/v4/commons/systemplane/adapters/changefeed"
    fiberhttp "github.com/LerianStudio/lib-commons/v4/commons/systemplane/adapters/http/fiber"

    // Swagger spec (for consuming app integration)
    systemswagger "github.com/LerianStudio/lib-commons/v4/commons/systemplane/swagger"
)

Requires lib-commons v4.3.0+ in your go.mod.

HARD GATE: Agent must not use v2 or v3 import paths or invent sub-package paths. If any tool truncates output, this table is the authoritative reference.


Severity Calibration

Severity Criteria Examples
CRITICAL Data loss, runtime panic, silent config corruption Bundle not closed on shutdown (resource leak), snapshot reads nil (panic), ChangeFeed not started (hot-reload broken)
HIGH Missing integration, broken hot-reload path Keys registered but no BundleFactory, no HTTP Mount, no StateSync callback
MEDIUM Incomplete but functional Missing validators on some keys, incomplete secret redaction, no swagger merge
LOW Polish and optimization Missing config-map.example, incomplete descriptions, suboptimal component grouping

MUST report all severities. CRITICAL: STOP immediately. HIGH: Fix before gate pass. MEDIUM: Fix in iteration. LOW: Document.


Pressure Resistance

User Says This Is Response
"Keys are registered, migration is done" SCOPE_REDUCTION "Key registration is Gate 2 of 10. Without BundleFactory + Wiring + HTTP Mount, keys are metadata — nothing reads them at runtime. MUST complete ALL gates."
"We can wire it later" SCOPE_REDUCTION "Gate 7 is where the system comes alive. Without it, systemplane is dead code. Gate 7 is NEVER skippable."
"Swagger integration is cosmetic" QUALITY_BYPASS "Operators need API discoverability. Without swagger.MergeInto(), 9 endpoints are invisible in Swagger UI. MUST implement."
"Config bridge is backward compat only" SCOPE_REDUCTION "Existing code reads Config struct. Without StateSync, it reads stale bootstrap values forever. MUST implement Gate 6."
"Reconcilers are optional" SCOPE_REDUCTION "Without reconcilers, workers and HTTP policies ignore config changes. Hot-reload is partial."
"The service already has systemplane" COMPLIANCE_BYPASS "Existence ≠ compliance. MUST run compliance audit. If it doesn't match canonical patterns exactly, it is non-compliant."
"Skip code review, we tested it" QUALITY_BYPASS "MANDATORY: 10 reviewers. One wiring mistake = silent config corruption or resource leak."
"Agent says out of scope" AUTHORITY_OVERRIDE "Skill defines scope, not agent. Re-dispatch with gate context."
"ChangeFeed can be added later" SCOPE_REDUCTION "Without ChangeFeed, hot-reload is broken. Config changes in DB are invisible to the service. MUST start DebouncedFeed."
"Active bundle state is internal detail" SCOPE_REDUCTION "Without thread-safe accessor, request handlers cannot read live config. Race conditions. MUST implement active_bundle_state.go."

Gate Overview

Gate Name Condition Agent
0 Stack Detection + Prerequisite Audit Always Orchestrator
1 Codebase Analysis (Config Focus) Always ring:codebase-explorer
1.5 Implementation Preview Always Orchestrator (ring:visualize)
2 Key Definitions + Registry Always ring:backend-engineer-golang
3 Bundle + BundleFactory Always ring:backend-engineer-golang
4 Reconcilers Conditional (skip if no workers AND no RMQ AND no HTTP policy changes) ring:backend-engineer-golang
5 Identity + Authorization Always ring:backend-engineer-golang
6 Config Manager Bridge Always ring:backend-engineer-golang
7 Wiring + HTTP Mount + Swagger + ChangeFeed Always NEVER SKIPPABLE ring:backend-engineer-golang
8 Code Review Always 10 parallel reviewers
9 User Validation Always User
10 Activation Guide Always Orchestrator

MUST execute gates sequentially. CANNOT skip or reorder.

Gate Execution Rules

HARD GATES — CANNOT be overridden:

  • All gates must execute in order (0→1→1.5→2→3→4→5→6→7→8→9→10)
  • Gates execute with explicit dependencies — no gate can start until its predecessor completes
  • Existence ≠ compliance: existing systemplane code triggers compliance audit, NOT a skip
  • A gate can only be marked SKIP when ALL its compliance checks pass with evidence
  • Gate 7 (Wiring) is NEVER skippable — it is the most critical gate

<cannot_skip>

HARD GATE: Existence ≠ Compliance

"The service already has systemplane code" is NOT a reason to skip any gate.

MUST replace existing systemplane code that does not follow the canonical patterns — it is non-compliant. The only valid reason to skip a gate is when the existing implementation has been verified to match the exact patterns defined in this skill document.

Compliance verification requires EVIDENCE, not assumption. The Gate 0 Phase 2 compliance audit (S1-S8 grep checks) verifies each component against canonical patterns.

If ANY audit check is NON-COMPLIANT → the corresponding gate MUST execute to fix it. CANNOT skip.

</cannot_skip>


Gate 0: Stack Detection + Prerequisite Audit

Orchestrator executes directly. No agent dispatch.

This gate has THREE phases: detection, compliance audit, and non-canonical pattern detection.

Phase 1: Stack Detection

DETECT (run in parallel):

1. lib-commons version:
   grep "lib-commons" go.mod
   (require v4.3.0+ — if older, MUST upgrade before proceeding)

2. PostgreSQL:
   grep -rn "postgresql\|pgx\|squirrel" internal/ go.mod

3. MongoDB:
   grep -rn "mongodb\|mongo" internal/ go.mod

4. Redis:
   grep -rn "redis\|valkey" internal/ go.mod

5. RabbitMQ:
   grep -rn "rabbitmq\|amqp" internal/ go.mod

6. S3/Object Storage:
   grep -rn "s3\|ObjectStorage\|PutObject\|GetObject\|Upload.*storage\|Download.*storage" internal/ pkg/ go.mod

7. Background workers:
   grep -rn "ticker\|time.NewTicker\|cron\|worker\|scheduler" internal/ --include="*.go" | grep -v _test.go

8. Existing systemplane code:
   - Imports:       grep -rn "systemplane" internal/ go.mod
   - BundleFactory: grep -rn "BundleFactory\|IncrementalBundleFactory" internal/ --include="*.go"
   - Supervisor:    grep -rn "service.NewSupervisor\|Supervisor" internal/ --include="*.go" | grep systemplane
   - Keys:          grep -rn "Register.*Keys\|KeyDefs()" internal/ --include="*.go"
   - HTTP Mount:    grep -rn "handler.Mount\|fiberhttp.NewHandler" internal/ --include="*.go"

9. Current config pattern:
   - Struct:     grep -rn "type Config struct" internal/ --include="*.go"
   - Env tags:   grep -rn "envDefault:" internal/ --include="*.go" | sort
   - Env reads:  grep -rn "os.Getenv\|viper\.\|envconfig" internal/ --include="*.go" | grep -v _test.go
   - YAML files: find . -name '.env*' -o -name '*.yaml' -o -name '*.yml' | grep -i config

Output format for stack detection:

STACK DETECTION RESULTS:
| Component | Detected | Evidence |
|-----------|----------|----------|
| lib-commons v4.3.0+ | YES/NO | {go.mod version} |
| PostgreSQL | YES/NO | {file:line matches} |
| MongoDB | YES/NO | {file:line matches} |
| Redis | YES/NO | {file:line matches} |
| RabbitMQ | YES/NO | {file:line matches} |
| S3/Object Storage | YES/NO | {file:line matches} |
| Background workers | YES/NO | {file:line matches} |
| Existing systemplane | YES/NO | {file:line matches} |
| Config pattern | envDefault/viper/os.Getenv | {file:line matches} |

Phase 2: Compliance Audit (MANDATORY if any existing systemplane code detected)

If Phase 1 step 8 detects any existing systemplane code, MUST run a compliance audit. MUST replace existing code that does not match canonical patterns — it is not "partially done", it is wrong.

AUDIT (run in parallel — only if step 8 found existing systemplane code):

S1. Key Registry compliance:
    grep -rn "Register.*Keys\|KeyDefs()" internal/ --include="*.go"
    (no match = NON-COMPLIANT → Gate 2 MUST fix)

S2. BundleFactory compliance:
    grep -rn "IncrementalBundleFactory\|BundleFactory" internal/ --include="*.go"
    (no IncrementalBundleFactory = NON-COMPLIANT → Gate 3 MUST fix)

S3. AdoptResources compliance:
    grep -rn "AdoptResourcesFrom" internal/ --include="*.go"
    (no match but BundleFactory exists = NON-COMPLIANT → Gate 3 MUST fix)

S4. Reconciler compliance:
    grep -rn "BundleReconciler\|Reconcile.*context" internal/ --include="*.go" | grep -v _test.go
    (expected reconcilers missing = NON-COMPLIANT → Gate 4 MUST fix)

S5. Identity/Authorization compliance:
    grep -rn "IdentityResolver\|Authorizer" internal/ --include="*.go"
    (no match = NON-COMPLIANT → Gate 5 MUST fix)

S6. Config Bridge compliance:
    grep -rn "StateSync\|config_manager_systemplane\|configFromSnapshot" internal/ --include="*.go"
    (no StateSync callback or config hydration = NON-COMPLIANT → Gate 6 MUST fix)

S7. HTTP Mount compliance:
    grep -rn "handler.Mount\|fiberhttp.NewHandler" internal/ --include="*.go"
    (no match = NON-COMPLIANT → Gate 7 MUST fix)
    Also check all 9 routes are accessible:
    grep -rn "/v1/system" internal/ --include="*.go"

S8. Swagger compliance:
    grep -rn "swagger.MergeInto\|systemswagger" internal/ --include="*.go"
    (no match = NON-COMPLIANT → Gate 7 MUST fix)

S9. File structure compliance:
    ls internal/bootstrap/systemplane_mount.go
    ls internal/bootstrap/systemplane_authorizer.go
    ls internal/bootstrap/systemplane_identity.go
    ls internal/bootstrap/systemplane_factory.go
    ls internal/bootstrap/active_bundle_state.go
    (Each systemplane concern MUST have a dedicated file. Monolithic systemplane.go
     files that combine mount + authorizer + identity = NON-COMPLIANT)

Output format for compliance audit:

COMPLIANCE AUDIT RESULTS:
| Check | Component | Status | Evidence | Gate Action |
|-------|-----------|--------|----------|-------------|
| S1 | Key Registry | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 2: SKIP / MUST FIX |
| S2 | BundleFactory | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 3: SKIP / MUST FIX |
| S3 | AdoptResources | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 3: SKIP / MUST FIX |
| S4 | Reconcilers | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 4: SKIP / MUST FIX |
| S5 | Identity/Auth | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 5: SKIP / MUST FIX |
| S6 | Config Bridge | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 6: SKIP / MUST FIX |
| S7 | HTTP Mount | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 7: SKIP / MUST FIX |
| S8 | Swagger | COMPLIANT / NON-COMPLIANT | {grep results} | Gate 7: SKIP / MUST FIX |
| S9 | File Structure | COMPLIANT / NON-COMPLIANT | {ls results} | Gate 7: SKIP / MUST FIX |

HARD GATE: A gate can only be marked as SKIP when ALL its compliance checks are COMPLIANT with evidence. One NON-COMPLIANT row → gate MUST execute.

Phase 3: Non-Canonical Pattern Detection (MANDATORY)

MUST scan for custom config management outside the systemplane canonical patterns:

DETECT non-canonical config patterns:

N1. Custom config hot-reload:
    grep -rln "hot.reload\|config.watch\|config.reload\|ConfigReload" internal/config/ pkg/config/ internal/bootstrap/ --include="*.go" 2>/dev/null
    (any match = NON-CANONICAL → MUST be removed and replaced with systemplane ChangeFeed)

N2. Custom file watchers:
    grep -rn "fsnotify\|viper.WatchConfig\|inotify\|file.watch" internal/ pkg/ --include="*.go"
    (any match = NON-CANONICAL → MUST be removed; systemplane uses database change feed)

N3. Custom change notification channels:
    grep -rn "chan.*config\|config.*chan\|ConfigChange\|configChanged" internal/ pkg/ --include="*.go" | grep -v systemplane
    (any match outside systemplane = NON-CANONICAL → MUST be replaced with systemplane ChangeFeed)

If non-canonical files are found: report them in the compliance audit as NON-CANONICAL FILES DETECTED. The implementing agent MUST remove these files and replace their functionality with systemplane during the appropriate gate.

Store detection results:

{
  "skill": "ring:dev-systemplane-migration",
  "gate": "0",
  "detection": {
    "lib_commons_version": "v4.3.2",
    "backend": "postgres",
    "dependencies": ["postgres", "redis", "rabbitmq"],
    "has_workers": true,
    "existing_systemplane": false,
    "config_pattern": "envDefault",
    "config_struct_location": "internal/bootstrap/config.go:15"
  },
  "compliance": {
    "status": "NEW",
    "checks": {}
  },
  "non_canonical": []
}

MUST confirm: user explicitly approves detection results before proceeding.


Gate 1: Codebase Analysis (Config Focus)

Always executes. This gate builds the configuration inventory for all subsequent gates.

Dispatch ring:codebase-explorer with systemplane-focused context:

TASK: Analyze this codebase exclusively under the systemplane migration perspective. DETECTED STACK: {databases and infrastructure from Gate 0}

FOCUS AREAS (explore ONLY these — ignore everything else):

  1. Config struct location and fields: Find the main Config struct, all env tags (envDefault:), default values. List every field with its current type, default, and env var name. Include nested structs (e.g., Config.Postgres.Host).

  2. Environment variable reads: All os.Getenv, envconfig, viper usage. Include those outside the Config struct (ad-hoc reads in business logic).

  3. Infrastructure client creation: How postgres, redis, rabbitmq, mongo, S3 clients are created. File:line for each constructor call. Connection strings, pool sizes, timeouts used. These become Bundle components.

  4. Background workers: Ticker-based, cron-based, goroutine workers that need reconciliation. File:line for each worker start. Which config fields control their behavior (intervals, enable/disable, batch sizes).

  5. HTTP server configuration: Listen address, TLS, CORS, rate limits, timeouts, body limits. Where these are read and applied. These become HTTP policy keys.

  6. Authentication/authorization: JWT parsing, middleware, permission model. Where user ID and tenant ID are extracted from context. This drives Identity + Authorizer implementation.

  7. Existing config reload patterns: Any hot-reload, file watching, config refresh mechanisms. Viper watchers, fsnotify, custom channels. These will be replaced by systemplane ChangeFeed.

  8. Swagger/OpenAPI setup: How swagger is generated (swag init command?), where spec is served (middleware?), what tool (swaggo/swag? go-swagger?). This drives the swagger.MergeInto() integration in Gate 7.

OUTPUT FORMAT: Structured report with file:line references for every point above. DO NOT write code. Analysis only.

From the analysis, produce the Config Inventory Table:

Env Var Name Config Field Current Default Go Type Proposed Key Proposed Tier ApplyBehavior Component Secret Validator Group
POSTGRES_HOST Config.Postgres.Host localhost string postgres.primary_host Runtime BundleRebuild postgres No postgres
RATE_LIMIT_MAX Config.RateLimit.Max 100 int rate_limit.max Live-Read LiveRead _none No validatePositiveInt rate_limit
SERVER_ADDRESS Config.Server.Address :8080 string server.address Bootstrap BootstrapOnly _none No server
... ... ... ... ... ... ... ... ... ... ...

This inventory becomes the CONTEXT for all subsequent gates. Gate 2 uses it to create key definitions. Gate 3 uses it to build the Bundle. Gate 6 uses it for snapshot→Config hydration.

<block_condition> HARD GATE: MUST complete the analysis report and config inventory before proceeding. All subsequent gates depend on this inventory. </block_condition>


Gate 1.5: Implementation Preview (Visual Report)

Always executes. This gate generates a visual HTML report showing exactly what will change before any code is written.

Uses the ring:visualize skill to produce a self-contained HTML page.

The report is built from Gate 0 (stack detection) and Gate 1 (codebase analysis). It shows the developer a complete preview of every change that will be made across all subsequent gates.

Orchestrator generates the report using ring:visualize with this content:

The HTML page MUST include these sections:

1. Current Architecture (Before)

  • Mermaid diagram showing current config flow: env vars → Config struct → static injection into services
  • Table of all files that will be created or modified, with purpose
  • How infrastructure clients get their config today (static fields, constructor args)

2. Target Architecture (After)

  • Mermaid diagram showing the three-tier architecture:
    Bootstrap env → Supervisor → Snapshot → BundleFactory → Bundle
                                         → Live-Read (per-request)
                                         → StateSync → Config struct (backward compat)
    
  • Component map showing which infrastructure is managed by systemplane
  • Change feed flow: DB change → pg_notify/change_stream → DebouncedFeed → Supervisor.Reload

3. Key Inventory Table

ALL keys with: name, tier (bootstrap/runtime/live-read), scope, ApplyBehavior, component, secret, validator, group. One row per key. This is the FULL inventory from Gate 1.

4. File Creation Plan

Every file to be created with gate assignment:

Gate File Purpose Lines (est.)
2 systemplane_keys.go Key orchestrator: Register + KeyDefs ~50
2 systemplane_keys_{group}.go Per-group key definitions ~30 each
2 systemplane_keys_validation.go Validator functions ~60
2 systemplane_keys_helpers.go concatKeyDefs utility ~15
3 systemplane_bundle.go Bundle struct + Close + AdoptResources ~80
3 systemplane_factory.go IncrementalBundleFactory ~120
3 systemplane_factory_infra.go Per-component builders ~100
4 systemplane_reconciler_*.go Reconcilers (if applicable) ~50 each
5 systemplane_identity.go JWT → Actor bridge ~30
5 systemplane_authorizer.go Permission mapping ~50
6 config_manager_systemplane.go Snapshot → Config hydration ~150
6 config_manager_seed.go Env → Store seed ~80
6 config_manager_helpers.go Type-safe comparison ~40
6 config_validation.go Production config guards ~60
7 systemplane_init.go 11-step init sequence ~120
7 systemplane_mount.go HTTP route registration + swagger merge ~40
7 active_bundle_state.go Thread-safe live-read accessor ~40

5. ApplyBehavior Distribution

Visual breakdown showing how many keys per tier:

ApplyBehavior Count Percentage Example Keys
BootstrapOnly N N% server.address, auth.enabled
LiveRead N N% rate_limit.max, timeout_sec
WorkerReconcile N N% worker.interval_sec
BundleRebuild N N% postgres.host, redis.host
BundleRebuildAndReconcile N N% worker.enabled

6. Component Map

Component Infrastructure Keys Rebuild Trigger
postgres PostgreSQL client N Connection string, pool size, credentials
redis Redis client N Host, port, password
rabbitmq RabbitMQ connection N URI, host, credentials
s3 Object storage N Endpoint, bucket, credentials
http HTTP policy N Body limit, CORS
logger Logger N Log level
_none No infrastructure N Business logic only (live-read)

7. Swagger Integration

Shows how systemplane routes will appear in the app's Swagger UI:

import systemswagger "github.com/LerianStudio/lib-commons/v4/commons/systemplane/swagger"

// After swag init generates the base spec:
merged, err := systemswagger.MergeInto(baseSwaggerSpec)
// Use merged spec for swagger UI handler

All 9 /v1/system/* routes will be visible in the Swagger UI after merge.

8. Risk Assessment

Risk Mitigation Verification
Config struct regression StateSync hydrates Config from Snapshot on every change Existing tests pass with systemplane active
Startup failure Graceful degradation — service starts without systemplane if backend unavailable Manual test with backend down
Resource leaks on hot-reload AdoptResourcesFrom + ownership tracking in Bundle Bundle factory tests verify resource reuse
Secret exposure RedactPolicy on secret keys, AES-256-GCM encryption in store Security review at Gate 8

9. Retro Compatibility Guarantee

When systemplane store is unavailable, the service degrades to bootstrap defaults:

  • Config values from env vars still work (via defaultConfig() derived from KeyDefs)
  • No runtime mutation API available (9 endpoints return 503)
  • No hot-reload capability (static config for process lifetime)
  • Workers run with static config from env vars
  • The service never fails to start due to systemplane issues

Output: Save the HTML report to docs/systemplane-preview.html in the project root.

Open in browser for the developer to review.

<block_condition> HARD GATE: Developer MUST explicitly approve the implementation preview before any code changes begin (Gates 2+). This prevents wasted effort on incorrect key classification or architectural decisions. </block_condition>

If the developer requests changes to the preview, regenerate the report and re-confirm.


Gate 2: Key Definitions + Registry

Always executes. This gate creates the key definition files that form the backbone of systemplane.

Dispatch ring:backend-engineer-golang with context from Gate 1 config inventory:

TASK: Create systemplane key definition files based on the approved config inventory.

CONTEXT FROM GATE 1: {Full config inventory table from analysis report} APPROVED KEY COUNT: {N keys from Gate 1.5 preview}

IMPORT PATHS (use exactly these):

"github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/registry"

FILES TO CREATE:

  1. systemplane_keys.go — Orchestrator:

    func Register{Service}Keys(reg registry.Registry) error {
        for _, def := range {service}KeyDefs() {
            if err := reg.Register(def); err != nil {
                return fmt.Errorf("register key %q: %w", def.Key, err)
            }
        }
        return nil
    }
    
    func {service}KeyDefs() []domain.KeyDef {
        return concatKeyDefs(
            {service}KeyDefsAppServer(),
            {service}KeyDefsPostgres(),
            // ... one per group from inventory
        )
    }
    
  2. systemplane_keys_{group}.go — Per-group key definitions. Split by group names from inventory (app, server, postgres, redis, rabbitmq, auth, swagger, telemetry, rate_limit, worker, etc.)

  3. systemplane_keys_validation.go — Validator functions:

    • validatePositiveInt(value any) error
    • validateLogLevel(value any) error
    • validateSSLMode(value any) error
    • validateAbsoluteHTTPURL(value any) error
    • validatePort(value any) error
    • Any service-specific validators
  4. systemplane_keys_helpers.goconcatKeyDefs() utility

KEY DEFINITION RULES:

Every key MUST have ALL of these fields:

  • Key — dotted name (e.g., "postgres.primary_host")
  • Kinddomain.KindConfig or domain.KindSetting
  • AllowedScopes[]domain.Scope{domain.ScopeGlobal} or both
  • DefaultValue — THE single source of truth for defaults
  • ValueTypedomain.ValueString, domain.ValueInt, domain.ValueBool, domain.ValueFloat
  • ApplyBehavior — classify using the 5-level taxonomy (see Appendix F)
  • MutableAtRuntimefalse for bootstrap-only, true for everything else
  • Secrettrue if contains credentials, tokens, keys
  • RedactPolicydomain.RedactFull for secrets, domain.RedactNone otherwise
  • Description — human-readable description
  • Group — logical grouping matching file suffix
  • Component — infrastructure component: "postgres", "redis", "rabbitmq", "s3", "http", "logger", or domain.ComponentNone ("_none")
  • Validator — custom validation function (nil if none needed)

DefaultValue is the SINGLE SOURCE OF TRUTH. No separate config defaults. The defaultConfig() function will derive from KeyDefs via configFromSnapshot(defaultSnapshotFromKeyDefs(...)).

Apply Behavior classification — Include the 5-level taxonomy:

ApplyBehavior Constant Strength Runtime Effect Use When
bootstrap-only domain.ApplyBootstrapOnly 4 Immutable after startup Server address, TLS, auth, telemetry
bundle-rebuild+worker-reconcile domain.ApplyBundleRebuildAndReconcile 3 Bundle swap AND worker restart Worker enable/disable (needs new connections + restart)
bundle-rebuild domain.ApplyBundleRebuild 2 Rebuild infra clients only Connection strings, pool sizes, credentials
worker-reconcile domain.ApplyWorkerReconcile 1 Restart workers only Worker intervals, scheduler periods
live-read domain.ApplyLiveRead 0 Zero-cost per-request reads Rate limits, timeouts, TTLs, feature flags

Completion criteria: All keys registered, go build ./... passes, key count matches inventory from Gate 1.5.

Catalog validation (MANDATORY): Products MUST add a test that validates their KeyDefs against the lib-commons canonical catalog:

func TestKeyDefs_MatchCanonicalCatalog(t *testing.T) {
    reg := registry.New()
    Register{Service}Keys(reg)
    
    var allDefs []domain.KeyDef
    for _, def := range reg.List(domain.KindConfig) {
        allDefs = append(allDefs, def)
    }
    for _, def := range reg.List(domain.KindSetting) {
        allDefs = append(allDefs, def)
    }
    
    mismatches := catalog.ValidateKeyDefs(allDefs, catalog.AllSharedKeys()...)
    for _, m := range mismatches {
        t.Errorf("catalog mismatch: %s", m)
    }
}

Any mismatch = NON-COMPLIANT. The canonical catalog in lib-commons/commons/systemplane/catalog/ defines the source of truth for shared key names, tiers, and components.

KindSetting vs KindConfig decision tree:

Is this a per-tenant business rule?
  YES → KindSetting (AllowedScopes: [ScopeGlobal, ScopeTenant])
  NO ↓

Is this a product-wide feature flag or business parameter?
  YES → KindSetting (AllowedScopes: [ScopeGlobal])
  NO ↓

Is this an infrastructure knob (DB, Redis, auth, telemetry)?
  YES → KindConfig
  NO ↓

Default: KindConfig

Products MUST NOT leave the Settings API empty. If no per-tenant settings are identified, document the justification in systemplane-guide.md.

Verification: grep "Register.*Keys\|KeyDefs()" internal/bootstrap/ + go build ./...


Gate 3: Bundle + BundleFactory

Always executes. This gate creates the runtime resource container and its incremental builder.

Dispatch ring:backend-engineer-golang with context from Gates 1-2:

TASK: Create the Bundle, BundleFactory, and per-component infrastructure builders.

DETECTED INFRASTRUCTURE: {postgres, redis, rabbitmq, s3 — from Gate 0} KEY DEFINITIONS: {key groups and components from Gate 2}

IMPORT PATHS (use exactly these):

"github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/ports"

FILES TO CREATE:

  1. systemplane_bundle.go — Bundle struct with:

    • Per-component ownership fields (postgres, redis, rabbitmq, mongo, s3, http, logger)
    • Close(ctx context.Context) error method that closes all owned resources in REVERSE dependency order
    • AdoptResourcesFrom(previous domain.RuntimeBundle) method that nil-outs transferred pointers in previous
    • Ownership booleans per component (e.g., ownsPostgres, ownsRedis)

    Pattern:

    type {Service}Bundle struct {
        Infra  *InfraBundle
        HTTP   *HTTPPolicyBundle
        Logger *LoggerBundle
        ownsPostgres, ownsRedis, ownsRabbitMQ, ownsObjectStorage bool
    }
    
    func (b *{Service}Bundle) Close(ctx context.Context) error {
        // Close in REVERSE dependency order, only owned resources
        if b.ownsObjectStorage && b.Infra.ObjectStorage != nil { b.Infra.ObjectStorage.Close() }
        if b.ownsRabbitMQ && b.Infra.RabbitMQ != nil { b.Infra.RabbitMQ.Close() }
        if b.ownsRedis && b.Infra.Redis != nil { b.Infra.Redis.Close() }
        if b.ownsPostgres && b.Infra.Postgres != nil { b.Infra.Postgres.Close() }
        return nil
    }
    
    func (b *{Service}Bundle) AdoptResourcesFrom(previous domain.RuntimeBundle) {
        prev, ok := previous.(*{Service}Bundle)
        if !ok || prev == nil { return }
        if !b.ownsPostgres { prev.Infra.Postgres = nil }
        if !b.ownsRedis { prev.Infra.Redis = nil }
        // ... etc
    }
    
  2. systemplane_factory.go — IncrementalBundleFactory with:

    • Build(ctx, snapshot) — full rebuild, creates everything from scratch
    • BuildIncremental(ctx, snapshot, previous, prevSnapshot) — component-granular rebuild
    • keyComponentMap — built once at factory construction from KeyDefs
    • diffChangedComponents(snap, prevSnap) — uses keyComponentMap to find which components changed

    Pattern:

    type {Service}BundleFactory struct {
        bootstrapCfg   *BootstrapOnlyConfig
        keyComponentMap map[string]string // key → component
    }
    
    func (f *{Service}BundleFactory) Build(ctx context.Context, snap domain.Snapshot) (domain.RuntimeBundle, error) {
        // Full rebuild — creates everything from scratch
    }
    
    func (f *{Service}BundleFactory) BuildIncremental(ctx context.Context, snap domain.Snapshot,
        previous domain.RuntimeBundle, prevSnap domain.Snapshot) (domain.RuntimeBundle, error) {
        // Diff changed components, rebuild only what changed, reuse the rest
    }
    
  3. systemplane_factory_infra.go — Per-component builders:

    • buildPostgres(ctx, snapshot) — creates postgres client from snapshot keys
    • buildRedis(ctx, snapshot) — creates redis client from snapshot keys
    • buildRabbitMQ(ctx, snapshot) — creates rabbitmq connection from snapshot keys
    • buildObjectStorage(ctx, snapshot) — creates S3 client from snapshot keys
    • buildLogger(snapshot) — creates logger from snapshot keys
    • buildHTTPPolicy(snapshot) — extracts HTTP policy from snapshot keys (one per detected infrastructure — omit unused components)

Completion criteria: Bundle builds successfully from snapshot, incremental rebuild reuses unchanged components, go build ./... passes.

Verification: grep "IncrementalBundleFactory\|BundleFactory\|AdoptResourcesFrom" internal/bootstrap/ + go build ./...


Gate 4: Reconcilers (Conditional)

SKIP IF: no background workers AND no RabbitMQ AND no HTTP policy changes detected in Gate 0.

When to include each reconciler:

  • HTTPPolicyReconciler — if HTTP policy keys exist (body limit, CORS, timeouts)
  • PublisherReconciler — if RabbitMQ detected
  • WorkerReconciler — if background workers detected

Dispatch ring:backend-engineer-golang with context from Gates 1-3:

TASK: Create reconcilers for side effects on config changes.

DETECTED: {workers: Y/N, rabbitmq: Y/N, http_policy_keys: Y/N from Gates 0-2}

IMPORT PATHS (use exactly these):

"github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/ports"

FILES TO CREATE (only for detected components):

  1. systemplane_reconciler_http.go — PhaseValidation reconciler for HTTP config:

    • Validates body limit is non-negative
    • Validates CORS origins are present
    • Can REJECT changes (returns error to abort the reload)
    func (r *HTTPPolicyReconciler) Phase() domain.ReconcilerPhase { return domain.PhaseValidation }
    
  2. systemplane_reconciler_publishers.go — PhaseValidation for RabbitMQ publisher staging (if RMQ detected):

    • When RabbitMQ connection changed, creates staged publishers on candidate bundle
    • The observer callback swaps them in later
    func (r *PublisherReconciler) Phase() domain.ReconcilerPhase { return domain.PhaseValidation }
    
  3. systemplane_reconciler_worker.go — PhaseSideEffect for worker restart (if workers detected):

    • Reads worker config from snapshot
    • Calls workerManager.ApplyConfig(cfg) to restart affected workers
    • Runs LAST because it has external side effects
    func (r *WorkerReconciler) Phase() domain.ReconcilerPhase { return domain.PhaseSideEffect }
    

Phase ordering is enforced by the type system — you cannot register a reconciler without declaring its phase. The supervisor stable-sorts by phase, so reconcilers within the same phase retain their registration order.

Phase execution order: PhaseStateSync (0) → PhaseValidation (1) → PhaseSideEffect (2)

Completion criteria: Reconcilers compile, implement ports.BundleReconciler interface, phase ordering is correct. go build ./... passes.

Verification: grep "BundleReconciler\|Reconcile\|Phase()" internal/bootstrap/systemplane_reconciler_*.go + go build ./...


Gate 5: Identity + Authorization

Always executes. Systemplane HTTP endpoints require identity resolution and permission checking.

Dispatch ring:backend-engineer-golang with context from Gate 1 analysis:

TASK: Implement IdentityResolver and Authorizer for systemplane HTTP endpoints.

CONTEXT FROM GATE 1: {Authentication/authorization analysis — how JWT is parsed, where user ID/tenant ID are extracted}

IMPORT PATHS (use exactly these):

"github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/ports"

FILES TO CREATE:

  1. systemplane_identity.go — Implements ports.IdentityResolver:

    type {Service}IdentityResolver struct{}
    
    func (r *{Service}IdentityResolver) Actor(ctx context.Context) (domain.Actor, error) {
        uid := auth.GetUserID(ctx) // use the service's existing auth context extraction
        if uid == "" { uid = "anonymous" }
        return domain.Actor{ID: uid}, nil
    }
    
    func (r *{Service}IdentityResolver) TenantID(ctx context.Context) (string, error) {
        return auth.GetTenantID(ctx) // use the service's existing tenant extraction
    }
    
  2. systemplane_authorizer.go — Implements ports.Authorizer: Maps 9 permissions to the service's auth model:

    • system/configs:read — view config values
    • system/configs:write — update config values
    • system/configs/schema:read — view key definitions
    • system/configs/history:read — view config change history
    • system/configs/reload:write — force full reload
    • system/settings:read — view settings
    • system/settings:write — update settings
    • system/settings/schema:read — view setting key definitions
    • system/settings:history:read — view settings history (scope-dependent)
    type {Service}Authorizer struct {
        authEnabled bool
    }
    
    func (a *{Service}Authorizer) Authorize(ctx context.Context, permission string) error {
        if !a.authEnabled { return nil } // dev mode bypass
        // Map permission to the service's RBAC action, call auth.Authorize()
    }
    

When auth is disabled (dev mode), authorizer returns nil for all permissions. When auth is enabled, map systemplane permissions to the service's existing permission model.

Authorizer enforcement policy (MANDATORY):

The authorizer MUST perform real permission checking. Two approved patterns:

  1. Granular delegation (recommended): Use ports.DelegatingAuthorizer from lib-commons. Splits permission string and delegates to external auth service.
  2. Admin-only with justification: Use a binary admin check BUT document the justification in systemplane-guide.md and add a code comment explaining why granular is not used.

PROHIBITED patterns:

  • No-op authorizer (always returns nil) = automatic Gate 5 FAILURE
  • Authorizer that ignores the permission string = NON-COMPLIANT
  • Authorizer with no auth-enabled check (must handle disabled auth)

Default implementations from lib-commons:

  • ports.AllowAllAuthorizer — for auth-disabled mode
  • ports.DelegatingAuthorizer — for auth-enabled mode with per-permission delegation
  • ports.FuncIdentityResolver — adapts context extraction functions to IdentityResolver

Completion criteria: Both interfaces implemented, go build ./... passes.

Verification: grep "IdentityResolver\|Authorizer\|Actor\|TenantID\|Authorize" internal/bootstrap/systemplane_identity.go internal/bootstrap/systemplane_authorizer.go + go build ./...


Gate 6: Config Manager Bridge

Always executes. This gate ensures backward compatibility — existing code reading the Config struct continues to work.

Dispatch ring:backend-engineer-golang with context from Gates 1-2:

TASK: Create the Config Manager bridge that hydrates the existing Config struct from systemplane Snapshot.

CONTEXT FROM GATE 1: {Config struct location, all fields, current defaults} KEY DEFINITIONS: {All keys from Gate 2 with their snapshot key names}

FILES TO CREATE:

  1. config_manager_systemplane.go — StateSync callback + snapshot hydration:

    Two core functions:

    • configFromSnapshot(snap domain.Snapshot) *Config — builds a Config entirely from snapshot values. ALL fields come from the snapshot — no bootstrap overlay. This is used by defaultConfig().
    • snapshotToFullConfig(snap domain.Snapshot, oldCfg *Config) *Config — hydrates from snapshot, then overlays bootstrap-only fields from the previous config (they never change at runtime).

    Helper functions for type-safe extraction with JSON coercion:

    • snapString(snap, key, fallback) string
    • snapInt(snap, key, fallback) int
    • snapBool(snap, key, fallback) bool
    • snapFloat64(snap, key, fallback) float64

    The StateSync callback (registered on Manager):

    StateSync: func(_ context.Context, snap domain.Snapshot) {
        newCfg := snapshotToFullConfig(snap, baseCfg)
        configManager.swapConfig(newCfg)
    },
    
  2. config_manager_seed.go — One-time env → store seed:

    • Reads current env vars and seeds them into systemplane store
    • Only runs on first boot (when store is empty)
    • Preserves existing configuration values during migration
    • Skips bootstrap-only keys (they don't go into the store)
    • Uses store.Put() with domain.RevisionZero for initial write
  3. config_manager_helpers.go — Type-safe value comparison:

    • valuesEquivalent(a, b any) bool — handles JSON coercion (float64 vs int)
    • Used by seed logic to avoid overwriting store values that match env defaults
  4. config_validation.go — Production config guards:

    • Validates critical config combinations at startup
    • Used as ConfigWriteValidator on the Manager
    • Prevents invalid states (e.g., TLS enabled without cert path)
    • Returns non-nil error to REJECT a write before persistence

CRITICAL: The defaultConfig() function MUST be updated to derive from KeyDefs:

func defaultConfig() *Config {
    return configFromSnapshot(defaultSnapshotFromKeyDefs({service}KeyDefs()))
}

This eliminates drift between defaults and key definitions. If you change a default in {service}KeyDefs(), the Config struct picks it up automatically.

Completion criteria: Config struct hydrates correctly from snapshot, existing code reading configManager.Get() gets updated values after StateSync, go build ./... passes.

Verification: grep "configFromSnapshot\|snapshotToFullConfig\|StateSync\|seedStore" internal/bootstrap/ + go build ./...


Gate 7: Wiring + HTTP Mount + Swagger + ChangeFeed CRITICAL

<cannot_skip>

This gate is NEVER skippable. It is the most critical gate.

This is where the systemplane comes alive. Without Gate 7, keys are metadata, bundles are dead code, and reconcilers never fire. Gate 7 wires everything together: the 11-step init sequence, HTTP endpoint mount, Swagger merge, ChangeFeed subscription, active bundle state, and shutdown integration.

HARD GATE: Gate 7 MUST be fully implemented. There is no "partial" Gate 7. Every subcomponent below is required.

</cannot_skip>

Dispatch ring:backend-engineer-golang with context from ALL previous gates:

TASK: Wire the complete systemplane lifecycle — init, HTTP mount, swagger merge, change feed, bundle state, and shutdown.

ALL PREVIOUS GATES: {Key definitions from Gate 2, Bundle/Factory from Gate 3, Reconcilers from Gate 4, Identity/Auth from Gate 5, Config Bridge from Gate 6}

IMPORT PATHS (use exactly these):

"github.com/LerianStudio/lib-commons/v4/commons/systemplane/domain"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/ports"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/registry"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/service"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/bootstrap"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/bootstrap/builtin"
"github.com/LerianStudio/lib-commons/v4/commons/systemplane/adapters/changefeed"
fiberhttp "github.com/LerianStudio/lib-commons/v4/commons/systemplane/adapters/http/fiber"
systemswagger "github.com/LerianStudio/lib-commons/v4/commons/systemplane/swagger"

FILES TO CREATE:

1. systemplane_init.go — The 11-step initialization sequence

func Init{Service}Systemplane(ctx context.Context, cfg *Config, configManager *ConfigManager,
    workerManager *WorkerManager, logger log.Logger,
    observer func(service.ReloadEvent)) (*SystemplaneComponents, error) {

    // Step 1: Extract bootstrap-only config (immutable for process lifetime)
    bootstrapCfg := ExtractBootstrapOnlyConfig(cfg)

    // Step 2: Load backend config (default: reuse app's Postgres DSN)
    backendCfg := Load{Service}BackendConfig(cfg)

    // Step 3: Create registry + register ALL keys
    reg := registry.New()
    if err := Register{Service}Keys(reg); err != nil {
        return nil, err
    }

    // Step 4: Configure backend with registry metadata (secret keys + apply behaviors)
    configureBackendWithRegistry(backendCfg, reg)

    // Step 5: Create backend (Store + History + ChangeFeed)
    backend, err := builtin.NewBackendFromConfig(ctx, backendCfg)
    if err != nil { return nil, fmt.Errorf("systemplane backend: %w", err) }

    // Step 6: Create snapshot builder
    snapBuilder, err := service.NewSnapshotBuilder(reg, backend.Store)
    if err != nil { backend.Close(); return nil, err }

    // Step 7: Create bundle factory (supports incremental builds)
    bundleFactory := New{Service}BundleFactory(&bootstrapCfg)

    // Step 8: Seed store from current env-var config (first boot only)
    if err := seedStoreForInitialReload(ctx, configManager, reg, backend.Store); err != nil {
        backend.Close(); return nil, err
    }

    // Step 9: Build reconcilers (phase-sorted by supervisor)
    reconcilers := buildReconcilers(workerManager)

    // Step 10: Create supervisor + initial reload
    supervisor, err := service.NewSupervisor(service.SupervisorConfig{
        Builder:     snapBuilder,
        Factory:     bundleFactory,
        Reconcilers: reconcilers,
        Observer:    observer,
    })
    if err != nil { backend.Close(); return nil, err }

    if err := supervisor.Reload(ctx, "initial-bootstrap"); err != nil {
        backend.Close(); return nil, err
    }

    // Step 11: Create manager with callbacks
    baseCfg := configManager.Get()
    manager, err := service.NewManager(service.ManagerConfig{
        Registry:   reg,
        Store:      backend.Store,
        History:    backend.History,
        Supervisor: supervisor,
        Builder:    snapBuilder,
        ConfigWriteValidator: productionConfigGuards(baseCfg),
        StateSync: func(_ context.Context, snap domain.Snapshot) {
            newCfg := snapshotToFullConfig(snap, baseCfg)
            configManager.swapConfig(newCfg)
        },
    })
    if err != nil { backend.Close(); return nil, err }

    return &SystemplaneComponents{
        ChangeFeed: backend.ChangeFeed,
        Supervisor: supervisor,
        Manager:    manager,
        Backend:    backend.Closer,
    }, nil
}

2. systemplane_mount.go — HTTP route registration

func MountSystemplaneAPI(app *fiber.App, manager service.Manager,
    authEnabled bool) error {

    authorizer := &{Service}Authorizer{authEnabled: authEnabled}
    identity := &{Service}IdentityResolver{}

    handler, err := fiberhttp.NewHandler(manager, identity, authorizer)
    if err != nil {
        return fmt.Errorf("create systemplane handler: %w", err)
    }

    handler.Mount(app) // registers all 9 /v1/system/* routes
    return nil
}

This registers ALL 9 endpoints:

  • GET /v1/system/configs
  • PATCH /v1/system/configs
  • GET /v1/system/configs/schema
  • GET /v1/system/configs/history
  • POST /v1/system/configs/reload
  • GET /v1/system/settings
  • PATCH /v1/system/settings
  • GET /v1/system/settings/schema
  • GET /v1/system/settings/history

3. Swagger Integration

In the app's swagger setup (wherever the swagger middleware is configured):

import systemswagger "github.com/LerianStudio/lib-commons/v4/commons/systemplane/swagger"

// After swag init generates the base spec:
merged, err := systemswagger.MergeInto(baseSwaggerSpec)
if err != nil {
    return fmt.Errorf("merge systemplane swagger: %w", err)
}
// Use merged spec for swagger UI handler

4. active_bundle_state.go — Thread-safe live-read accessor

type activeBundleState struct {
    mu     sync.RWMutex
    bundle *{Service}Bundle
}

func (s *activeBundleState) Current() *{Service}Bundle {
    s.mu.RLock()
    defer s.mu.RUnlock()
    return s.bundle
}

func (s *activeBundleState) Update(b *{Service}Bundle) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.bundle = b
}

Used by infrastructure consumers for live-read access to the current bundle.

5. ChangeFeed integration (in init.go wiring)

debouncedFeed := changefeed.NewDebouncedFeed(
    spComponents.ChangeFeed,
    changefeed.WithWindow(200 * time.Millisecond),
)

feedCtx, cancelFeed := context.WithCancel(ctx)
go func() {
    _ = debouncedFeed.Subscribe(feedCtx, func(signal ports.ChangeSignal) {
        _ = spComponents.Manager.ApplyChangeSignal(feedCtx, signal)
    })
}()

6. Shutdown sequence integration

func (s *Service) Stop(ctx context.Context) {
    s.configManager.Stop()              // 1. Prevent mutations
    s.cancelChangeFeed()                // 2. Stop change feed BEFORE supervisor
    s.spComponents.Supervisor.Stop(ctx) // 3. Stop supervisor + close bundle
    s.spComponents.Backend.Close()      // 4. Close store connection
    s.workerManager.Stop()              // 5. Stop workers
}

7. Observer callback (passed to InitSystemplane)

runtimeReloadObserver := func(event service.ReloadEvent) {
    bundle := event.Bundle.(*{Service}Bundle)
    bundleState.Update(bundle)
    configManager.UpdateFromSystemplane(event.Snapshot)
    // Swap logger, publishers, etc.
}

8. Bootstrap config file — config/.config-map.example

Lists all bootstrap-only keys with their env var names. Operator reference for deployment configuration.

# {Service} — Bootstrap-Only Configuration (requires restart)
#
# These are the ONLY settings that require a container/pod restart.
# Everything else is hot-reloadable via:
#
#   GET  /v1/system/configs          — view current runtime config
#   PATCH /v1/system/configs         — change any runtime-managed key
#   GET  /v1/system/configs/schema   — see all keys, types, and mutability
#   GET  /v1/system/configs/history  — audit trail of changes

ENV_NAME=development
SERVER_ADDRESS=:8080
AUTH_ENABLED=false
ENABLE_TELEMETRY=false
# ... all bootstrap-only keys

Completion criteria:

  • All 9 HTTP routes accessible
  • ChangeFeed fires on store changes
  • Supervisor rebuilds bundle on config change
  • Swagger UI shows systemplane routes (via MergeInto)
  • Graceful shutdown closes all resources in correct order
  • Active bundle state provides live-read access to request handlers
  • Observer callback updates bundle state and config manager on reload

Verification:

grep "handler.Mount\|fiberhttp.NewHandler" internal/bootstrap/
grep "DebouncedFeed\|Subscribe\|ApplyChangeSignal" internal/bootstrap/
grep "swagger.MergeInto\|systemswagger" internal/bootstrap/ internal/
grep "activeBundleState\|bundleState.Update\|bundleState.Current" internal/bootstrap/
grep "cancelChangeFeed\|Supervisor.Stop\|Backend.Close" internal/bootstrap/
# Verify active bundle state exists for thread-safe live-read
grep -rn "activeBundleState\|bundleState" internal/bootstrap/ --include='*.go'
go build ./...

Gate 7 Anti-Rationalization Table

Rationalization WRONG BECAUSE REQUIRED ACTION
"Mount is straightforward, I'll skip it" Without Mount, 9 endpoints are unreachable — the entire management API is dead MUST implement handler.Mount(app)
"Swagger is optional documentation" Without swagger merge, operators cannot discover systemplane endpoints in Swagger UI MUST call systemswagger.MergeInto()
"ChangeFeed can be added later" Without ChangeFeed, hot-reload is broken — config changes in DB are invisible to the service MUST start DebouncedFeed
"Shutdown is handled by the framework" Supervisor, ChangeFeed, and Bundle hold resources that leak without explicit shutdown MUST integrate 5-step shutdown sequence
"Active bundle state is internal detail" Without thread-safe accessor, request handlers cannot read live config — race conditions MUST implement active_bundle_state.go
"Init function is just boilerplate" The 11-step sequence has strict ordering — wrong order causes nil panics or stale data MUST follow exact step order
"Observer callback is optional" Without observer, bundle state and config manager are never updated after reload MUST register observer on Supervisor
"Config map example is nice-to-have" Operators need to know which keys require restart vs hot-reload — without it, they restart for everything MUST create config/.config-map.example

Gate 8: Code Review

Dispatch 10 parallel reviewers (same pattern as ring:codereview).

MUST include this context in ALL 10 reviewer dispatches:

SYSTEMPLANE REVIEW CONTEXT:

  • Systemplane is a three-tier configuration system: bootstrap (env, immutable) → runtime (DB, hot-reload) → live-read (snapshot, per-request).
  • Every key has an ApplyBehavior that determines how changes propagate: BootstrapOnly → BundleRebuildAndReconcile → BundleRebuild → WorkerReconcile → LiveRead.
  • IncrementalBundleFactory rebuilds only changed components. Ownership tracking prevents double-free.
  • StateSync callback keeps the Config struct in sync for backward compatibility.
  • The 9 HTTP endpoints on /v1/system/* expose config management with If-Match/ETag concurrency.
  • ChangeFeed (pg_notify or change_stream) triggers automatic reload on database changes.
  • swagger.MergeInto() adds systemplane routes to the app's Swagger spec.
Reviewer Systemplane-Specific Focus
ring:code-reviewer Architecture, lib-commons v4 systemplane usage, three-tier separation, package boundaries, import paths
ring:business-logic-reviewer Key classification correctness, ApplyBehavior assignments, component granularity, default values match current behavior
ring:security-reviewer Secret key redaction (RedactFull/RedactMask), authorization model, no credential leaks in config API responses, AES-256-GCM encryption
ring:test-reviewer Key registration tests, bundle factory tests, reconciler tests, contract tests, config hydration tests
ring:nil-safety-reviewer Nil risks in snapshot reads, bundle adoption, reconciler error paths, active bundle state before first reload
ring:consequences-reviewer Impact on existing config reads, backward compat via StateSync, degradation when store unavailable, shutdown resource cleanup
ring:dead-code-reviewer Orphaned env-reading code, dead config helpers replaced by systemplane, unused YAML/viper imports, stale .env files
ring:performance-reviewer Hot-path allocations in config reads, caching efficiency, systemplane polling overhead, connection pool sizing
ring:multi-tenant-reviewer Systemplane key scoping across tenants, tenant isolation in config snapshots, cross-tenant config leakage prevention
ring:lib-commons-reviewer lib-commons v4 systemplane package adoption, canonical patterns vs custom reimplementations, shared helper reuse

MANDATORY: All 10 reviewers must PASS. 9/10 = FAIL. Critical findings → fix and re-review.


Gate 9: User Validation

MUST approve: present checklist for explicit user approval.

## Systemplane Migration Complete

- [ ] All keys registered and classified (count matches preview: {N} keys)
- [ ] BundleFactory builds successfully with IncrementalBundleFactory
- [ ] Incremental rebuild reuses unchanged components (ownership tracking works)
- [ ] Reconcilers fire on config changes (if applicable)
- [ ] Identity + Authorization wired (JWT → Actor, 9 permissions mapped)
- [ ] Config Manager bridge works (StateSync hydrates Config struct from Snapshot)
- [ ] `defaultConfig()` derives from KeyDefs (single source of truth)
- [ ] HTTP Mount: all 9 /v1/system/* endpoints accessible
- [ ] Swagger: systemplane routes visible in Swagger UI (via MergeInto)
- [ ] ChangeFeed: config changes in DB trigger supervisor reload
- [ ] Active bundle state: thread-safe live-read for request handlers
- [ ] Shutdown: clean 5-step resource release on SIGTERM
- [ ] Backward compat: service starts normally without systemplane store (degrades to defaults)
- [ ] Tests pass: `go test ./...`
- [ ] Code review passed: all 10 reviewers PASS

Gate 10: Activation Guide

MUST generate docs/systemplane-guide.md in the project root. Direct, concise, no filler text.

The guide is built from Gate 0 (stack detection), Gate 1 (analysis), and Gate 2 (key inventory).

The guide MUST include:

1. Components Table

Component Purpose Status
Registry Key definitions (types, defaults, validators) {N} keys registered
Supervisor Reload lifecycle (snapshot → bundle → reconcile → swap) Active
Manager HTTP API backend (reads, writes, schema, history, resync) Active
BundleFactory IncrementalBundleFactory with component-granular rebuilds Active
Reconcilers Side-effect appliers (HTTP, Publisher, Worker) {N} reconcilers
ChangeFeed Database change → DebouncedFeed → Supervisor.Reload Active

2. Bootstrap-Only Config Reference

Env Var Default Description Requires Restart
ENV_NAME development Environment name Yes
SERVER_ADDRESS :8080 Listen address Yes
AUTH_ENABLED false Enable auth middleware Yes
... ... ... ...

3. Runtime-Managed Config Reference

Key Default Type ApplyBehavior Hot-Reloadable
rate_limit.max 100 int LiveRead Yes (instant)
postgres.primary_host localhost string BundleRebuild Yes (rebuilds PG client)
... ... ... ... ...

4. Activation Steps

  1. Ensure PostgreSQL (or MongoDB) is running
  2. Set SYSTEMPLANE_SECRET_MASTER_KEY env var (32 bytes, raw or base64)
  3. Start the service — systemplane auto-creates tables/collections
  4. On first boot, env var values are seeded into the store
  5. Verify: curl http://localhost:{port}/v1/system/configs | jq

5. Verification Commands

# View current runtime config
curl -s http://localhost:{port}/v1/system/configs | jq

# View schema (all keys, types, mutability)
curl -s http://localhost:{port}/v1/system/configs/schema | jq

# Change a runtime key
curl -X PATCH http://localhost:{port}/v1/system/configs \
  -H 'Content-Type: application/json' \
  -H 'If-Match: "current-revision"' \
  -d '{"values": {"rate_limit.max": 200}}'

# View change history
curl -s http://localhost:{port}/v1/system/configs/history | jq

# Force full reload
curl -X POST http://localhost:{port}/v1/system/configs/reload

# View settings
curl -s http://localhost:{port}/v1/system/settings?scope=global | jq

6. Degradation Behavior

Scenario Behavior
Systemplane backend unavailable at startup Service starts with env var defaults (no hot-reload)
Systemplane backend goes down after startup Last-known config remains active (no updates until reconnect)
ChangeFeed disconnects Auto-reconnect with exponential backoff; manual POST /v1/system/configs/reload available
Invalid config write ConfigWriteValidator rejects before persistence; no impact on running config
Bundle build failure Previous bundle stays active; error logged; no disruption

7. Common Errors and Solutions

Error Cause Solution
systemplane backend: connection refused PostgreSQL/MongoDB not running Start the database
register key "X": duplicate key Key registered twice Check for duplicate key names in key definition files
revision mismatch (409) Concurrent write conflict Re-read current revision, retry PATCH with updated If-Match
key is not mutable at runtime (400) Tried to PATCH a bootstrap-only key This key requires a restart — update env var instead
secret master key required SYSTEMPLANE_SECRET_MASTER_KEY not set Set the env var (32 bytes)

State Persistence

Save to docs/dev-systemplane-migration/current-cycle.json for resume support:

{
  "skill": "ring:dev-systemplane-migration",
  "version": "2.0.0",
  "service": "{service_name}",
  "started_at": "ISO8601",
  "current_gate": "2",
  "gates_completed": ["0", "1", "1.5"],
  "gates_skipped": [],
  "detection": {
    "backend": "postgres",
    "dependencies": ["postgres", "redis", "rabbitmq"],
    "has_workers": true,
    "existing_systemplane": false,
    "config_pattern": "envDefault",
    "key_count": 45
  },
  "compliance": {
    "status": "NEW",
    "checks": {}
  }
}

Anti-Rationalization Table

Rationalization WRONG BECAUSE REQUIRED ACTION
"Keys are registered, migration is done" Without BundleFactory + Wiring, keys are just metadata — nothing reads them at runtime Complete ALL gates through Gate 7
"We can wire it later" Gate 7 is where the system comes alive — without it, systemplane is dead code Gate 7 is NEVER skippable
"Swagger integration is cosmetic" Operators need API discoverability; without it, 9 endpoints are invisible MUST call systemswagger.MergeInto()
"Config bridge is backward compat only" Existing code reads Config struct — without StateSync, it reads stale bootstrap values forever MUST implement Gate 6
"Reconcilers are optional" Without reconcilers, workers and HTTP policies ignore config changes — hot-reload is partial Evaluate at Gate 4; implement if infra detected
"Service already has systemplane code" Existence ≠ compliance. Code that doesn't follow canonical patterns is WRONG and must be fixed. STOP. Run compliance audit (Gate 0 Phase 2). Fix every NON-COMPLIANT component.
"Partial systemplane is good enough" Partial = broken. Without ChangeFeed, no hot-reload. Without Mount, no API. Without StateSync, stale config. STOP. Every component is required. Execute all gates.
"The current approach works fine" Working ≠ compliant. Custom config management creates drift and blocks standardization. Replace with canonical systemplane implementation.
"Skip code review, we tested it" One wiring mistake = resource leak, config corruption, or silent degradation. 10 reviewers catch what tests miss. MANDATORY: All 10 reviewers must PASS.
"ChangeFeed can be added later" Later = never. Without ChangeFeed, the database is a dead store — changes are invisible. MUST start DebouncedFeed in Gate 7
"Agent says out of scope" Skill defines scope, not agent. Re-dispatch with gate context
"Active bundle state is an implementation detail" Without it, request handlers cannot safely read live config. Race conditions and nil panics. MUST implement in Gate 7
"Key names don't matter, it's just a string" Different names for the same config break cross-product dashboards, Helm overlays, and operator muscle memory. Names are the API. MUST match canonical catalog names
"Our tier classification is correct for our use case" Products don't get to choose tiers independently. PG pool tuning is LiveRead everywhere or nowhere — mixed tiers break operator expectations MUST match canonical tier classification
"Admin-only auth is fine for now" It's been "for now" across 3 products and 2 years. Granular permissions exist in lib-commons but are unused. MUST implement granular or document justification

Appendix A: Package Structure Reference

The systemplane lives in lib-commons/v4/commons/systemplane/ — a self-contained, backend-agnostic library with zero imports of internal application packages. It was extracted from Matcher's pkg/systemplane/ into lib-commons v4.3.0 to be shared across all Lerian services.

Directory Structure

commons/systemplane/                            # in lib-commons repo
├── doc.go                                      # Package doc
├── domain/                                     # Pure value objects, no infra deps
│   ├── actor.go                                # Actor{ID}
│   ├── apply_behavior.go                       # ApplyBehavior enum + Strength()
│   ├── backend_kind.go                         # BackendKind enum (postgres, mongodb)
│   ├── bundle.go                               # RuntimeBundle interface
│   ├── entry.go                                # Entry (persisted override record)
│   ├── errors.go                               # 14 sentinel errors
│   ├── key_def.go                              # KeyDef, ValueType, RedactPolicy, ValidatorFunc, ComponentNone
│   ├── kind.go                                 # Kind enum (config, setting)
│   ├── nil_value.go                            # IsNilValue() — reflect-based typed-nil detection
│   ├── reconciler_phase.go                     # ReconcilerPhase enum (3 phases)
│   ├── revision.go                             # Revision type (uint64 wrapper)
│   ├── scope.go                                # Scope enum (global, tenant)
│   ├── snapshot.go                             # Snapshot + EffectiveValue
│   └── target.go                               # Target (kind/scope/subject coordinate)
├── ports/                                      # Interface definitions (hex boundary)
│   ├── authorizer.go                           # Authorizer
│   ├── bundle_factory.go                       # BundleFactory + IncrementalBundleFactory
│   ├── changefeed.go                           # ChangeFeed + ChangeSignal
│   ├── history.go                              # HistoryStore + HistoryEntry + HistoryFilter
│   ├── identity.go                             # IdentityResolver
│   ├── reconciler.go                           # BundleReconciler (phased)
│   └── store.go                                # Store + WriteOp + ReadResult
├── registry/                                   # Thread-safe key definition registry
│   ├── registry.go                             # Registry interface + inMemoryRegistry
│   └── validation.go                           # Type validation with JSON coercion
├── service/                                    # Use-case orchestration
│   ├── escalation.go                           # Escalate() — strongest ApplyBehavior
│   ├── manager.go                              # Manager interface + ManagerConfig
│   ├── manager_helpers.go                      # Redaction, schema building
│   ├── manager_reads.go                        # GetConfigs, GetSettings, GetSchema, GetHistory, Resync
│   ├── manager_writes.go                       # PatchConfigs, PatchSettings, ApplyChangeSignal
│   ├── manager_writes_helpers.go               # Validation, escalation, snapshot preview
│   ├── snapshot_builder.go                     # 3-layer cascade: default → global → tenant
│   ├── supervisor.go                           # Supervisor interface + Reload lifecycle
│   └── supervisor_helpers.go                   # Build strategy, phase sort, resource adoption
├── bootstrap/                                  # Backend wiring from env/config
│   ├── backend.go                              # BackendFactory registry + NewBackendFromConfig
│   ├── classifier.go                           # IsBootstrapOnly / IsRuntimeManaged
│   ├── config.go                               # BootstrapConfig, PostgresBootstrapConfig, MongoBootstrapConfig
│   ├── defaults.go                             # Default table/collection names
│   ├── env.go                                  # LoadFromEnv() — SYSTEMPLANE_* env vars
│   ├── postgres_identifiers.go                 # Regex validation for PG identifiers
│   └── builtin/
│       └── backend.go                          # init()-registered factories (PG + Mongo)
├── adapters/
│   ├── store/
│   │   ├── secretcodec/
│   │   │   └── codec.go                        # AES-256-GCM encryption for secret values
│   │   ├── storetest/
│   │   │   └── contract.go                     # 15 shared contract tests (Store + History)
│   │   ├── postgres/                           # 3 tables, optimistic concurrency, pg_notify
│   │   │   ├── ddl.go                          # CREATE TABLE templates
│   │   │   ├── postgres.go                     # New(), NewFromDB() constructors
│   │   │   ├── store.go                        # Get(), Put()
│   │   │   ├── store_mutation.go               # applyOp, upsert, delete, insertHistory
│   │   │   ├── store_revision.go               # lock/read/update revision
│   │   │   ├── store_runtime.go                # encrypt/decrypt, notify, escalate
│   │   │   ├── json_decode.go                  # Integer-preserving JSON decoder
│   │   │   ├── history.go                      # ListHistory with dynamic builder
│   │   │   └── identifiers.go                  # qualify() helper
│   │   └── mongodb/                            # Sentinel revision doc, multi-doc txns
│   │       ├── mongodb.go                      # New(), ensureIndexes, txn support check
│   │       ├── store.go                        # Get(), Put() with sessions
│   │       ├── store_helpers.go                # applyOperation, encrypt, CRUD
│   │       ├── models.go                       # BSON models + normalization
│   │       └── history.go                      # ListHistory with BSON filter
│   ├── changefeed/
│   │   ├── debounce.go                         # Trailing-edge per-target debounce
│   │   ├── debounce_helpers.go                 # Timer management, jitter
│   │   ├── safe_handler.go                     # Panic-to-error conversion
│   │   ├── feedtest/
│   │   │   └── contract.go                     # 3 shared contract tests for ChangeFeed
│   │   ├── postgres/                           # LISTEN/NOTIFY, auto-reconnect, revision resync
│   │   │   ├── feed.go                         # Feed struct + Subscribe
│   │   │   ├── feed_subscribe.go               # subscribeLoop, reconnect, listenLoop
│   │   │   ├── feed_runtime.go                 # backoff, jitter, revision validation
│   │   │   └── options.go                      # WithReconnectBounds, WithRevisionSource
│   │   └── mongodb/                            # change_stream or poll mode
│   │       ├── feed.go                         # Feed struct + Subscribe
│   │       ├── feed_stream.go                  # subscribeChangeStream
│   │       ├── feed_poll.go                    # subscribePoll, pollRevisions
│   │       └── feed_bson.go                    # BSON helpers
│   └── http/
│       └── fiber/                              # 9 endpoints, DTOs, middleware
│           ├── handler.go                      # Handler struct, NewHandler, Mount()
│           ├── handler_configs.go              # Config CRUD + reload
│           ├── handler_settings.go             # Settings CRUD
│           ├── dto.go                          # All request/response DTOs
│           ├── middleware.go                   # requireAuth, settingsAuth
│           └── errors.go                       # Domain error → HTTP status mapping
├── swagger/                                    # Embedded OpenAPI spec for merge
│   └── ...                                     # swagger.MergeInto(baseSpec) ([]byte, error)
└── testutil/                                   # Test doubles
    ├── fake_store.go                           # In-memory Store with concurrency
    ├── fake_history.go                         # In-memory HistoryStore
    ├── fake_bundle.go                          # FakeBundle + FakeBundleFactory
    ├── fake_reconciler.go                      # Records all calls, configurable phase
    └── fake_incremental_bundle.go              # FakeIncrementalBundleFactory

Appendix B: Domain Model Reference

Pure value objects. No infrastructure dependencies.

Type Purpose Key Fields
Entry Persisted config override Kind, Scope, Subject, Key, Value any, Revision, UpdatedAt, UpdatedBy, Source
Kind Config vs Setting KindConfig ("config") or KindSetting ("setting")
Scope Visibility ScopeGlobal ("global") or ScopeTenant ("tenant")
Target Coordinate for a group of entries Kind + Scope + SubjectID; constructor NewTarget() validates
Revision Monotonic version counter uint64; RevisionZero = 0; methods: Next(), Uint64()
Actor Who made the change ID string
KeyDef Registry metadata per key Key, Kind, AllowedScopes, DefaultValue, ValueType, Validator ValidatorFunc, Secret, RedactPolicy, ApplyBehavior, MutableAtRuntime, Description, Group, Component
ReconcilerPhase Reconciler execution ordering PhaseStateSync (0), PhaseValidation (1), PhaseSideEffect (2)
Snapshot Immutable point-in-time view Configs, GlobalSettings, TenantSettings map[string]map[string]EffectiveValue, Revision, BuiltAt
EffectiveValue Resolved value with override info Key, Value, Default, Override, Source, Revision, Redacted
RuntimeBundle App-defined resource container Interface: Close(ctx) error
ApplyBehavior How changes propagate See Appendix F; has Strength() int (0-4 scale)
ValueType Type constraint "string", "int", "bool", "float", "object", "array"
RedactPolicy Secret handling RedactNone, RedactFull, RedactMask
BackendKind Storage backend BackendPostgres or BackendMongoDB
ValidatorFunc Custom value validation func(value any) error
ComponentNone Sentinel for no-rebuild keys "_none" — for pure business-logic keys

Snapshot accessor methods (nil-safe):

  • GetConfig(key), GetGlobalSetting(key), GetTenantSetting(tenantID, key)(EffectiveValue, bool)
  • ConfigValue(key, fallback), GlobalSettingValue(key, fallback), TenantSettingValue(tenantID, key, fallback)any

Sentinel errors (14):

var (
    ErrKeyUnknown          = errors.New("unknown configuration key")
    ErrValueInvalid        = errors.New("invalid configuration value")
    ErrRevisionMismatch    = errors.New("revision mismatch")
    ErrScopeInvalid        = errors.New("scope not allowed for this key")
    ErrPermissionDenied    = errors.New("permission denied")
    ErrReloadFailed        = errors.New("configuration reload failed")
    ErrKeyNotMutable       = errors.New("key is not mutable at runtime")
    ErrSnapshotBuildFailed = errors.New("snapshot build failed")
    ErrBundleBuildFailed   = errors.New("runtime bundle build failed")
    ErrBundleSwapFailed    = errors.New("runtime bundle swap failed")
    ErrReconcileFailed     = errors.New("bundle reconciliation failed")
    ErrNoCurrentBundle     = errors.New("no current runtime bundle")
    ErrSupervisorStopped   = errors.New("supervisor has been stopped")
    ErrRegistryRequired    = errors.New("registry is required")
)

Per-enum parse errors: ErrInvalidKind, ErrInvalidBackendKind, ErrInvalidScope, ErrInvalidApplyBehavior, ErrInvalidValueType, ErrInvalidReconcilerPhase.


Appendix C: Ports Interface Reference

Seven interfaces defining all external dependencies:

// Persistence — read/write config entries
type Store interface {
    Get(ctx context.Context, target domain.Target) (ReadResult, error)
    Put(ctx context.Context, target domain.Target, ops []WriteOp,
        expected domain.Revision, actor domain.Actor, source string) (domain.Revision, error)
}

type WriteOp struct { Key string; Value any; Reset bool }
type ReadResult struct { Entries []domain.Entry; Revision domain.Revision }

// Audit trail — change history
type HistoryStore interface {
    ListHistory(ctx context.Context, filter HistoryFilter) ([]HistoryEntry, error)
}

type HistoryEntry struct {
    Revision  domain.Revision
    Key       string
    Scope     string
    SubjectID string
    OldValue  any
    NewValue  any
    ActorID   string
    ChangedAt time.Time
}

type HistoryFilter struct { Kind, Scope, SubjectID, Key string; Limit, Offset int }

// Real-time change notifications
type ChangeFeed interface {
    Subscribe(ctx context.Context, handler func(ChangeSignal)) error  // blocks until ctx cancelled
}

type ChangeSignal struct {
    Target        domain.Target
    Revision      domain.Revision
    ApplyBehavior domain.ApplyBehavior
}

// Permission checking
type Authorizer interface {
    Authorize(ctx context.Context, permission string) error
}

// Identity extraction from request context
type IdentityResolver interface {
    Actor(ctx context.Context) (domain.Actor, error)
    TenantID(ctx context.Context) (string, error)
}

// Application-specific runtime dependency builder (FULL rebuild)
type BundleFactory interface {
    Build(ctx context.Context, snap domain.Snapshot) (domain.RuntimeBundle, error)
}

// INCREMENTAL rebuild — extends BundleFactory with component-granular rebuilds.
// When the Supervisor detects only a subset of components changed, it calls
// BuildIncremental instead of Build, reusing unchanged components.
type IncrementalBundleFactory interface {
    BundleFactory
    BuildIncremental(ctx context.Context, snap domain.Snapshot,
        previous domain.RuntimeBundle, prevSnap domain.Snapshot) (domain.RuntimeBundle, error)
}

// Side-effect applier when bundles change.
// Reconcilers are sorted by Phase before execution:
//   PhaseStateSync  → update shared state (ConfigManager, caches)
//   PhaseValidation → gates that can reject the change
//   PhaseSideEffect → external side effects (worker restarts)
type BundleReconciler interface {
    Name() string
    Phase() domain.ReconcilerPhase
    Reconcile(ctx context.Context, previous, candidate domain.RuntimeBundle,
              snap domain.Snapshot) error
}

Appendix D: Service Layer Reference

Registry (commons/systemplane/registry/)

Thread-safe in-memory registry of key definitions:

type Registry interface {
    Register(def domain.KeyDef) error
    MustRegister(def domain.KeyDef)         // panics — startup only
    Get(key string) (domain.KeyDef, bool)
    List(kind domain.Kind) []domain.KeyDef  // sorted by key name
    Validate(key string, value any) error   // nil value always valid (reset)
}

func New() Registry   // returns *inMemoryRegistry (RWMutex-protected)

Validation (validation.go):

  • validateValue(def, value) → type check + custom validator
  • JSON coercion: float64 without fractional part accepted as int; int/int64 widened to float
  • isObjectCompatible(value) uses reflect.Map; isArrayCompatible(value) uses reflect.Array/Slice
  • Nil values always pass validation (they mean "reset to default")

Manager

type Manager interface {
    GetConfigs(ctx context.Context) (ResolvedSet, error)
    GetSettings(ctx context.Context, subject Subject) (ResolvedSet, error)
    PatchConfigs(ctx context.Context, req PatchRequest) (WriteResult, error)
    PatchSettings(ctx context.Context, subject Subject, req PatchRequest) (WriteResult, error)
    GetConfigSchema(ctx context.Context) ([]SchemaEntry, error)
    GetSettingSchema(ctx context.Context) ([]SchemaEntry, error)
    GetConfigHistory(ctx context.Context, filter HistoryFilter) ([]HistoryEntry, error)
    GetSettingHistory(ctx context.Context, filter HistoryFilter) ([]HistoryEntry, error)
    ApplyChangeSignal(ctx context.Context, signal ChangeSignal) error
    Resync(ctx context.Context) error
}

ManagerConfig (constructor dependencies):

type ManagerConfig struct {
    Registry             registry.Registry
    Store                ports.Store
    History              ports.HistoryStore
    Supervisor           Supervisor
    Builder              *SnapshotBuilder

    // ConfigWriteValidator is called BEFORE persistence with a preview snapshot.
    // Return non-nil error to reject the write.
    ConfigWriteValidator func(ctx context.Context, snapshot domain.Snapshot) error

    // StateSync is called AFTER successful writes for live-read escalation.
    // Used to update ConfigManager atomically.
    StateSync            func(ctx context.Context, snapshot domain.Snapshot)
}

Key behaviors:

  • Read path: Uses supervisor's cached snapshot when available; falls back to builder
  • Write path: validate ops → preview snapshot (if ConfigWriteValidator configured) → escalate → store.Put → apply escalation
  • Escalation application: Maps strongest ApplyBehavior to supervisor method:
    • ApplyBootstrapOnly → no-op
    • ApplyLiveReadPublishSnapshot + StateSync callback
    • ApplyWorkerReconcileReconcileCurrent
    • ApplyBundleRebuild / ApplyBundleRebuildAndReconcile → full Reload
  • Redaction: All read results and history entries are redacted per RedactPolicy/Secret flag
  • Masking: RedactMask shows last 4 runes; RedactFull and Secret=true show "****"

Supporting types:

  • Subject { Scope, SubjectID } — settings target
  • PatchRequest { Ops []WriteOp, ExpectedRevision, Actor, Source } — write input
  • WriteResult { Revision } — write output
  • ResolvedSet { Values map[string]EffectiveValue, Revision } — read output
  • SchemaEntry { Key, Kind, AllowedScopes, ValueType, DefaultValue, MutableAtRuntime, ApplyBehavior, Secret, RedactPolicy, Description, Group } — schema metadata

Supervisor

type Supervisor interface {
    Current() domain.RuntimeBundle
    Snapshot() domain.Snapshot
    PublishSnapshot(ctx context.Context, snap domain.Snapshot, reason string) error
    ReconcileCurrent(ctx context.Context, snap domain.Snapshot, reason string) error
    Reload(ctx context.Context, reason string, extraTenantIDs ...string) error
    Stop(ctx context.Context) error
}

SupervisorConfig:

type SupervisorConfig struct {
    Builder     *SnapshotBuilder
    Factory     ports.BundleFactory
    Reconcilers []ports.BundleReconciler

    // Observer is invoked AFTER each successful reload with structured info
    // about the build strategy (full vs incremental).
    Observer func(ReloadEvent)
}

type ReloadEvent struct {
    Strategy BuildStrategy              // "full" or "incremental"
    Reason   string                     // caller-supplied (e.g., "changefeed-signal")
    Snapshot domain.Snapshot
    Bundle   domain.RuntimeBundle
}

Reload lifecycle (the heart of the system):

  1. Build snapshot: builder.BuildFull(ctx, tenantIDs...)
  2. Build bundle: Try incremental first (if factory implements IncrementalBundleFactory and previous exists), fall back to full
  3. Reconcile BEFORE commit: Run all reconcilers against candidate while previous is still active (prevents state corruption on failure)
  4. Atomic swap: snapshot.Store(&snap) + bundle.Store(&holder{candidate})
  5. Resource adoption: If candidate implements resourceAdopter, call AdoptResourcesFrom(previous) to nil-out transferred pointers
  6. Observer callback: Notify host application (e.g., update bundleState, swap loggers)
  7. Close previous: previous.Close() — only tears down REPLACED components since transferred pointers are nil

Hidden interfaces (implement on your bundle for advanced patterns):

  • resourceAdopter { AdoptResourcesFrom(previous RuntimeBundle) } — called after commit to mark transferred resources
  • rollbackDiscarder { Discard(ctx) error } — called on failed candidate instead of Close()

SnapshotBuilder

type SnapshotBuilder struct { registry Registry; store Store }

func NewSnapshotBuilder(reg, store) (*SnapshotBuilder, error)

func (b) BuildConfigs(ctx) (map[string]EffectiveValue, Revision, error)
func (b) BuildGlobalSettings(ctx) (map[string]EffectiveValue, Revision, error)
func (b) BuildSettings(ctx, Subject) (map[string]EffectiveValue, Revision, error)
func (b) BuildFull(ctx, tenantIDs ...string) (Snapshot, error)

Override cascade (3-layer for tenant settings):

  1. initDefaults(defs) → populate from KeyDef.DefaultValue
  2. applyOverrides(effective, entries, source) → overlay store entries
  3. Tenant path: registry defaults → global override → per-tenant override

Escalation

func Escalate(reg Registry, ops []WriteOp) (ApplyBehavior, []string, error)
  • Returns strongest ApplyBehavior across all ops (by Strength() 0-4)
  • Rejects ApplyBootstrapOnly and MutableAtRuntime=false keys
  • Rejects duplicate keys in batch
  • Empty batch → ApplyLiveRead
  • Returns the list of keys that drove the escalation

Appendix E: Bootstrap Configuration

File Purpose
config.go BootstrapConfig, PostgresBootstrapConfig, MongoBootstrapConfig, SecretStoreConfig
backend.go BackendFactory registry, RegisterBackendFactory(), NewBackendFromConfig()BackendResources{Store, History, ChangeFeed, Closer}
builtin/backend.go init() registers Postgres + MongoDB factories
env.go LoadFromEnv() reads SYSTEMPLANE_* env vars
classifier.go IsBootstrapOnly(def) — `!MutableAtRuntime
defaults.go Default table/collection names
postgres_identifiers.go Regex validation for PG identifiers (^[a-z_][a-z0-9_]{0,62}$)

Environment variables for standalone systemplane backend:

Variable Default Description
SYSTEMPLANE_BACKEND postgres or mongodb
SYSTEMPLANE_POSTGRES_DSN PostgreSQL DSN (falls back to app's PG DSN)
SYSTEMPLANE_POSTGRES_SCHEMA system Schema name
SYSTEMPLANE_POSTGRES_ENTRIES_TABLE runtime_entries Entries table
SYSTEMPLANE_POSTGRES_HISTORY_TABLE runtime_history History table
SYSTEMPLANE_POSTGRES_REVISION_TABLE runtime_revisions Revisions table
SYSTEMPLANE_POSTGRES_NOTIFY_CHANNEL systemplane_changes PG LISTEN/NOTIFY channel
SYSTEMPLANE_SECRET_MASTER_KEY AES-256-GCM master key (32 bytes, raw or base64)
SYSTEMPLANE_MONGODB_URI MongoDB connection URI
SYSTEMPLANE_MONGODB_DATABASE systemplane MongoDB database
SYSTEMPLANE_MONGODB_ENTRIES_COLLECTION runtime_entries Entries collection
SYSTEMPLANE_MONGODB_HISTORY_COLLECTION runtime_history History collection
SYSTEMPLANE_MONGODB_WATCH_MODE change_stream change_stream or poll
SYSTEMPLANE_MONGODB_POLL_INTERVAL_SEC 5 Poll interval (poll mode only)

Appendix F: Apply Behavior Taxonomy

Every config key MUST be classified with exactly one ApplyBehavior:

ApplyBehavior Code Constant Strength Runtime Effect Use When
bootstrap-only domain.ApplyBootstrapOnly 4 Immutable after startup. Never changes. Server listen address, TLS, auth enable, telemetry endpoints
bundle-rebuild+worker-reconcile domain.ApplyBundleRebuildAndReconcile 3 Full bundle swap: new infra clients AND worker restart Worker enable/disable (needs new connections + restart)
bundle-rebuild domain.ApplyBundleRebuild 2 Full bundle swap: new PG/Redis/RMQ/S3 clients Connection strings, pool sizes, credentials
worker-reconcile domain.ApplyWorkerReconcile 1 Reconciler restarts affected workers Worker intervals, scheduler periods
live-read domain.ApplyLiveRead 0 Read from snapshot on every request. Zero cost. Rate limits, timeouts, cache TTLs — anything read per-request

Strength determines escalation: when a PATCH contains multiple keys with different behaviors, Escalate() picks the strongest (highest number). If any key is ApplyBootstrapOnly or MutableAtRuntime=false, the entire write is rejected.

Classification decision tree:

Is this key needed BEFORE the systemplane itself can start?
  YES → ApplyBootstrapOnly (server address, auth enable, telemetry)
  NO ↓

Can this key be read per-request from a snapshot without side effects?
  YES → ApplyLiveRead (rate limits, timeouts, TTLs)
  NO ↓

Does changing this key require rebuilding infrastructure clients?
  YES → Does it ALSO require restarting background workers?
    YES → ApplyBundleRebuildAndReconcile (worker enable + storage changes)
    NO  → ApplyBundleRebuild (DB connections, pool sizes, credentials)
  NO ↓

Does changing this key require restarting background workers?
  YES → ApplyWorkerReconcile (worker intervals, scheduler periods)
  NO  → ApplyLiveRead (safe default for read-only configs)

Appendix G: Screening Methodology

Before implementing, screen the target service to build the key inventory:

G.1 Identify All Configuration Sources

# In the target repo:

# 1. Find the Config struct
grep -rn 'type Config struct' internal/ --include='*.go'

# 2. Find all envDefault tags
grep -rn 'envDefault:' internal/ --include='*.go' | sort

# 3. Find env var reads outside Config struct
grep -rn 'os.Getenv\|viper.Get' internal/ --include='*.go' | grep -v _test.go

# 4. Find .env files
find . -name '.env*' -o -name '*.yaml.example' | head -20

# 5. Find file-based config loading
grep -rn 'viper\.\|yaml.Unmarshal\|json.Unmarshal.*config' internal/ --include='*.go'

G.2 Classify Infrastructure vs Application Config

Infrastructure (stays as env-var with defaults in code):

  • Database connection strings (PG host, port, user, password) → Component: "postgres"
  • Redis connection (host, master name, password) → Component: "redis"
  • RabbitMQ connection (URI, host, port, user, password) → Component: "rabbitmq"
  • Object storage (endpoint, bucket, credentials) → Component: "s3"
  • These become ApplyBundleRebuild keys in systemplane

Bootstrap-Only (env-var, immutable after startup):

  • Server listen address and port → Component: ComponentNone (not mutable)
  • TLS certificate paths
  • Auth enable/disable
  • Auth service address
  • Telemetry endpoints (OTEL collector)
  • These become ApplyBootstrapOnly keys

Application Runtime (hot-reloadable):

  • Rate limits → Component: ComponentNone + ApplyLiveRead
  • Worker intervals and enable/disable flags → Component: ComponentNone + ApplyWorkerReconcile
  • Timeouts (webhook, health check) → Component: ComponentNone + ApplyLiveRead
  • Feature flags → Component: ComponentNone + ApplyLiveRead
  • Cache TTLs → Component: ComponentNone + ApplyLiveRead

G.3 Identify Runtime Dependencies (Bundle Candidates)

List all infrastructure clients that the service creates at startup:

# Find client constructors
grep -rn 'libPostgres.New\|libRedis.New\|libRabbitmq.New\|storage.New' internal/ --include='*.go'

# Find connection pools
grep -rn 'sql.Open\|pgx\|redis.New\|amqp.Dial' internal/ --include='*.go'

Each of these becomes a field in the InfraBundle and a component name in keyComponentMap.

G.4 Identify Background Workers

# Find worker patterns
grep -rn 'ticker\|time.NewTicker\|cron\|worker\|scheduler' internal/ --include='*.go' | grep -v _test.go

Each worker with configurable intervals becomes a WorkerReconciler candidate.

G.5 Generate the Key Inventory

Create a table with columns:

  • Key name (dotted: postgres.primary_host)
  • Current env var (POSTGRES_HOST)
  • Default value
  • Type (string, int, bool, float)
  • Kind (config or setting)
  • Scope (global or tenant)
  • ApplyBehavior
  • Component (postgres, redis, rabbitmq, s3, http, logger, _none)
  • Secret (yes/no)
  • MutableAtRuntime (yes/no)
  • Group
  • Validator (if any)

Appendix H: Testing Patterns

H.1 Key Registration Tests

func TestRegister{Service}Keys_AllKeysValid(t *testing.T) {
    reg := registry.New()
    err := Register{Service}Keys(reg)
    require.NoError(t, err)

    configs := reg.List(domain.KindConfig)
    settings := reg.List(domain.KindSetting)
    assert.Greater(t, len(configs)+len(settings), 0)
}

func TestRegister{Service}Keys_DefaultsMatchConfig(t *testing.T) {
    reg := registry.New()
    _ = Register{Service}Keys(reg)
    defaults := defaultConfig()

    for _, def := range reg.List(domain.KindConfig) {
        // Compare def.DefaultValue against defaults struct field
    }
}

H.2 Bundle Factory Tests

func TestBundleFactory_Build_Success(t *testing.T) {
    snap := buildTestSnapshot(t)
    factory := New{Service}BundleFactory(testBootstrapConfig())

    bundle, err := factory.Build(context.Background(), snap)
    require.NoError(t, err)
    defer bundle.Close(context.Background())

    b := bundle.(*{Service}Bundle)
    assert.NotNil(t, b.Infra.Postgres)
    assert.NotNil(t, b.Infra.Redis)
}

func TestBundleFactory_BuildIncremental_ReusesUnchangedComponents(t *testing.T) {
    snap1 := buildTestSnapshot(t)
    snap2 := modifySnapshot(snap1, "rate_limit.max", 500) // only changes _none component

    factory := New{Service}BundleFactory(testBootstrapConfig())
    ctx := context.Background()
    prev, _ := factory.Build(ctx, snap1)

    candidate, _ := factory.BuildIncremental(ctx, snap2, prev, snap1)
    b := candidate.(*{Service}Bundle)

    // Postgres should be reused (not owned by candidate)
    assert.False(t, b.ownsPostgres)
}

H.3 Reconciler Tests

func TestWorkerReconciler_AppliesConfig(t *testing.T) {
    wm := NewTestWorkerManager()
    reconciler := NewWorkerReconciler(wm)

    snap := buildModifiedSnapshot(t, "export_worker.enabled", true)
    err := reconciler.Reconcile(context.Background(), nil, nil, snap)
    require.NoError(t, err)

    assert.True(t, wm.LastAppliedConfig().ExportWorkerEnabled)
}

H.4 Contract Tests (Backend-Agnostic)

Run the storetest and feedtest contract suites against your backend:

func TestPostgresStore_ContractSuite(t *testing.T) {
    store, history := setupPostgresStore(t) // testcontainers
    storetest.RunAll(t, /* factory that returns store+history */)
}

func TestPostgresChangeFeed_ContractSuite(t *testing.T) {
    feed := setupPostgresFeed(t) // testcontainers
    feedtest.RunAll(t, /* factory that returns store+feed */)
}

Store contract tests (15):

  • GetEmptyTarget, PutSingleOp, PutBatch, OptimisticConcurrency, ResetOp
  • NilValueOp, EmptyBatchIsNoOp, TypedNilValueOp, PutPreservesOtherKeys
  • RevisionMonotonicallyIncreasing, ConcurrentPuts
  • HistoryRecording, BatchHistoryConsistency, HistoryFiltering, HistoryPagination

Feed contract tests (3):

  • SubscribeReceivesSignal, ContextCancellationStops, MultipleSignals

H.5 ConfigWriteValidator Tests

func TestConfigWriteValidator_RejectsInvalidConfig(t *testing.T) {
    // Test that production guards prevent persisting invalid config
    snap := buildSnapshot(t)
    snap.Configs["rate_limit.enabled"] = ev(false) // can't disable in production

    err := validator(context.Background(), snap)
    assert.Error(t, err)
}

Appendix I: Operational Guide

I.1 For Operators: What Changes

Before After
Edit .env + restart PATCH /v1/system/configs (no restart)
Edit YAML + wait for fsnotify PATCH /v1/system/configs (instant)
No audit trail GET /v1/system/configs/history
No schema discovery GET /v1/system/configs/schema
No concurrency protection If-Match / ETag headers
Manual rollback Change feed propagates across replicas
Full restart for any change Only ApplyBootstrapOnly keys need restart

I.2 Bootstrap-Only Keys (Require Restart)

Document in config/.config-map.example:

# {Service} — Bootstrap-Only Configuration (requires restart)
#
# These are the ONLY settings that require a container/pod restart.
# Everything else is hot-reloadable via:
#
#   GET  /v1/system/configs          — view current runtime config
#   PATCH /v1/system/configs         — change any runtime-managed key
#   GET  /v1/system/configs/schema   — see all keys, types, and mutability
#   GET  /v1/system/configs/history  — audit trail of changes

ENV_NAME=development
SERVER_ADDRESS=:8080
AUTH_ENABLED=false
ENABLE_TELEMETRY=false
# ... etc

I.3 Docker Compose (Zero-Config)

services:
  myservice:
    build: .
    ports:
      - "${SERVER_PORT:-8080}:8080"
    environment:
      - POSTGRES_HOST=${POSTGRES_HOST:-postgres}
      - POSTGRES_PORT=${POSTGRES_PORT:-5432}
      - REDIS_HOST=${REDIS_HOST:-redis}
    # NO env_file directive — defaults baked into binary
    depends_on:
      postgres:
        condition: service_healthy

I.4 Systemplane Backend Config

By default, the systemplane reuses the application's primary PostgreSQL connection. Override with SYSTEMPLANE_* env vars for a separate backend:

# Use app's Postgres (default — no extra config needed)
# The init function builds the DSN from the app's POSTGRES_* env vars

# Or use a dedicated backend:
SYSTEMPLANE_BACKEND=postgres
SYSTEMPLANE_POSTGRES_DSN=postgres://user:pass@host:5432/systemplane?sslmode=require

# Secret encryption (REQUIRED — rejected in production without it):
SYSTEMPLANE_SECRET_MASTER_KEY=<32-byte key, raw or base64>

I.5 Graceful Degradation

If the systemplane fails to initialize, the service continues without it:

  • Config values from env vars still work
  • No runtime mutation API available
  • No hot-reload capability
  • Workers run with static config

This is by design — the service never fails to start due to systemplane issues.

I.6 HTTP API Endpoints

After migration, the service exposes these endpoints:

Method Path Description Auth Permission
GET /v1/system/configs View all resolved config values system/configs:read
PATCH /v1/system/configs Update config values (with If-Match for concurrency) system/configs:write
GET /v1/system/configs/schema View all key definitions (types, defaults, mutability) system/configs/schema:read
GET /v1/system/configs/history Audit trail of config changes system/configs/history:read
POST /v1/system/configs/reload Force a full reload system/configs/reload:write
GET /v1/system/settings View resolved settings (?scope=global|tenant) system/settings:read
PATCH /v1/system/settings Update settings system/settings:write
GET /v1/system/settings/schema View setting key definitions system/settings/schema:read
GET /v1/system/settings/history Settings audit trail scope-dependent

Settings routes have an extra settingsScopeAuthorization middleware that elevates to system/settings/global:{action} when ?scope=global is queried.

PATCH Request Format:

PATCH /v1/system/configs
If-Match: "42"
Content-Type: application/json

{
  "values": {
    "rate_limit.max": 200,
    "rate_limit.expiry_sec": 120,
    "export_worker.enabled": false
  }
}

PATCH Response Format:

HTTP/1.1 200 OK
ETag: "43"

{
  "revision": 43
}

Schema Response Format:

{
  "keys": [
    {
      "key": "rate_limit.max",
      "kind": "config",
      "valueType": "int",
      "defaultValue": 100,
      "mutableAtRuntime": true,
      "applyBehavior": "live-read",
      "secret": false,
      "description": "Maximum requests per window",
      "group": "rate_limit",
      "allowedScopes": ["global"]
    }
  ]
}

HTTP Error Mapping:

Domain Error HTTP Status Code
ErrKeyUnknown 400 system_key_unknown
ErrValueInvalid 400 system_value_invalid
ErrKeyNotMutable 400 system_key_not_mutable
ErrScopeInvalid 400 system_scope_invalid
ErrRevisionMismatch 409 system_revision_mismatch
ErrPermissionDenied 403 system_permission_denied
ErrReloadFailed 500 system_reload_failed
ErrSupervisorStopped 503 system_unavailable

I.7 Adapters Reference

Adapter Location Key Feature
PostgreSQL Store adapters/store/postgres/ 3 tables + indexes, optimistic concurrency, pg_notify, integer-preserving JSON decode
MongoDB Store adapters/store/mongodb/ Sentinel revision doc (__revision_meta__), multi-doc transactions, requires replica set
Secret Codec adapters/store/secretcodec/ AES-256-GCM with random nonce, AAD = "kind|scope|subject|key", envelope format {__systemplane_secret_v: 1, alg, nonce, ciphertext}
PostgreSQL ChangeFeed adapters/changefeed/postgres/ LISTEN/NOTIFY, exponential backoff reconnect, revision resync on reconnect
MongoDB ChangeFeed adapters/changefeed/mongodb/ Change stream or poll mode, auto-reconnect, revision-jump escalation
DebouncedFeed adapters/changefeed/debounce.go Per-target trailing-edge debounce (default 100ms + 50ms jitter), escalates to strongest ApplyBehavior in window
SafeInvokeHandler adapters/changefeed/safe_handler.go Catches handler panics, returns ErrHandlerPanic
Fiber HTTP adapters/http/fiber/ 9 endpoints, DTOs, If-Match/ETag concurrency, middleware, domain→HTTP error mapping
Store Contract Tests adapters/store/storetest/ 15 backend-agnostic tests: CRUD, concurrency, reset, history, pagination
Feed Contract Tests adapters/changefeed/feedtest/ 3 backend-agnostic tests: signal receipt, cancellation, multiple signals

PostgreSQL Store DDL (created automatically):

  • runtime_entries: PK (kind, scope, subject, key), JSONB value, BIGINT revision
  • runtime_history: BIGSERIAL id, old_value/new_value JSONB, actor_id, changed_at
  • runtime_revisions: PK (kind, scope, subject), revision counter, apply_behavior

I.8 Test Utilities

Fake Implements Key Features
FakeStore ports.Store In-memory, optimistic concurrency, Seed() for pre-population
FakeHistoryStore ports.HistoryStore In-memory, newest-first, Append()/AppendForKind()
FakeBundle / FakeBundleFactory RuntimeBundle / BundleFactory Tracks Close state, SetError(), CallCount()
FakeReconciler BundleReconciler Configurable phase, records all ReconcileCalls, SetError()
FakeIncrementalBundleFactory IncrementalBundleFactory Embeds FakeBundleFactory + IncrementalBuildFunc, IncrementalCallCount()

Appendix J: Matcher Service Reference

The Matcher service registers ~130 keys across 20 groups, split into 9 focused sub-files. Here's the breakdown by group and ApplyBehavior:

Group Keys BootstrapOnly BundleRebuild LiveRead WorkerReconcile Rebuild+Reconcile
app 2 1 1 - - -
server 8 4 4 - - -
tenancy 11 - 11 - - -
postgres 19 - 18 1 - -
redis 12 - 12 - - -
rabbitmq 8 - 8 - - -
auth 3 3 - - - -
swagger 3 - 3 - - -
telemetry 7 7 - - - -
rate_limit 7 - - 7 - -
infrastructure 2 - 1 1 - -
idempotency 3 1 2 - - -
callback_rate_limit 1 - - 1 - -
fetcher 9 - 4 - 3 2
deduplication 1 - - 1 - -
object_storage 6 - 6 - - -
export_worker 4 - - 1 1 2
webhook 1 - - 1 - -
cleanup_worker 4 - - - 2 2
scheduler 1 - - - 1 -
archival 12 - - 1 3 8

Secrets: ~9 keys with RedactFull policy (passwords, tokens, certificates, access keys).

Components referenced: postgres, redis, rabbitmq, s3, http, logger, _none.

Key Sub-File Organization (Reference)

Sub-File Groups Covered Key Count
systemplane_keys_app_server.go app, server ~11
systemplane_keys_tenancy.go tenancy ~11
systemplane_keys_postgres.go postgres ~19
systemplane_keys_messaging.go redis, rabbitmq ~20
systemplane_keys_runtime_http.go auth, swagger, telemetry, rate_limit ~17
systemplane_keys_runtime_services.go infrastructure, idempotency, callback_rate_limit, fetcher ~14
systemplane_keys_storage_export.go deduplication, object_storage, export_worker ~11
systemplane_keys_workers.go webhook, cleanup_worker ~5
systemplane_keys_archival.go scheduler, archival ~12

Appendix K: Quick Reference Commands

# LOCAL DEV ONLY — requires AUTH_ENABLED=false

# View current runtime config
curl -s http://localhost:4018/v1/system/configs | jq

# View schema (all keys, types, mutability)
curl -s http://localhost:4018/v1/system/configs/schema | jq

# Change a runtime key
curl -X PATCH http://localhost:4018/v1/system/configs \
  -H 'Content-Type: application/json' \
  -H 'If-Match: "current-revision"' \
  -d '{"values": {"rate_limit.max": 200}}'

# View change history
curl -s http://localhost:4018/v1/system/configs/history | jq

# Force full reload
curl -X POST http://localhost:4018/v1/system/configs/reload

# View settings
curl -s http://localhost:4018/v1/system/settings?scope=global | jq

# Change a setting
curl -X PATCH http://localhost:4018/v1/system/settings \
  -H 'Content-Type: application/json' \
  -H 'If-Match: "current-revision"' \
  -d '{"scope": "global", "values": {"feature.enabled": true}}'

Appendix L: Files to Create per Service

Core Systemplane Wiring

File Purpose Matcher Reference
systemplane_init.go Init function with 11-step boot + change feed start internal/bootstrap/systemplane_init.go
systemplane_mount.go HTTP route registration + swagger merge internal/bootstrap/systemplane_mount.go

Key Definitions (split by group)

File Purpose Matcher Reference
systemplane_keys.go Orchestrator: Register{Service}Keys() + {service}KeyDefs() internal/bootstrap/systemplane_keys.go
systemplane_keys_{group}.go Per-group key definitions internal/bootstrap/systemplane_keys_{group}.go
systemplane_keys_validation.go Validator functions used by KeyDef.Validator internal/bootstrap/systemplane_keys_validation.go
systemplane_keys_helpers.go concatKeyDefs() utility internal/bootstrap/systemplane_keys_helpers.go

Bundle + Factory

File Purpose Matcher Reference
systemplane_bundle.go Bundle struct + Close + AdoptResourcesFrom (ownership tracking) internal/bootstrap/systemplane_bundle.go
systemplane_factory.go BundleFactory + IncrementalBundleFactory (full + incremental) internal/bootstrap/systemplane_factory.go
systemplane_factory_infra.go Per-component builders: buildPostgres, buildRedis, buildRabbitMQ, buildS3 internal/bootstrap/systemplane_factory_infra.go

Reconcilers

File Purpose Matcher Reference
systemplane_reconciler_http.go HTTP policy validation (PhaseValidation) internal/bootstrap/systemplane_reconciler_http.go
systemplane_reconciler_publishers.go RabbitMQ publisher staging (PhaseValidation) internal/bootstrap/systemplane_reconciler_publishers.go
systemplane_reconciler_worker.go Worker restart (PhaseSideEffect) internal/bootstrap/systemplane_reconciler_worker.go

Identity + Authorization

File Purpose Matcher Reference
systemplane_identity.go JWT → Actor bridge internal/bootstrap/systemplane_identity.go
systemplane_authorizer.go Permission mapping internal/bootstrap/systemplane_authorizer.go

Config Manager Integration

File Purpose Matcher Reference
config_manager_systemplane.go Snapshot → Config hydration internal/bootstrap/config_manager_systemplane.go
config_manager_seed.go Env → Store one-time seed internal/bootstrap/config_manager_seed.go
config_manager_helpers.go Type-safe value comparison internal/bootstrap/config_manager_helpers.go
config_validation.go Production config guards internal/bootstrap/config_validation.go

Runtime Integration

File Purpose Matcher Reference
active_bundle_state.go Thread-safe live-read accessor for current bundle internal/bootstrap/active_bundle_state.go
config/.config-map.example Bootstrap-only key reference (operators) config/.config-map.example

Files to Delete

File Reason
config/.env.example Replaced by code defaults + .config-map.example
config/*.yaml.example No more YAML config
Config API handlers (old) Replaced by systemplane HTTP adapter
Config file watcher Replaced by change feed
Config audit publisher (old) Replaced by systemplane history
Config YAML loader No more YAML

Files to Modify

File Change
config_loading.go Remove YAML loading, keep env-only
config_defaults.go Derive from {service}KeyDefs() via configFromSnapshot(defaultSnapshotFromKeyDefs(...))
config_manager.go Add UpdateFromSystemplane(), enterSeedMode(), atomic.Pointer[Config] for lock-free reads
init.go Wire systemplane init after workers, before HTTP mount; add reload observer callback
service.go Add systemplane shutdown sequence (5-step)
docker-compose.yml Remove env_file, inline defaults with ${VAR:-default}
Makefile Remove set-env, clear-envs targets

Appendix M: Canonical Key Catalog

The canonical key catalog is defined in lib-commons/commons/systemplane/catalog/. Products MUST match these names, tiers, and components for shared infrastructure keys.

Naming Conventions

Convention Rule Example
SSL mode ssl_mode (with underscore) postgres.primary_ssl_mode
Connection count Plural conns postgres.max_open_conns, redis.min_idle_conns
CORS cors.* namespace (NOT server.cors_*) cors.allowed_origins
RabbitMQ connection rabbitmq.url (NOT uri) rabbitmq.url
Timeout unit suffix MANDATORY _ms or _sec redis.read_timeout_ms, rate_limit.expiry_sec
Size unit suffix MANDATORY _bytes server.body_limit_bytes

Tier Classification Standard

Config Category Canonical Tier Rationale
PG pool tuning (max_open_conns, etc.) LiveRead Go's database/sql supports SetMaxOpenConns() at runtime
CORS settings LiveRead Middleware reads from snapshot per-request
Log level LiveRead Use zap.AtomicLevel.SetLevel()
Migration path BootstrapOnly Migrations only run at startup
Body limit BootstrapOnly Set at Fiber server initialization
DB connection strings BundleRebuild Requires new connection pool
Redis connection BundleRebuild Requires new Redis client
Worker enable/disable BundleRebuildAndReconcile Needs new connections + worker restart
Worker intervals WorkerReconcile Needs worker restart only
Rate limits, timeouts, TTLs LiveRead Read per-request from snapshot

Enforcement

Products run catalog.ValidateKeyDefs() in their test suite. Any mismatch is a test failure that blocks CI.


Appendix N: Environment Variable Convention

Infrastructure Prefix Examples
PostgreSQL POSTGRES_* POSTGRES_HOST, POSTGRES_PORT, POSTGRES_PASSWORD
Redis REDIS_* REDIS_HOST, REDIS_PASSWORD, REDIS_DB
RabbitMQ RABBITMQ_* RABBITMQ_URL, RABBITMQ_EXCHANGE
Auth PLUGIN_AUTH_* PLUGIN_AUTH_ENABLED, PLUGIN_AUTH_ADDRESS
Telemetry OTEL_* / ENABLE_TELEMETRY OTEL_RESOURCE_SERVICE_NAME
Server SERVER_* SERVER_ADDRESS, SERVER_TLS_CERT_FILE

PROHIBITED: DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME — use POSTGRES_* prefix. PROHIBITED: AUTH_ENABLED without PLUGIN_ prefix — use PLUGIN_AUTH_ENABLED.


Appendix O: Unit Suffix Standard

All config keys with dimensional values MUST include a unit suffix:

Dimension Suffix Examples
Time (milliseconds) _ms redis.read_timeout_ms, rabbitmq.publish_timeout_ms
Time (seconds) _sec rate_limit.expiry_sec, auth.cache_ttl_sec
Time (minutes) _mins postgres.conn_max_lifetime_mins
Size (bytes) _bytes server.body_limit_bytes
Count Plural noun postgres.max_open_conns, redis.min_idle_conns

PROHIBITED: Dimensionless timeout keys like webhook.timeout — must be webhook.timeout_ms or webhook.timeout_sec. PROHIBITED: Mixed units — if one timeout in a group uses _ms, all timeouts in that group should use _ms.