mirror of https://github.com/LerianStudio/ring synced 2026-04-21 21:47:49 +00:00

Jefferson Rodrigues b6e29134bc

docs(multi-tenant): remove MULTI_TENANT_CONNECTIONS_CHECK_INTERVAL_SEC env var

Moved out of the Multi-Tenant Standard to be referenced inside Ring Agents and Skills. Updated canonical env var count from 14 to 13 across table, .env example, Config struct, checklist, canonical file map, summary, and M2M plugin section.

X-Lerian-Ref: 0x1

2026-04-17 11:20:57 -03:00

116 KiB

Raw Blame History

Go Standards - Multi-Tenant

Module: multi-tenant.md | Sections: §27 | Parent: index.md

This module covers multi-tenant patterns with Tenant Manager.

#	Section	Description
1	Multi-Tenant Patterns (CONDITIONAL)	Configuration, Tenant Manager API, middleware, context injection, repository adaptation
1a	Generic TenantMiddleware (Standard Pattern)	Single-module services (CRM, plugins, reporter)
1b	Multi-module middleware (TenantMiddleware with multi-module)	Multi-module unified services (midaz ledger)
2	Tenant Isolation Verification (⚠️ CONDITIONAL)	Database-per-tenant verification, context-based connection checks
3	Route-Level Auth-Before-Tenant Ordering (MANDATORY)	Auth MUST validate JWT before tenant middleware calls Tenant Manager API
24	Multi-Tenant Message Queue Consumers	Multi-tenant consumer initialization, on-demand connection, exponential backoff, cache invalidation, consumer lifecycle (StopConsumer)
25	M2M Credentials via Secret Manager	AWS Secrets Manager integration for service-to-service authentication per tenant
26	Service Authentication (MANDATORY)	API key authentication for tenant-manager /settings endpoint via X-API-Key header

Multi-Tenant Patterns (CONDITIONAL)

CONDITIONAL: Only implement if MULTI_TENANT_ENABLED=true is required for your service.

HARD GATE: Canonical Model Compliance

Existence ≠ Compliance. A service that has "some multi-tenant code" is NOT considered multi-tenant unless every component matches the canonical patterns defined in this document exactly.

MUST replace multi-tenant implementations that use custom middleware, manual DB switching, non-standard env var names, or any mechanism other than the lib-commons v4 tenant-manager sub-packages — they are non-compliant. Not patched, not adapted, replaced.

The only valid multi-tenant implementation uses:

tenantId from JWT via TenantMiddleware with WithPG/WithMB options (from lib-commons/v4/commons/tenant-manager/middleware), registered per-route using a local WhenEnabled helper
tmcore.GetPGContext(ctx) / tmcore.GetPGContext(ctx, module) / tmcore.GetMBContext(ctx) / tmcore.GetMBContext(ctx, module) for database resolution (from lib-commons/v4/commons/tenant-manager/core)
valkey.GetKeyContext for Redis key prefixing (from lib-commons/v4/commons/tenant-manager/valkey)
s3.GetS3KeyStorageContext for S3 key prefixing (from lib-commons/v4/commons/tenant-manager/s3)
tmrabbitmq.Manager for RabbitMQ vhost isolation (from lib-commons/v4/commons/tenant-manager/rabbitmq)
The 13 canonical MULTI_TENANT_* environment variables with correct names and defaults
client.WithCircuitBreaker on the Tenant Manager HTTP client
client.WithServiceAPIKey on the Tenant Manager HTTP client for /settings endpoint authentication
pgManager handles settings revalidation internally via WithConnectionsCheckInterval — PostgreSQL only, MongoDB excluded

MUST correct any deviation from these patterns before the service can be considered multi-tenant.

Any file outside this canonical set that claims to handle multi-tenant logic (custom tenant resolvers, manual pool managers, wrapper middleware, etc.) is non-compliant and MUST NOT be considered part of the multi-tenant implementation. Only files following the patterns below are valid.

Canonical File Map

These are the only files that require multi-tenant changes. The exact paths follow the standard Go project layout used across Lerian services. Files not listed here MUST NOT contain multi-tenant logic.

Always modified (every service):

File	Gate	What Changes
`go.mod`	2	lib-commons v4, lib-auth v2
`internal/bootstrap/config.go`	3	13 canonical `MULTI_TENANT_*` env vars in Config struct
`internal/bootstrap/service.go` (or equivalent init file)	4	Conditional initialization: Tenant Manager client, connection managers, middleware creation. Branch on `cfg.MultiTenantEnabled`
`internal/bootstrap/routes.go` (or equivalent router file)	4	Per-route composition via `WhenEnabled(ttHandler)` — auth validates JWT before tenant resolves DB. Each project implements the `WhenEnabled` helper locally. See Route-Level Auth-Before-Tenant Ordering

Per detected database/storage (Gate 5):

File Pattern	Stack	What Changes
`internal/adapters/postgres/*/.postgresql.go`	PostgreSQL	`r.connection.GetDB()` → `tmcore.GetPGContext(ctx, module)` / `tmcore.GetPGContext(ctx)` with fallback to `r.connection`
`internal/adapters/mongodb/*/.mongodb.go`	MongoDB	Static mongo connection → `tmcore.GetMBContext(ctx, module)` / `tmcore.GetMBContext(ctx)` with fallback to `r.connection`
`internal/adapters/redis/*/.redis.go`	Redis	Every key operation → `valkey.GetKeyContext(ctx, key)` (including Lua script `KEYS[]` and `ARGV[]`)
`internal/adapters/storage/*/.go` (or S3 adapter)	S3	Every object key → `s3.GetS3KeyStorageContext(ctx, key)`

Conditional — services with targetServices (Gate 5.5):

File	Condition	What Changes
`internal/adapters/product/client.go` (or equivalent target service API client)	Service with targetServices declared	M2M authenticator with per-tenant credential caching via `secretsmanager.GetM2MCredentials`
`internal/bootstrap/service.go`	Service with targetServices	Conditional M2M wiring: `if cfg.MultiTenantEnabled` → AWS Secrets Manager client + M2M provider

Conditional — RabbitMQ only (Gate 6):

File Pattern	What Changes
`internal/adapters/rabbitmq/producer*.go`	Dual constructor: single-tenant (direct connection) + multi-tenant (`tmrabbitmq.Manager.GetChannel`). `X-Tenant-ID` header injection
`internal/adapters/rabbitmq/consumer*.go` (or `internal/bootstrap/`)	`tmconsumer.MultiTenantConsumer` with on-demand initialization. `X-Tenant-ID` header extraction

Tests (Gate 7-8):

File Pattern	What Tests
`internal/bootstrap/*_test.go`	`TestMultiTenant_BackwardCompatibility` — validates single-tenant mode works unchanged
`internal/adapters/*/_test.go`	Unit tests with mock tenant context, tenant isolation tests (two tenants, data separation)
`internal/service/*_test.go` (or integration test dir)	Integration tests with two distinct tenants verifying cross-tenant isolation

Output artifacts (Gate 11):

File	What
`docs/multi-tenant-guide.md`	Activation guide: env vars, how to enable/disable, verification steps
`docs/multi-tenant-preview.html`	Visual implementation preview (generated at Gate 1.5, kept for reference)

HARD GATE: Files outside this map that contain multi-tenant logic are non-compliant. If a service has custom files like internal/tenant/resolver.go, internal/middleware/tenant_middleware.go, pkg/multitenancy/pool.go or similar — these MUST be removed and replaced with the canonical lib-commons v4 tenant-manager sub-packages wired through the files listed above.

Required lib-commons Version

Multi-tenant support requires lib-commons v4 (github.com/LerianStudio/lib-commons/v4). The tenant-manager package does not exist in v2.

lib-commons version	Multi-tenant support	Package path
v2 (`lib-commons/v2`)	Not available	N/A — no `tenant-manager` package
v3 (`lib-commons/v3`)	Legacy	Same sub-packages as v4 but without `tenant-manager/cache`. Upgrade to v4.
v4 (`lib-commons/v4`)	Full support (current)	`github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/...` (sub-packages: `core`, `client`, `cache`, `postgres`, `mongo`, `middleware`, `rabbitmq`, `consumer`, `valkey`, `s3`). The `middleware` sub-package contains `TenantMiddleware` with `WithPG`/`WithMB` variadic options that handle both single-module and multi-module services. Route-level composition uses a local `WhenEnabled` helper (not from lib-commons).

Migration to v4:

Services on lib-commons v2 or v3 MUST upgrade to v4 before implementing multi-tenant. The upgrade involves:

Update go.mod to the latest v4 tag
Update all import paths to v4
Add the tenant-manager package imports where needed

# Check latest v4 tag
git ls-remote --tags https://github.com/LerianStudio/lib-commons.git | grep "v4" | sort -V | tail -1

# Update go.mod
go get github.com/LerianStudio/lib-commons/v4@v4.5.0

# Update import paths across the codebase (portable — works on macOS and Linux)
# From v2:
find . -name "*.go" -exec perl -pi -e 's|lib-commons/v2|lib-commons/v4|g' {} +
# From v3:
find . -name "*.go" -exec perl -pi -e 's|lib-commons/v3|lib-commons/v4|g' {} +

# Verify build
go build ./...

When to Use Multi-Tenant Mode

Scenario	Mode	Configuration
Single customer deployment	Single-tenant	`MULTI_TENANT_ENABLED=false` (default)
SaaS with shared infrastructure	Multi-tenant	`MULTI_TENANT_ENABLED=true`
Multiple isolated databases per customer	Multi-tenant	Requires Tenant Manager

Environment Variables

Env Var	Description	Default	Required
`APPLICATION_NAME`	Service name for Tenant Manager API (`/tenants/{id}/services/{service}/settings`)	-	Yes
`MULTI_TENANT_ENABLED`	Enable multi-tenant mode	`false`	Yes
`MULTI_TENANT_URL`	Tenant Manager service URL	-	If multi-tenant
`MULTI_TENANT_REDIS_HOST`	Redis host for Pub/Sub event-driven tenant discovery (same Redis as tenant-manager)	-	If multi-tenant
`MULTI_TENANT_REDIS_PORT`	Redis port for Pub/Sub	`6379`	If multi-tenant
`MULTI_TENANT_REDIS_PASSWORD`	Redis password for Pub/Sub	-	If multi-tenant
`MULTI_TENANT_REDIS_TLS`	Enable TLS for Pub/Sub Redis connection (AWS ElastiCache/Valkey)	`false`	If multi-tenant
`MULTI_TENANT_MAX_TENANT_POOLS`	Soft limit for tenant connection pools (LRU eviction)	`100`	No
`MULTI_TENANT_IDLE_TIMEOUT_SEC`	Seconds before idle tenant connection is eviction-eligible	`300`	No
`MULTI_TENANT_TIMEOUT`	HTTP client timeout for tenant-manager API calls (seconds). Passed to `tmclient.WithTimeout`.	`30`	No
`MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD`	Consecutive failures before circuit breaker opens	`5`	Yes
`MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC`	Seconds before circuit breaker resets (half-open)	`30`	Yes
`MULTI_TENANT_SERVICE_API_KEY`	API key for authenticating with tenant-manager `/settings` endpoint. Generated via service catalog.	-	If multi-tenant
`MULTI_TENANT_CACHE_TTL_SEC`	In-memory cache TTL for tenant config. Passed to the tenant client.	`120`	Yes
`RABBITMQ_TLS`	Enable TLS/AMQPS connections for per-tenant RabbitMQ vhosts (AWS AmazonMQ, CloudAMQP)	`false`	No

Example .env for multi-tenant:

MULTI_TENANT_ENABLED=true
MULTI_TENANT_URL=http://tenant-manager:4003
MULTI_TENANT_REDIS_HOST=redis
MULTI_TENANT_REDIS_PORT=6379
MULTI_TENANT_REDIS_PASSWORD=
MULTI_TENANT_REDIS_TLS=false
MULTI_TENANT_MAX_TENANT_POOLS=100
MULTI_TENANT_IDLE_TIMEOUT_SEC=300
MULTI_TENANT_TIMEOUT=30
MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD=5
MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC=30
MULTI_TENANT_SERVICE_API_KEY=your-service-api-key-here
MULTI_TENANT_CACHE_TTL_SEC=120
RABBITMQ_TLS=false

Configuration

// internal/bootstrap/config.go
type Config struct {
    ApplicationName string `env:"APPLICATION_NAME"`

    // Multi-Tenant Configuration
    MultiTenantEnabled                  bool   `env:"MULTI_TENANT_ENABLED" default:"false"`
    MultiTenantURL                      string `env:"MULTI_TENANT_URL"`
    MultiTenantRedisHost                string `env:"MULTI_TENANT_REDIS_HOST"`
    MultiTenantRedisPort                string `env:"MULTI_TENANT_REDIS_PORT" default:"6379"`
    MultiTenantRedisPassword            string `env:"MULTI_TENANT_REDIS_PASSWORD"`
    MultiTenantRedisTLS                 bool   `env:"MULTI_TENANT_REDIS_TLS"`
    MultiTenantMaxTenantPools           int    `env:"MULTI_TENANT_MAX_TENANT_POOLS" default:"100"`
    MultiTenantIdleTimeoutSec           int    `env:"MULTI_TENANT_IDLE_TIMEOUT_SEC" default:"300"`
    MultiTenantTimeout                  int    `env:"MULTI_TENANT_TIMEOUT" default:"30"`
    MultiTenantCircuitBreakerThreshold  int    `env:"MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD" default:"5"`
    MultiTenantCircuitBreakerTimeoutSec int    `env:"MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC" default:"30"`
    MultiTenantServiceAPIKey            string `env:"MULTI_TENANT_SERVICE_API_KEY"`
    MultiTenantCacheTTLSec              int    `env:"MULTI_TENANT_CACHE_TTL_SEC" default:"120"`

    // RabbitMQ TLS (for managed services like AWS AmazonMQ, CloudAMQP)
    RabbitMQTLS                            bool   `env:"RABBITMQ_TLS" default:"false"`

    // PostgreSQL Primary (used as default connection in single-tenant mode)
    PrimaryHost     string `env:"POSTGRES_HOST"`
    PrimaryUser     string `env:"POSTGRES_USER"`
    PrimaryPassword string `env:"POSTGRES_PASSWORD"`
    PrimaryName     string `env:"POSTGRES_NAME"`
    PrimaryPort     string `env:"POSTGRES_PORT"`
    PrimarySSLMode  string `env:"POSTGRES_SSLMODE"`
}

Service Name Resolution

The service parameter in NewManager maps to the Tenant Manager API path: /tenants/{id}/services/{service}/settings. Use cfg.ApplicationName (env APPLICATION_NAME):

pgMgr := tmpostgres.NewManager(tmClient, cfg.ApplicationName,
    tmpostgres.WithModule("onboarding"),  // module = component name constant
    tmpostgres.WithLogger(logger),
)

Parameter	Source	Purpose	Example
`service` (2nd arg)	`cfg.ApplicationName` (env `APPLICATION_NAME`)	Tenant Manager API path	`"ledger"`, `"reporter"`
`module` (WithModule)	Component constant	Key in `TenantConfig.Databases[module]`	`"onboarding"`, `"transaction"`, `"manager"`

Manager Wiring

TenantMiddleware (single-module): Managers are passed via WithPG/WithMB options:

mongoManager := tmmongo.NewManager(tmClient, cfg.ApplicationName, ...)
ttMid := tmmiddleware.NewTenantMiddleware(
    tmmiddleware.WithMB(mongoManager),
)

TenantMiddleware (multi-module): Use named WithPG/WithMB options for each module:

middleware := tmmiddleware.NewTenantMiddleware(
    tmmiddleware.WithPG(onboardingPGManager, "onboarding"),
    tmmiddleware.WithPG(transactionPGManager, "transaction"),
    tmmiddleware.WithMB(onboardingMongoManager, "onboarding"),
    tmmiddleware.WithMB(transactionMongoManager, "transaction"),
    tmmiddleware.WithTenantCache(tenantCache),
    tmmiddleware.WithTenantLoader(tenantLoader),
)

Tenant Manager Service API

The Tenant Manager is an external service that stores database credentials per tenant. All connection managers in lib-commons call this API to resolve tenant-specific connections.

Endpoints:

Method	Path	Returns	Purpose
`GET`	`/tenants/{tenantID}/services/{service}/settings`	`TenantConfig`	Full tenant configuration with DB credentials
`GET`	`/tenants/active?service={service}`	`[]*TenantSummary`	List of active tenants (fallback for discovery)

Tenant Discovery (for consumer mode):

Primary: Redis SMEMBERS "tenant-manager:tenants:active" (fast, <1ms)
Fallback: HTTP GET /tenants/active?service={service} (slower, network call)

TenantConfig Data Model

The Tenant Manager returns this structure for each tenant. The Databases map is keyed by module name (e.g., "onboarding", "transaction").

type TenantConfig struct {
    ID            string                         // Tenant UUID
    TenantSlug    string                         // Human-readable slug
    IsolationMode string                         // "isolated" (default) or "schema"
    Databases     map[string]DatabaseConfig       // module -> config
    Messaging     *MessagingConfig               // RabbitMQ config (optional)
}

type DatabaseConfig struct {
    PostgreSQL         *PostgreSQLConfig
    PostgreSQLReplica  *PostgreSQLConfig    // Read replica (optional)
    MongoDB            *MongoDBConfig
    ConnectionSettings *ConnectionSettings  // Per-tenant pool overrides (optional)
}

// ConnectionSettings holds per-tenant database connection pool settings.
// When present, these values override the global defaults on PostgresManager/MongoManager.
// If nil (e.g., older tenant associations), global defaults apply.
type ConnectionSettings struct {
    MaxOpenConns int `json:"maxOpenConns"`
    MaxIdleConns int `json:"maxIdleConns"`
}

Isolation Modes:

Mode	Database	Schema	Connection String	When to Use
`isolated` (default)	Separate database per tenant	Default `public` schema	Standard connection	Strong isolation, recommended
`schema`	Shared database	Schema per tenant	Adds `options=-csearch_path="{schema}"`	Cost optimization, weaker isolation

Connection Pool Management

All connection managers (PostgreSQL, MongoDB, RabbitMQ) use LRU eviction with soft limits:

Soft limit (WithMaxTenantPools): When the pool reaches this size and a new tenant needs a connection, only connections idle longer than the timeout are evicted. If all connections are active, the pool grows beyond the limit.
Idle timeout (WithIdleTimeout): Connections not accessed within this window become eligible for eviction. Default: 5 minutes.
Connection health: Cached connections are pinged before reuse (3s timeout). Stale connections are recreated transparently.

The Tenant Manager HTTP client MUST enable the circuit breaker (WithCircuitBreaker):

After N consecutive failures, the circuit opens and requests fail fast with ErrCircuitBreakerOpen
After the timeout, the circuit enters half-open state and allows one request through
On success, the circuit resets to closed

JWT Tenant Extraction

Claim key: tenantId (camelCase, hardcoded)

<cannot_skip>

⛔ CRITICAL: tenantId from JWT is the ONLY multi-tenant mechanism.

The tenantId identifies the client/customer. The lib-commons TenantMiddleware extracts it from the JWT, resolves the tenant-specific database connection via Tenant Manager API, and stores it in context. Each tenant has its own database — tenant A cannot query tenant B's database.

organization_id is NOT part of multi-tenant isolation. It is a separate concern (entity within a domain). Adding organization_id filters to queries does NOT provide tenant isolation. Multi-tenant isolation comes exclusively from tenantId → TenantConnectionManager → database-per-tenant.

Anti-Rationalization:

Rationalization	Why It's WRONG	Required Action
"Adding organization_id filters = multi-tenant"	organization_id does NOT route to different databases. All data still in ONE database.	MUST implement tenantId → TenantConnectionManager
"The codebase already has organization_id wiring"	organization_id is irrelevant for multi-tenant. tenantId from JWT is the mechanism.	Implement TenantMiddleware with JWT tenantId extraction
"Midaz uses organization_id for tenant isolation"	WRONG. Midaz has ZERO organization_id in WHERE clauses. It uses tenantId → tmcore.GetPGContext(ctx, module).	Follow the actual pattern: tenantId → context → database routing

</cannot_skip>

// internal/bootstrap/middleware.go
func extractTenantIDFromToken(c *fiber.Ctx) (string, error) {
    // Use lib-commons helper for token extraction
    accessToken := libHTTP.ExtractTokenFromHeader(c)
    if accessToken == "" {
        return "", errors.New("no authorization token provided")
    }

    // Parse without validation (validation done by auth middleware)
    token, _, err := new(jwt.Parser).ParseUnverified(accessToken, jwt.MapClaims{})
    if err != nil {
        return "", err
    }

    claims, ok := token.Claims.(jwt.MapClaims)
    if !ok {
        return "", errors.New("invalid token claims format")
    }

    // Extract tenantId (camelCase only - no fallbacks)
    tenantID, ok := claims["tenantId"].(string)
    if !ok || tenantID == "" {
        return "", errors.New("tenantId claim not found in token")
    }

    return tenantID, nil
}

Generic TenantMiddleware (Standard Pattern)

This is the standard pattern for all services. The lib-commons TenantMiddleware handles JWT extraction, tenant resolution, and context injection automatically.

// internal/bootstrap/config.go
import (
    "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/client"
    tmpostgres "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/postgres"
    tmmongo "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/mongo"
    "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/middleware"
)

func initService(cfg *Config) {
    // 1. Create Tenant Manager HTTP client (with circuit breaker — MANDATORY)
    var clientOpts []client.ClientOption
    if cfg.MultiTenantCircuitBreakerThreshold > 0 {
        clientOpts = append(clientOpts,
            client.WithCircuitBreaker(
                cfg.MultiTenantCircuitBreakerThreshold,
                time.Duration(cfg.MultiTenantCircuitBreakerTimeoutSec)*time.Second,
            ),
        )
    }
    if cfg.MultiTenantTimeout > 0 {
        clientOpts = append(clientOpts,
            client.WithTimeout(time.Duration(cfg.MultiTenantTimeout)*time.Second),
        )
    }
    clientOpts = append(clientOpts,
        client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey),
    )
    tmClient, err := client.NewClient(cfg.MultiTenantURL, logger, clientOpts...)
    if err != nil {
        return fmt.Errorf("creating tenant-manager client: %w", err)
    }

    idleTimeout := time.Duration(cfg.MultiTenantIdleTimeoutSec) * time.Second

    // 2. Create PostgreSQL manager (one per service or per module)
    pgManager := tmpostgres.NewManager(tmClient, "my-service",
        tmpostgres.WithModule("my-module"),
        tmpostgres.WithLogger(logger),
        tmpostgres.WithMaxTenantPools(cfg.MultiTenantMaxTenantPools),
        tmpostgres.WithIdleTimeout(idleTimeout),
    )

    // 3. Create MongoDB manager (optional)
    mongoManager := tmmongo.NewManager(tmClient, "my-service",
        tmmongo.WithModule("my-module"),
        tmmongo.WithLogger(logger),
        tmmongo.WithMaxTenantPools(cfg.MultiTenantMaxTenantPools),
        tmmongo.WithIdleTimeout(idleTimeout),
    )

    // 4. Create middleware (do NOT register globally — use per-route with WhenEnabled)
    ttMid := middleware.NewTenantMiddleware(
        middleware.WithPG(pgManager),
        middleware.WithMB(mongoManager),  // optional
        middleware.WithTenantCache(tenantCache),
        middleware.WithTenantLoader(tenantLoader),
    )
    // Pass ttMid.WithTenantDB as the ttHandler to routes.go.
    // In routes.go, register per-route using WhenEnabled(ttHandler).
    // When MULTI_TENANT_ENABLED=false, pass nil instead — WhenEnabled handles it.
    // See "Route-Level Auth-Before-Tenant Ordering" section
}

What the middleware does internally:

Extracts Authorization: Bearer {token} header
Parses JWT (unverified — auth middleware already validated it)
Extracts tenantId claim
Calls PostgresManager.GetConnection(ctx, tenantID) to resolve tenant-specific DB
Stores tenant ID and DB connection in context
If MongoDB manager is set, resolves and stores MongoDB connection
Calls c.Next()

In repositories, use context-based getters:

import (
    tmcore "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/core"
    "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/valkey"
)

// Single-module service: use generic getter
db := tmcore.GetPGContext(ctx)

// Multi-module service: use module-specific getter
db := tmcore.GetPGContext(ctx, constant.ModuleOnboarding)

// MongoDB (single-module)
mongoDB := tmcore.GetMBContext(ctx)

// MongoDB (multi-module)
mongoDB := tmcore.GetMBContext(ctx, constant.ModuleOnboarding)

// Redis key prefixing
key := valkey.GetKeyContext(ctx, "cache-key")
// -> "tenant:{tenantId}:cache-key"

// Get tenant ID directly
tenantID := tmcore.GetTenantIDContext(ctx)

Multi-module middleware (TenantMiddleware with multi-module)

When to use: Services that serve multiple modules on a single port with different databases per module. For example, midaz ledger serves onboarding and transaction modules in a single process, each with its own PostgreSQL and MongoDB pools.

Most services do NOT need multi-module. If your service has a single database (CRM, plugin-auth, reporter, etc.), use the standard TenantMiddleware above with unnamed WithPG/WithMB. Only use named module options when you have multiple database pools per type.

Import:

import (
    tmmiddleware "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/middleware"
)

Key types:

Type	Purpose
`TenantMiddleware`	Unified middleware that handles both single-module and multi-module services via `WithPG`/`WithMB` variadic options
`TenantMiddlewareOption`	Functional option for configuring `TenantMiddleware`

Available options:

Option	Purpose	Required
`WithPG(manager)`	Register a PostgreSQL manager (single-module, no name)	At least one PG or MB
`WithPG(manager, "module")`	Register a named PostgreSQL manager (multi-module)	At least one PG or MB
`WithMB(manager)`	Register a MongoDB manager (single-module, no name)	No
`WithMB(manager, "module")`	Register a named MongoDB manager (multi-module)	No
`WithTenantCache(cache)`	In-memory tenant cache (12h TTL, shared with EventListener)	Yes (event-driven)
`WithTenantLoader(loader)`	Lazy-load tenant config from tenant-manager API on cache miss	Yes (event-driven)

Multi-module service example

// config.go - Multi-module service (e.g., unified ledger with onboarding + transaction)
import (
    tmmiddleware "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/middleware"
)

middleware := tmmiddleware.NewTenantMiddleware(
    tmmiddleware.WithPG(onboardingPGManager, "onboarding"),
    tmmiddleware.WithPG(transactionPGManager, "transaction"),
    tmmiddleware.WithMB(onboardingMongoManager, "onboarding"),
    tmmiddleware.WithMB(transactionMongoManager, "transaction"),
    tmmiddleware.WithTenantCache(tenantCache),
    tmmiddleware.WithTenantLoader(tenantLoader),
)

// Pass middleware.WithTenantDB to routes.go — register per-route using WhenEnabled:
// See "Route-Level Auth-Before-Tenant Ordering" section

What the middleware does internally:

Extracts Authorization: Bearer {token} header and parses JWT
Extracts tenantId claim from JWT
Checks TenantCache for tenant config — if miss, calls TenantLoader to lazy-load from tenant-manager API
For each registered PG manager, resolves connection via manager.GetConnection(ctx, tenantID) and stores it using module-scoped context keys (via ContextWithPG)
For each registered MB manager, resolves MongoDB connection and stores it (via ContextWithMB)
Calls c.Next()

In repositories for multi-module services, use module-scoped getters:

import tmcore "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/core"

// Multi-module: use module-specific getter
db := tmcore.GetPGContext(ctx, constant.ModuleTransaction)
db := tmcore.GetPGContext(ctx, constant.ModuleOnboarding)

Simple single-module service example

// config.go - Single-module service (e.g., CRM, plugin, reporter)
// Use TenantMiddleware with unnamed WithPG/WithMB
import (
    tmmiddleware "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/middleware"
)

ttMid := tmmiddleware.NewTenantMiddleware(
    tmmiddleware.WithPG(pgManager),
    tmmiddleware.WithMB(mongoManager),  // optional
    tmmiddleware.WithTenantCache(tenantCache),
    tmmiddleware.WithTenantLoader(tenantLoader),
)
// Pass ttMid.WithTenantDB to routes.go — register per-route using WhenEnabled:
// See "Route-Level Auth-Before-Tenant Ordering" section

Single-module vs Multi-module

Feature	Single-module	Multi-module
PG option	`WithPG(manager)`	`WithPG(manager, "module")`
MB option	`WithMB(manager)`	`WithMB(manager, "module")`
Context getter (PG)	`tmcore.GetPGContext(ctx)`	`tmcore.GetPGContext(ctx, module)`
Context getter (MB)	`tmcore.GetMBContext(ctx)`	`tmcore.GetMBContext(ctx, module)`
When to use	Single-module services	Multi-module unified services

Rule of thumb: If your service has one database module, use WithPG(manager) / WithMB(manager) (unnamed). If your service combines multiple modules with different databases behind one HTTP port, use WithPG(manager, "module") / WithMB(manager, "module") (named).

Multi-Tenant Error Responses

Each service implements its own error mapper using lib-commons error types:

Scenario	HTTP Status	Error Type
Tenant suspended/purged	403	`*tmcore.TenantSuspendedError`
Tenant not found	404	`tmcore.ErrTenantNotFound`
DB schema not initialized	422	`tmcore.IsTenantNotProvisionedError(err)`
Tenant-manager unavailable	503	Network/connection error

Database Connection in Repositories

Repositories use a getDB helper that checks context for tenant connections with fallback to the static connection:

// internal/adapters/postgres/entity/entity.postgresql.go

// getDB resolves the database connection from tenant context with fallback to static connection.
func (r *EntityPostgreSQLRepository) getDB(ctx context.Context) (dbresolver.DB, error) {
    // Multi-module: check module-specific context first
    if db := tmcore.GetPGContext(ctx, constant.ModuleOnboarding); db != nil {
        return db, nil
    }
    // Single-module: check generic context
    if db := tmcore.GetPGContext(ctx); db != nil {
        return db, nil
    }
    // Fallback to static connection (single-tenant mode)
    if r.requireTenant {
        return nil, fmt.Errorf("tenant postgres connection missing from context")
    }
    if r.connection == nil {
        return nil, fmt.Errorf("postgres connection not available")
    }
    return r.connection.Resolver(ctx)
}

func (r *EntityPostgreSQLRepository) Create(ctx context.Context, entity *mmodel.Entity) (*mmodel.Entity, error) {
    logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx)

    ctx, span := tracer.Start(ctx, "postgres.create_entity")
    defer span.End()

    // Get tenant-specific connection from context
    db, err := r.getDB(ctx)
    if err != nil {
        libOpentelemetry.HandleSpanError(&span, "Failed to get database connection", err)
        logger.Errorf("Failed to get database connection: %v", err)
        return nil, err
    }

    record := &EntityPostgreSQLModel{}
    record.FromEntity(entity)

    // Use db for queries - automatically scoped to tenant's database
    // ...
}

Redis Key Prefixing

// internal/adapters/redis/repository.go
func (r *RedisRepository) Set(ctx context.Context, key, value string, ttl time.Duration) error {
    logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx)

    ctx, span := tracer.Start(ctx, "redis.set")
    defer span.End()

    // Tenant-aware key prefixing (adds tenant:{tenantId}: prefix if multi-tenant)
    key = valkey.GetKeyContext(ctx, key)

    rds, err := r.conn.GetConnection(ctx)
    if err != nil {
        return err
    }

    return rds.Set(ctx, key, value, ttl).Err()
}

S3/Object Storage Key Prefixing

Services that store files in S3 MUST prefix object keys with the tenant ID for tenant isolation. The bucket is configured per service via environment variable. Tenant separation is by directory within the bucket.

// In any service/adapter that uploads, downloads, or deletes files from S3:
func (r *StorageRepository) Upload(ctx context.Context, originalKey, contentType string, data io.Reader) error {
    // Tenant-aware key prefixing: {tenantId}/{originalKey} in multi-tenant, {originalKey} in single-tenant
    key := s3.GetS3KeyStorageContext(ctx, originalKey)

    return r.s3Client.Upload(ctx, key, data, contentType)
}

func (r *StorageRepository) Download(ctx context.Context, originalKey string) (io.ReadCloser, error) {
    // MUST use the same prefixed key for reads and writes
    key := s3.GetS3KeyStorageContext(ctx, originalKey)

    return r.s3Client.Download(ctx, key)
}

Storage structure:

Bucket: {service-name}  (env var: OBJECT_STORAGE_BUCKET)
  └── {tenantId}/
       └── {resource}/{path}

Backward compatibility: When no tenant is in context (single-tenant mode), the key is returned unchanged — no prefix added.

RabbitMQ Multi-Tenant: Two-Layer Isolation Model

RabbitMQ multi-tenant requires two complementary layers — both are mandatory:

Layer	Mechanism	Purpose
1. Vhost Isolation	`tmrabbitmq.Manager` → `GetChannel(ctx, tenantID)`	Isolation. Each tenant gets its own RabbitMQ vhost. Queues, exchanges, and connections are fully separated.
2. X-Tenant-ID Header	`headers["X-Tenant-ID"] = tenantID`	Audit + context propagation. Enables distributed tracing, log correlation, and downstream tenant resolution. Does NOT provide isolation.

⛔ Layer 2 alone is NOT multi-tenant compliant. A shared connection with X-Tenant-ID headers provides traceability but zero isolation — a poison message or traffic spike from one tenant affects all tenants.

⛔ Layer 1 alone is incomplete. Vhosts isolate but the X-Tenant-ID header is needed for log correlation, distributed tracing, and downstream context propagation across services.

RabbitMQ Manager Initialization

The tmrabbitmq.Manager manages per-tenant vhost connections. Initialize it in the bootstrap with functional options:

import tmrabbitmq "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/rabbitmq"

rmqOpts := []tmrabbitmq.Option{
    tmrabbitmq.WithModule(constant.ModuleManager),
    tmrabbitmq.WithLogger(logger),
    tmrabbitmq.WithMaxTenantPools(cfg.MultiTenantMaxTenantPools),
    tmrabbitmq.WithIdleTimeout(time.Duration(cfg.MultiTenantIdleTimeoutSec) * time.Second),
}

// TLS for production environments (AWS AmazonMQ, CloudAMQP, etc.)
if cfg.RabbitMQTLS {
    rmqOpts = append(rmqOpts, tmrabbitmq.WithTLS())
}

rabbitMQManager := tmrabbitmq.NewManager(
    tmClient,
    cfg.ApplicationName,
    rmqOpts...,
)

Available options:

Option	Purpose	Required
`WithModule(name)`	Module name for the manager	Yes
`WithLogger(logger)`	Logger instance	Yes
`WithMaxTenantPools(n)`	LRU soft limit for vhost connections	No (default: 100)
`WithIdleTimeout(d)`	Idle timeout before connection eviction	No (default: 5min)
`WithTLS()`	Enable TLS for AMQPS connections (required for AWS AmazonMQ, CloudAMQP, and production environments)	No (default: false)

TLS configuration: The RABBITMQ_TLS env var controls whether the manager uses AMQPS (TLS) connections to per-tenant vhosts. In production with managed RabbitMQ services (AWS AmazonMQ, CloudAMQP), TLS is typically required. The WithTLS() option configures the manager to use amqps:// scheme and TLS dial for all tenant connections.

RabbitMQ Multi-Tenant Producer

// internal/adapters/rabbitmq/producer.go
type ProducerRepository struct {
    conn            *libRabbitmq.RabbitMQConnection
    rabbitMQManager *tmrabbitmq.Manager
    multiTenantMode bool
}

// Single-tenant constructor
func NewProducer(conn *libRabbitmq.RabbitMQConnection) *ProducerRepository {
    return &ProducerRepository{
        conn:            conn,
        multiTenantMode: false,
    }
}

// Multi-tenant constructor
func NewProducerMultiTenant(pool *tmrabbitmq.Manager) *ProducerRepository {
    return &ProducerRepository{
        rabbitMQManager: pool,
        multiTenantMode: true,
    }
}

func (p *ProducerRepository) Publish(ctx context.Context, exchange, key string, message []byte) error {
    // Inject tenant ID header
    tenantID := tmcore.GetTenantIDContext(ctx)
    headers := amqp.Table{}
    if tenantID != "" {
        headers["X-Tenant-ID"] = tenantID
    }

    if p.multiTenantMode {
        if tenantID == "" {
            return fmt.Errorf("tenant ID is required in multi-tenant mode")
        }

        // Get tenant-specific channel from pool
        channel, err := p.rabbitMQManager.GetChannel(ctx, tenantID)
        if err != nil {
            return err
        }

        return channel.PublishWithContext(ctx, exchange, key, false, false,
            amqp.Publishing{
                ContentType:  "application/json",
                DeliveryMode: amqp.Persistent,
                Headers:      headers,
                Body:         message,
            })
    }

    // Single-tenant: use static connection
    return p.conn.Channel.Publish(exchange, key, false, false,
        amqp.Publishing{Body: message, Headers: headers})
}

MongoDB Multi-Tenant Repository

// internal/adapters/mongodb/metadata.go
type MetadataMongoDBRepository struct {
    connection *libMongo.MongoConnection
    dbName     string
}

func NewMetadataMongoDBRepository(conn *libMongo.MongoConnection, dbName string) *MetadataMongoDBRepository {
    return &MetadataMongoDBRepository{connection: conn, dbName: dbName}
}

// getMongoDB resolves the MongoDB database from tenant context with fallback to static connection.
func (r *MetadataMongoDBRepository) getMongoDB(ctx context.Context) (*mongo.Database, error) {
    // Multi-module: check module-specific context first
    if db := tmcore.GetMBContext(ctx, constant.ModuleOnboarding); db != nil {
        return db, nil
    }
    // Single-module: check generic context
    if db := tmcore.GetMBContext(ctx); db != nil {
        return db, nil
    }
    // Fallback to static connection (single-tenant mode)
    if r.connection == nil {
        return nil, fmt.Errorf("mongo connection not available")
    }
    return r.connection.GetDB(ctx), nil
}

func (r *MetadataMongoDBRepository) Create(ctx context.Context, collection string, metadata *Metadata) error {
    logger, tracer, _, _ := libCommons.NewTrackingFromContext(ctx)

    ctx, span := tracer.Start(ctx, "mongodb.create_metadata")
    defer span.End()

    // Get tenant-specific database from context
    tenantDB, err := r.getMongoDB(ctx)
    if err != nil {
        libOpentelemetry.HandleSpanError(&span, "Failed to get database connection", err)
        return err
    }

    // Use tenant's database for operations
    coll := tenantDB.Collection(strings.ToLower(collection))

    record := &MetadataMongoDBModel{}
    if err := record.FromEntity(metadata); err != nil {
        return err
    }

    _, err = coll.InsertOne(ctx, record)
    if err != nil {
        libOpentelemetry.HandleSpanError(&span, "Failed to insert metadata", err)
        return err
    }

    return nil
}

Event-Driven Tenant Discovery

Replaces polling-based discovery. Consumer services boot with an empty tenant map and discover tenants via:

Redis Pub/Sub events from tenant-manager (12 lifecycle events)
Lazy-load on first HTTP request via GET /tenants/{orgId}/associations/{service}/connections

Components:

Component	Purpose
`EventListener`	Standalone Redis Pub/Sub subscriber (`PSUBSCRIBE tenant-events:*`)
`TenantCache`	Shared in-memory cache (12h TTL)
`TenantLoader`	Lazy-load from tenant-manager API on cache miss
`EventDispatcher`	Routes events to handlers (evict, reload, start consumer)

Redis client for Pub/Sub (MANDATORY: use NewTenantPubSubRedisClient):

import tmredis "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/redis"

tmRedisClient, err := tmredis.NewTenantPubSubRedisClient(ctx, tmredis.TenantPubSubRedisConfig{
    Host:     cfg.MultiTenantRedisHost,
    Port:     cfg.MultiTenantRedisPort,
    Password: cfg.MultiTenantRedisPassword,
    TLS:      cfg.MultiTenantRedisTLS,
})

Do NOT build redis.UniversalClient manually — use the centralized helper above. NON-COMPLIANT if manual libRedis.Config setup is used instead.

Event listener initialization (MANDATORY: use NewTenantEventListener):

import tmevent "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/event"

eventListener, err := tmevent.NewTenantEventListener(
    tmRedisClient,
    dispatcher.HandleEvent,
    tmevent.WithListenerLogger(logger),
    tmevent.WithService(tenantServiceName),
)

NON-COMPLIANT if NewTenantEventListener is not wired with the NewTenantPubSubRedisClient output. Both are required for event-driven tenant discovery.

Callbacks (ordering is MANDATORY):

WithOnTenantAdded — (1) invalidates pmClient cache first (prevent stale config), then (2) starts RabbitMQ consumer goroutine
WithOnTenantRemoved — (1) stops consumer goroutine first (prevent retry loops), then (2) closes ALL infrastructure connections, then (3) invalidates pmClient cache
TenantLoader.SetOnTenantLoaded — starts RabbitMQ consumer after lazy-load (covers service restart)

// OnTenantAdded: invalidate cache BEFORE starting consumer
tmevent.WithOnTenantAdded(func(ctx context.Context, tenantID string) {
    if tenantClient != nil {
        _ = tenantClient.InvalidateConfig(ctx, tenantID, tenantServiceName)
    }
    if consumer != nil {
        consumer.EnsureConsumerStarted(ctx, tenantID)
    }
})

// OnTenantRemoved: stop consumer FIRST, then close connections, then invalidate cache
tmevent.WithOnTenantRemoved(func(ctx context.Context, tenantID string) {
    if consumer != nil {
        consumer.StopConsumer(tenantID)
    }
    _ = onboardingPGManager.CloseConnection(ctx, tenantID)
    _ = transactionPGManager.CloseConnection(ctx, tenantID)
    _ = onboardingMongoManager.CloseConnection(ctx, tenantID)
    _ = transactionMongoManager.CloseConnection(ctx, tenantID)
    _ = rabbitmqManager.CloseConnection(ctx, tenantID)
    if tenantClient != nil {
        _ = tenantClient.InvalidateConfig(ctx, tenantID, tenantServiceName)
    }
})

// OnTenantLoaded: start consumer after lazy-load (covers service restart)
tenantLoader.SetOnTenantLoaded(func(ctx context.Context, tenantID string) {
    if consumer != nil {
        consumer.EnsureConsumerStarted(ctx, tenantID)
    }
})

Event types reference:

Event	When	Consumer Action
`tenant.created`	Tenant created	No-op (lazy-load on first request)
`tenant.updated`	Metadata changed	Refresh TTL
`tenant.suspended`	Tenant suspended	Evict cache, close pools
`tenant.activated`	Tenant reactivated	Refresh TTL
`tenant.deleted`	Tenant deleted	Evict cache, close pools
`tenant.service.associated`	Service provisioned	Add to cache, start consumer
`tenant.service.disassociated`	Service removed	Evict cache, close pools, stop consumer
`tenant.service.suspended`	Service suspended	Evict cache, close pools
`tenant.service.purged`	Credentials revoked	Evict cache, close pools
`tenant.service.reactivated`	Service reactivated	Re-add to cache with jitter
`tenant.credentials.rotated`	M2M credentials rotated	Close pools, eager reload via /connections
`tenant.connections.updated`	Pool settings changed	Apply new settings

Event envelope uses snake_case keys and WorkOS org ID as tenant_id.

RabbitMQ Consumer Goroutine Startup

RabbitMQ consumer goroutines start on-demand in two scenarios:

Scenario	Trigger	Callback
Pub/Sub event (`tenant.service.associated`)	`OnTenantAdded` on dispatcher	`tenantClient.InvalidateConfig` then `consumer.EnsureConsumerStarted`
HTTP request lazy-load (after restart)	`onTenantLoaded` on loader	`consumer.EnsureConsumerStarted`

EnsureConsumerStarted is idempotent — no-op if consumer already running.

After service restart, the first HTTP request for a tenant triggers lazy-load, the consumer starts, and pending messages in the queue are processed.

Consumer teardown happens in OnTenantRemoved:

Step	Action	Why
1	`consumer.StopConsumer(tenantID)`	Stop goroutine and prevent retry loops BEFORE closing connections
2	Close all infrastructure connections (PG, Mongo, RabbitMQ)	Release resources
3	`tenantClient.InvalidateConfig(ctx, tenantID, serviceName)`	Prevent stale 200 responses allowing evicted tenant to reconnect

onTenantLoaded wiring covers the restart scenario where no Pub/Sub event fires but the tenant is discovered via lazy-load:

tenantLoader.SetOnTenantLoaded(func(ctx context.Context, tenantID string) {
    if consumer != nil {
        consumer.EnsureConsumerStarted(ctx, tenantID)
    }
})

HTTP Client Transport

All HTTP clients that make concurrent requests (auth middleware, pmClient) must use custom http.Transport with ForceAttemptHTTP2: false to prevent Go stdlib HTTP/2 hpack panic under concurrent goroutine access.

lib-commons pmClient and lib-auth sharedHTTPClient already implement this. New HTTP clients must follow the same pattern:

&http.Client{
    Timeout: 30 * time.Second,
    Transport: &http.Transport{
        Proxy:                 http.ProxyFromEnvironment,
        ForceAttemptHTTP2:     false,
        MaxIdleConns:          100,
        MaxIdleConnsPerHost:   10,
        IdleConnTimeout:       90 * time.Second,
        TLSHandshakeTimeout:  10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    },
}

Conditional Initialization

The initialization path depends on whether the service runs a single module or combines multiple modules:

// internal/bootstrap/service.go
func InitService(cfg *Config) (*Service, error) {
    // ttHandler starts as nil — WhenEnabled(nil) is a no-op (single-tenant passthrough)
    var ttHandler fiber.Handler

    if cfg.MultiTenantEnabled && cfg.MultiTenantURL != "" {
        if isUnifiedService {
            // Multi-module: use named WithPG/WithMB options
            // See "Multi-module middleware (TenantMiddleware with multi-module)" section above
            ttMid := tmmiddleware.NewTenantMiddleware(
                tmmiddleware.WithPG(onboardingPGManager, "onboarding"),
                tmmiddleware.WithPG(transactionPGManager, "transaction"),
                tmmiddleware.WithMB(onboardingMongoManager, "onboarding"),
                tmmiddleware.WithMB(transactionMongoManager, "transaction"),
                tmmiddleware.WithTenantCache(tenantCache),
                tmmiddleware.WithTenantLoader(tenantLoader),
            )
            ttHandler = ttMid.WithTenantDB
        } else {
            // Single-module: use unnamed WithPG/WithMB options
            // See "Generic TenantMiddleware (Standard Pattern)" section above
            ttMid := tmmiddleware.NewTenantMiddleware(
                tmmiddleware.WithPG(pgManager),
                tmmiddleware.WithTenantCache(tenantCache),
                tmmiddleware.WithTenantLoader(tenantLoader),
            )
            ttHandler = ttMid.WithTenantDB
        }
        // Do NOT register globally with app.Use() — register per-route in routes.go
        // using WhenEnabled(ttHandler). See "Route-Level Auth-Before-Tenant Ordering" section.

        logger.Infof("Multi-tenant mode enabled with Tenant Manager URL: %s", cfg.MultiTenantURL)
    } else {
        logger.Info("Running in SINGLE-TENANT MODE")
        // ttHandler remains nil — WhenEnabled(nil) calls c.Next() immediately
    }

    // Pass ttHandler to NewRoutes — routes use WhenEnabled(ttHandler) per-route
    // ...
}

Most services follow the single-module path. Only unified services like midaz ledger need the multi-module path with named WithPG/WithMB options.

Testing Multi-Tenant Code

Unit Tests with Mock Tenant Context

// internal/service/user_service_test.go
func TestUserService_Create_WithTenantContext(t *testing.T) {
    // Setup tenant context
    tenantID := "tenant-123"
    ctx := core.ContextWithTenantID(context.Background(), tenantID)

    // Mock database connection
    mockDB := setupMockDB(t)
    ctx = tmcore.ContextWithPG(ctx, mockDB)

    // Create service with mock dependencies
    repo := repository.NewUserRepository()
    service := service.NewUserService(repo, logger)

    // Execute
    input := &CreateUserInput{Name: "John", Email: "john@example.com"}
    result, err := service.Create(ctx, input)

    // Assert
    require.NoError(t, err)
    assert.Equal(t, "John", result.Name)
}

Testing Tenant Isolation

func TestRepository_Create_TenantIsolation(t *testing.T) {
    tests := []struct {
        name     string
        tenantID string
        input    *Entity
        wantErr  bool
    }{
        {
            name:     "tenant-1 creates entity",
            tenantID: "tenant-1",
            input:    &Entity{Name: "Entity A"},
            wantErr:  false,
        },
        {
            name:     "tenant-2 creates same entity (isolated)",
            tenantID: "tenant-2",
            input:    &Entity{Name: "Entity A"},
            wantErr:  false, // Different tenant = different database = allowed
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Inject tenant-specific context
            ctx := core.ContextWithTenantID(context.Background(), tt.tenantID)
            ctx = tmcore.ContextWithPG(ctx, mockDB)

            _, err := repo.Create(ctx, tt.input)

            if tt.wantErr {
                require.Error(t, err)
            } else {
                require.NoError(t, err)
            }
        })
    }
}

Integration Tests with Tenant Isolation

// tests/integration/multi_tenant_test.go
func TestMultiTenant_TenantIsolation(t *testing.T) {
    if !h.IsMultiTenantEnabled() {
        t.Skip("Multi-tenant mode is not enabled")
    }

    env := h.LoadEnvironment()
    ctx := context.Background()
    client := h.NewHTTPClient(env.ServerURL, env.HTTPTimeout)

    // Define two distinct tenants
    tenantA := "tenant-a-" + h.RandString(6)
    tenantB := "tenant-b-" + h.RandString(6)

    headersTenantA := h.TenantAuthHeaders(h.RandHex(8), tenantA)
    headersTenantB := h.TenantAuthHeaders(h.RandHex(8), tenantB)

    // Step 1: Tenant A creates organization
    codeA, bodyA, _ := client.Request(ctx, "POST", "/v1/organizations", headersTenantA, orgPayload)
    require.Equal(t, 201, codeA)

    var orgA struct{ ID string `json:"id"` }
    json.Unmarshal(bodyA, &orgA)

    // Step 2: Tenant B creates organization
    codeB, bodyB, _ := client.Request(ctx, "POST", "/v1/organizations", headersTenantB, orgPayload)
    require.Equal(t, 201, codeB)

    var orgB struct{ ID string `json:"id"` }
    json.Unmarshal(bodyB, &orgB)

    // Step 3: Verify Tenant A cannot see Tenant B's data
    code, body, _ := client.Request(ctx, "GET", "/v1/organizations", headersTenantA, nil)
    require.Equal(t, 200, code)

    var list struct{ Items []struct{ ID string `json:"id"` } `json:"items"` }
    json.Unmarshal(body, &list)

    for _, item := range list.Items {
        assert.NotEqual(t, orgB.ID, item.ID, "ISOLATION VIOLATION: Tenant A can see Tenant B's data")
    }
}

Testing Error Cases

func TestMiddleware_WithTenantPostgres_ErrorCases(t *testing.T) {
    tests := []struct {
        name           string
        setupContext   func(*fiber.Ctx)
        expectedStatus int
        expectedCode   string
    }{
        {
            name: "missing JWT token",
            setupContext: func(c *fiber.Ctx) {
                // No Authorization header
            },
            expectedStatus: 401,
            expectedCode:   "TENANT_ID_REQUIRED",
        },
        {
            name: "JWT without tenantId claim",
            setupContext: func(c *fiber.Ctx) {
                token := createJWTWithoutTenantClaim()
                c.Request().Header.Set("Authorization", "Bearer "+token)
            },
            expectedStatus: 401,
            expectedCode:   "TENANT_ID_REQUIRED",
        },
        {
            name: "tenant not found in Tenant Manager",
            setupContext: func(c *fiber.Ctx) {
                token := createJWT(map[string]interface{}{"tenantId": "unknown-tenant"})
                c.Request().Header.Set("Authorization", "Bearer "+token)
            },
            expectedStatus: 404,
            expectedCode:   "TENANT_NOT_FOUND",
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            app := fiber.New()
            ctx := app.AcquireCtx(&fasthttp.RequestCtx{})
            defer app.ReleaseCtx(ctx)

            tt.setupContext(ctx)

            err := middleware.WithTenantDB(ctx)

            require.Error(t, err)
            fiberErr, ok := err.(*fiber.Error)
            require.True(t, ok)
            assert.Equal(t, tt.expectedStatus, fiberErr.Code)
        })
    }
}

Testing RabbitMQ Multi-Tenant Consumer

func TestRabbitMQConsumer_MultiTenant(t *testing.T) {
    t.Run("injects tenant context from X-Tenant-ID header", func(t *testing.T) {
        tenantID := "tenant-456"

        // Create message with tenant header
        message := amqp.Delivery{
            Headers: amqp.Table{
                "X-Tenant-ID": tenantID,
            },
            Body: []byte(`{"action": "create"}`),
        }

        // Setup consumer
        consumer := NewMultiTenantConsumer(pool, logger)

        // Process message
        ctx, err := consumer.injectTenantDBConnections(
            context.Background(),
            tenantID,
            logger,
        )

        // Assert tenant context injected
        require.NoError(t, err)
        extractedTenant := tmcore.GetTenantIDContext(ctx)
        assert.Equal(t, tenantID, extractedTenant)
    })
}

Testing Redis Key Prefixing

func TestRedisRepository_MultiTenant_KeyPrefixing(t *testing.T) {
    t.Run("prefixes keys with tenant ID", func(t *testing.T) {
        tenantID := "tenant-789"
        ctx := core.ContextWithTenantID(context.Background(), tenantID)

        repo := NewRedisRepository(redisConn)

        err := repo.Set(ctx, "user:session", "value123", 3600)
        require.NoError(t, err)

        // Verify key was prefixed
        key := valkey.GetKeyContext(ctx, "user:session")
        assert.Equal(t, "tenant:tenant-789:user:session", key)
    })

    t.Run("single-tenant mode does not prefix keys", func(t *testing.T) {
        ctx := context.Background()

        key := valkey.GetKeyContext(ctx, "user:session")
        assert.Equal(t, "user:session", key)
    })
}

Error Handling

Error	HTTP Status	Code	When
Missing tenantId claim	401	`TENANT_ID_REQUIRED`	JWT doesn't have tenantId
Tenant not found	404	`TENANT_NOT_FOUND`	Tenant not registered in Tenant Manager
Tenant not provisioned	422	`TENANT_NOT_PROVISIONED`	Database schema not initialized (SQLSTATE 42P01)
Tenant suspended	403	service-specific	Tenant status is suspended or purged (use `errors.As(err, &core.TenantSuspendedError{})`)
Service not configured	503	service-specific	Tenant exists but has no config for this service/module (`core.ErrServiceNotConfigured`)
Schema mode error	422	service-specific	Invalid schema configuration for tenant database
Connection error	503	service-specific	Failed to get or establish tenant connection
Manager closed	503	service-specific	Connection manager has been shut down (`core.ErrManagerClosed`)
Circuit breaker open	503	service-specific	Tenant Manager client tripped after consecutive failures (`core.ErrCircuitBreakerOpen`)
Tenant config rate limited	503	service-specific	Too many concurrent requests for the same tenant config — retry after brief delay

Tenant Isolation Verification (⚠️ CONDITIONAL)

Multi-tenant applications MUST verify tenant isolation to prevent data leakage between tenants.

⛔ CONDITIONAL: This section applies ONLY if MULTI_TENANT_ENABLED=true. If single-tenant, mark as N/A.

Detection Question: Is this a multi-tenant service?

# Check if multi-tenant mode is enabled
grep -rn "MULTI_TENANT_ENABLED\|MultiTenantEnabled" internal/ --include="*.go"

# If 0 matches OR always set to false: Mark N/A
# If found AND can be true: Apply this section

Isolation Architecture

Multi-tenant isolation uses a database-per-tenant model. The tenantId from JWT determines which database the request connects to. Each tenant has its own database — tenant A cannot query tenant B's database.

Mechanism	How It Works	Protection
JWT `tenantId` extraction	`TenantMiddleware` extracts `tenantId` claim from JWT	Identifies the tenant
Database routing	`TenantConnectionManager` resolves tenant-specific DB connection	Tenant A → Database A, Tenant B → Database B
Context injection	Connection stored in request context	Repositories use `tmcore.GetPGContext(ctx)` / `tmcore.GetPGContext(ctx, module)` / `tmcore.GetMBContext(ctx)` / `tmcore.GetMBContext(ctx, module)`
Single-tenant passthrough	`IsMultiTenant() == false` → `c.Next()` immediately	Backward compatibility

Why Tenant Isolation Verification Is MANDATORY

Attack	Without Verification	With Verification
Cross-tenant data access	Tenant A accesses Tenant B's database	Connection-level isolation prevents it
Data exfiltration	Cross-tenant data leakage	Separate databases per tenant

Detection Commands (MANDATORY)

# MANDATORY: Run before every PR in multi-tenant services
# Verify all repositories use context-based connections (not static)
grep -rn "GetPGContext\|GetMBContext" internal/adapters/ --include="*.go"

# Verify no repositories use static/hardcoded connections when multi-tenant is enabled
# Excludes tenant-aware variables (tenantDB, tenantmanager) to avoid false positives
grep -rn "\.DB\.\|\.Database\." internal/adapters/ --include="*.go" | grep -v "_test.go" | grep -v "tenantmanager\|tenantDB"

# Expected: All repositories should use tenant-manager context getters (core, valkey, s3 packages)

Anti-Rationalization Table

Rationalization	Why It's WRONG	Required Action
"Static connection works fine"	Static connection goes to ONE database. All tenants share it. No isolation.	Use tmcore.GetPGContext(ctx) / tmcore.GetPGContext(ctx, module) / tmcore.GetMBContext(ctx) / tmcore.GetMBContext(ctx, module)
"We only have one customer"	Requirements change. Multi-tenant is easy to add now, hard later.	Design for multi-tenant, deploy as single
"organization_id filtering = tenant isolation"	organization_id does NOT route to different databases. It is NOT multi-tenant.	Use tenantId from JWT → TenantConnectionManager

Context Functions

lib-commons provides two sets of context functions for database resolution. Use module-specific getters for multi-module services and generic getters for single-module services.

Context getters (used by repository code):

Function	Use When
`tmcore.GetPGContext(ctx)`	Single-module — returns generic PG connection
`tmcore.GetPGContext(ctx, module)`	Multi-module — returns module-specific PG connection
`tmcore.GetMBContext(ctx)`	Single-module — returns generic MongoDB
`tmcore.GetMBContext(ctx, module)`	Multi-module — returns module-specific MongoDB

// Single-module: use generic getter
db := tmcore.GetPGContext(ctx)

// Multi-module: use module-specific getter
db := tmcore.GetPGContext(ctx, constant.ModuleOnboarding)

Context setters (used by middleware, not by service code):

tmcore.ContextWithTenantID(ctx, tenantID) — stores tenant ID
tmcore.ContextWithPG(ctx, db) — generic PG connection (set by TenantMiddleware for single-module)
tmcore.ContextWithPG(ctx, db, module) — module-specific PG (when service has multiple DB modules; also sets generic)
tmcore.ContextWithMB(ctx, db) — generic MongoDB connection
tmcore.ContextWithMB(ctx, db, module) — module-specific MongoDB (also sets generic)

Tenant ID Validation

lib-commons validates tenant IDs to prevent path traversal and injection:

const maxTenantIDLength = 256
var validTenantIDPattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9_-]*$`)

Rules:

Must start with alphanumeric character
Only alphanumeric, underscore, and hyphen allowed
Maximum 256 characters
Empty strings rejected

Redis Key Prefixing for Lua Scripts

Beyond simple Set/Get operations, Redis Lua scripts require special attention. ALL keys passed as KEYS[] and ARGV[] to Lua scripts MUST be pre-prefixed in Go before the script execution:

// ✅ CORRECT: Prefix keys in Go before Lua execution
prefixedBackupQueue := valkey.GetKeyContext(ctx, TransactionBackupQueue)
prefixedTransactionKey := valkey.GetKeyContext(ctx, transactionKey)
prefixedBalanceSyncKey := valkey.GetKeyContext(ctx, utils.BalanceSyncScheduleKey)

// Also prefix ARGV values that are used as keys inside the Lua script
prefixedInternalKey := valkey.GetKeyContext(ctx, blcs.InternalKey)

result, err := script.Run(ctx, rds,
    []string{prefixedBackupQueue, prefixedTransactionKey, prefixedBalanceSyncKey},
    finalArgs...).Result()

// ❌ FORBIDDEN: Hardcoded keys inside Lua script
-- Lua script must NEVER reference keys by name
local key = "balance:lock"  -- WRONG: not tenant-prefixed

// ✅ CORRECT: Lua script uses only KEYS[] and ARGV[]
local backupQueue = KEYS[1]  -- Already prefixed by Go caller
local txKey = KEYS[2]        -- Already prefixed by Go caller

This pattern also ensures Redis Cluster compatibility (all keys in KEYS[] must be in the same hash slot for atomic operations).

Multi-Tenant Metrics

Services implementing multi-tenant MUST expose these metrics:

Metric	Type	Description
`tenant_connections_total`	Counter	Total tenant connections created
`tenant_connection_errors_total`	Counter	Connection failures per tenant
`tenant_consumers_active`	Gauge	Active message consumers
`tenant_messages_processed_total`	Counter	Messages processed per tenant

Responsibility Split: lib-commons vs Service

Responsibility	lib-commons handles	Service MUST implement
Connection pooling	Cache per tenant, double-check locking	-
Credential fetching	HTTP call to Tenant Manager API	-
JWT parsing	Extract `tenantId` from token (both middlewares)	-
Tenant discovery	Redis -> API fallback, sync loop	-
Consumer lifecycle	On-demand spawn, backoff, degraded tracking	-
Multi-module connection injection	`TenantMiddleware` resolves PG/MB for all registered modules	-
Error mapping	Default error mapper in middleware	-
Middleware registration	-	Register `TenantMiddleware` with `WithPG`/`WithMB` on routes
Repository adaptation	-	Use `tmcore.GetPGContext(ctx)` / `tmcore.GetPGContext(ctx, module)` / `tmcore.GetMBContext(ctx)` / `tmcore.GetMBContext(ctx, module)` instead of global DB
Redis key prefixing	-	Call `valkey.GetKeyContext(ctx, key)` for every Redis operation
S3 key prefixing	Tenant-aware key prefix (`s3.GetS3KeyStorageContext`)	Call `s3.GetS3KeyStorageContext(ctx, key)` for every S3 operation
Consumer setup	-	Register handlers, call `consumer.Run(ctx)` at startup
Settings revalidation (PostgreSQL only)	pgManager handles internally via `WithConnectionsCheckInterval`, `ApplyConnectionSettings()`	Pass `WithConnectionsCheckInterval` when creating pgManager. MongoDB excluded (driver cannot resize pools).
Error handling	Return sentinel errors	Map errors to HTTP status codes (or provide custom `ErrorMapper`)

Anti-Rationalization Table (General)

Rationalization	Why It's WRONG	Required Action
"Service already has multi-tenant code"	Existence ≠ compliance. Code that doesn't match the Ring canonical model (lib-commons v4 tenant-manager sub-packages) is non-compliant and MUST be replaced entirely.	STOP. Run compliance audit against this document. Replace every non-compliant component.
"Our custom multi-tenant approach works"	Working ≠ compliant. Custom implementations create drift, block lib-commons upgrades, prevent standardized tooling, and cannot be validated by automated compliance checks.	STOP. Replace with canonical lib-commons v4 implementation.
"Just need to adapt/patch the existing code"	Non-standard implementations cannot be patched into compliance. The patterns are structurally different (context-based resolution vs static connections, lib-commons middleware vs custom middleware).	STOP. Replace, do not patch.
"We only have one customer"	Requirements change. Multi-tenant is easy to add now, hard later.	Design for multi-tenant, deploy as single
"Tenant Manager adds complexity"	Complexity is in connection management anyway. Tenant Manager standardizes it.	Use Tenant Manager for multi-tenant
"JWT parsing is expensive"	Parse once in middleware, use from context everywhere.	Extract tenant once, propagate via context
"We'll add tenant isolation later"	Retrofitting tenant isolation is a rewrite.	Build tenant-aware from the start

Final Step: Single-Tenant Backward Compatibility Validation (MANDATORY)

HARD GATE: This is the LAST step of every multi-tenant implementation. CANNOT merge or deploy without completing this validation.

If the service was already running as single-tenant before multi-tenant was added, it MUST continue working unchanged with MULTI_TENANT_ENABLED=false (the default). Multi-tenant is opt-in. Single-tenant is the baseline. Breaking it is a production incident for all existing deployments.

How the Backward Compatibility Mechanism Works

The middleware checks pool.IsMultiTenant() at the start of every request. When MULTI_TENANT_ENABLED=false:

IsMultiTenant() returns false
Middleware returns c.Next() immediately — zero tenant logic applied
No JWT parsing, no Tenant Manager calls, no pool routing
The service uses its default database connection as before

// Single-tenant mode: pass through if pool is not multi-tenant
if !pool.IsMultiTenant() {
    return c.Next()  // No tenant logic applied — service works as before
}

Validation Steps (execute in order)

Step 1 — Remove all multi-tenant env vars and start the service:

# Unset ALL multi-tenant variables
unset MULTI_TENANT_ENABLED MULTI_TENANT_URL
unset MULTI_TENANT_MAX_TENANT_POOLS MULTI_TENANT_IDLE_TIMEOUT_SEC
unset MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC

# Start the service — MUST start without errors, without Tenant Manager running
go run cmd/app/main.go
# Expected log: "Running in SINGLE-TENANT MODE"

Step 2 — Run the full existing test suite (single-tenant):

# ALL pre-existing tests MUST pass with multi-tenant disabled
MULTI_TENANT_ENABLED=false go test ./...

Step 3 — Run backward compatibility integration test:

func TestMultiTenant_BackwardCompatibility(t *testing.T) {
    // MUST skip when multi-tenant is enabled — this test validates single-tenant only
    if h.IsMultiTenantEnabled() {
        t.Skip("Skipping backward compatibility test - multi-tenant mode is enabled")
    }

    // Create resources WITHOUT tenant context — MUST work in single-tenant mode
    code, body, err := client.Request(ctx, "POST", "/v1/organizations", headers, orgPayload)
    require.Equal(t, 201, code, "single-tenant CRUD must work without tenant context")

    // List resources — MUST return data normally
    code, body, err = client.Request(ctx, "GET", "/v1/organizations", headers, nil)
    require.Equal(t, 200, code, "single-tenant list must work without tenant context")

    // Health endpoints — MUST work without any auth or tenant context
    code, _, _ = client.Request(ctx, "GET", "/health", nil, nil)
    require.Equal(t, 200, code, "health endpoint must work in single-tenant mode")
}

Step 4 — Validate against this checklist:

#	Check	How to Verify	Pass Criteria
1	Service starts without `MULTI_TENANT_*` vars	Remove all vars, start service	Starts normally, logs "SINGLE-TENANT MODE"
2	Service starts without Tenant Manager	Don't run Tenant Manager, start service	No connection errors, no panics
3	All existing CRUD operations work	Run pre-existing integration tests	All pass with same behavior as before
4	Health/version/swagger endpoints work	`GET /health`, `GET /version`	200 OK without any auth headers
5	Default PostgreSQL connection is used	Check DB queries go to the configured `POSTGRES_HOST`	Queries hit single-tenant database
6	No new required env vars break startup	Start with only the env vars the service had before	Service starts without errors

Step 5 — Run multi-tenant test suite (both modes work):

# Confirm multi-tenant mode also works
MULTI_TENANT_ENABLED=true MULTI_TENANT_URL=http://tenant-manager:4003 go test ./... -run "MultiTenant"

Anti-Rationalization

Rationalization	Why It's WRONG	Required Action
"Nobody uses single-tenant anymore"	Existing deployments depend on it. Breaking them is a production incident for every customer running self-hosted.	STOP. Run the validation steps.
"We tested multi-tenant, that's enough"	Multi-tenant tests exercise a DIFFERENT code path (`IsMultiTenant()=true`). Single-tenant (`IsMultiTenant()=false`) is a separate path that needs separate verification.	STOP. Run both test suites.
"The passthrough is trivial, it can't break"	Config struct changes, new required env vars, middleware ordering changes, import side effects — all of these can silently break the passthrough path.	STOP. Verify with actual requests.
"I'll test single-tenant later"	Later never comes. Once merged, the damage is done. Existing CI may not catch it if tests assume multi-tenant.	STOP. Test now, before merge.

Checklist

Environment Variables:

MULTI_TENANT_ENABLED in config struct (default: false)
MULTI_TENANT_URL in config struct (required if multi-tenant)
MULTI_TENANT_REDIS_HOST in config struct (required if multi-tenant)
MULTI_TENANT_REDIS_PORT in config struct (default: 6379)
MULTI_TENANT_REDIS_PASSWORD in config struct
MULTI_TENANT_REDIS_TLS in config struct (default: false)
MULTI_TENANT_MAX_TENANT_POOLS in config struct (default: 100)
MULTI_TENANT_IDLE_TIMEOUT_SEC in config struct (default: 300)
MULTI_TENANT_CIRCUIT_BREAKER_THRESHOLD in config struct (default: 5)
MULTI_TENANT_CIRCUIT_BREAKER_TIMEOUT_SEC in config struct (default: 30)
MULTI_TENANT_SERVICE_API_KEY in config struct (required)
MULTI_TENANT_CACHE_TTL_SEC in config struct (default: 120)

Event-Driven Discovery (required if multi-tenant):

tmredis.NewTenantPubSubRedisClient(ctx, cfg) used to create Redis client (NOT manual libRedis.Config)
tmevent.NewTenantEventListener(tmRedisClient, dispatcher.HandleEvent, ...) wired with Pub/Sub Redis client
All 13 canonical MULTI_TENANT_* envs declared in .env.example (commented out with defaults)

Architecture:

client.NewClient(url, logger, opts...) returns (*Client, error) — handle error for fail-fast
client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey) always passed (lib-commons validates internally)
tmpostgres.NewManager(client, service, WithModule(...), WithLogger(...), WithMaxTenantPools(...), WithIdleTimeout(...)) for PostgreSQL pool
Each manager has Stats(), IsMultiTenant(), and ApplyConnectionSettings() methods

Middleware — choose one:

For single-module services: tmmiddleware.NewTenantMiddleware(WithPG(...), WithTenantCache(...), WithTenantLoader(...)) registered in routes via WhenEnabled
For multi-module services: tmmiddleware.NewTenantMiddleware(WithPG(..., "module"), WithMB(..., "module"), WithTenantCache(...), WithTenantLoader(...)) registered in routes via WhenEnabled
WhenEnabled helper implemented locally in the routes file (nil check → c.Next())
Tenant middleware passed as nil when MULTI_TENANT_ENABLED=false (single-tenant passthrough via WhenEnabled nil check)

Middleware & Context:

JWT tenant extraction (claim key: tenantId)
core.ContextWithTenantID() in middleware
Public endpoints (/health, /version, /swagger) bypass tenant middleware
core.ErrTenantNotFound → 404, core.ErrManagerClosed → 503, core.ErrServiceNotConfigured → 503
core.IsTenantNotProvisionedError() in error handler → 422
core.ErrTenantContextRequired handled in repositories

Repositories:

tmcore.GetPGContext(ctx) or tmcore.GetPGContext(ctx, module) in PostgreSQL repositories
valkey.GetKeyContext(ctx, key) for ALL Redis keys (including Lua script KEYS[] and ARGV[])
tmcore.GetMBContext(ctx) or tmcore.GetMBContext(ctx, module) in MongoDB repositories (if using MongoDB)
s3.GetS3KeyStorageContext(ctx, key) for ALL S3 operations (if using S3/object storage)

Async Processing:

Tenant ID header (X-Tenant-ID) in RabbitMQ messages
consumer.NewMultiTenantConsumer with consumer.WithPG and consumer.WithMB
consumer.Register(queueName, handler) for each queue
consumer.Run(ctx) at startup (non-blocking, <1s)

Testing:

Unit tests with mock tenant context
Tenant isolation tests (verify data separation between tenants)
Error case tests (missing tenant, invalid tenant, tenant not found)
Integration tests with two distinct tenants verifying cross-tenant isolation
RabbitMQ consumer tests (X-Tenant-ID header extraction)
Redis key prefixing tests (verify tenant prefix applied, including Lua scripts)

Single-Tenant Backward Compatibility (MANDATORY):

All existing tests pass with MULTI_TENANT_ENABLED=false (default)
Service starts without any MULTI_TENANT_* environment variables
Service starts without Tenant Manager running
All CRUD operations work in single-tenant mode
Backward compatibility integration test exists (TestMultiTenant_BackwardCompatibility)
Health/version endpoints work without tenant context

Route-Level Auth-Before-Tenant Ordering (MANDATORY)

MANDATORY: When using multi-tenant middleware, auth MUST validate the JWT before tenant middleware resolves the database connection. This ordering is a security requirement, not a performance optimization.

Why This Matters

Concern	Impact Without Auth-Before-Tenant
SECURITY	Forged or expired JWTs trigger Tenant Manager API calls before token signature validation. Any request with a `tenantId` claim — valid or not — causes a network round-trip to resolve tenant DB credentials.
PERFORMANCE	Unauthenticated requests trigger unnecessary Tenant Manager API round-trips (~50ms+ each). At scale, this adds significant latency and load to the Tenant Manager service.
DoS VECTOR	Attackers can flood the Tenant Manager API with crafted tokens containing valid-looking `tenantId` claims. Since tenant resolution happens before auth rejects the token, every malicious request costs a TM API call.

The WRONG Pattern (Anti-Pattern)

// ❌ WRONG: Tenant middleware runs before auth on ALL routes
app.Use(tenantMid.WithTenantDB)  // Runs first — calls TM API before auth validates JWT
app.Post("/v1/resources", auth.Authorize("app", "resource", "post"), handler.Create)

In this pattern, WithTenantDB executes for every request before auth.Authorize validates the JWT. A request with a forged JWT containing tenantId: "victim-tenant" triggers a full Tenant Manager resolution — fetching credentials, opening connections — before auth rejects it.

The CORRECT Pattern: WhenEnabled

MUST use WhenEnabled — a simple helper function that each project implements locally — to conditionally apply tenant middleware per-route. Auth is listed before WhenEnabled in the handler chain, guaranteeing auth runs first. Tenant resolution runs only for authenticated requests.

WhenEnabled implementation (each project implements this locally):

// WhenEnabled is a helper that conditionally applies a middleware if it's not nil.
// When multi-tenant is disabled, the middleware passed is nil and WhenEnabled calls c.Next().
func WhenEnabled(middleware fiber.Handler) fiber.Handler {
    return func(c *fiber.Ctx) error {
        if middleware == nil {
            return c.Next()
        }

        return middleware(c)
    }
}

Route registration:

// ✅ CORRECT: Auth validates JWT FIRST, then tenant resolves DB
// ttHandler is nil when MULTI_TENANT_ENABLED=false (single-tenant passthrough)
f.Post("/v1/resources", auth.Authorize("app", "resource", "post"), WhenEnabled(ttHandler), handler.Create)
f.Get("/v1/resources", auth.Authorize("app", "resource", "get"), WhenEnabled(ttHandler), handler.GetAll)
f.Get("/v1/resources/:id", auth.Authorize("app", "resource", "get"), WhenEnabled(ttHandler), handler.GetByID)

How it works:

auth.Authorize(...) is the first handler — validates JWT before anything else
WhenEnabled(ttHandler) runs second — if ttHandler is nil (single-tenant mode), it calls c.Next() immediately; if non-nil, it executes the tenant middleware
The business handler runs last
If auth rejects the request, tenant middleware never runs — no TM API call
If MULTI_TENANT_ENABLED=false, ttHandler is nil and WhenEnabled is a no-op — zero overhead

Detection Commands (MANDATORY)

# MANDATORY: Run before every PR in multi-tenant services
# Check for global tenant middleware registration (anti-pattern)
grep -rn "app\.Use(.*WithTenantDB\|app\.Use(.*tenantMid" internal/ --include="*.go"
# Expected: 0 matches. Tenant middleware MUST NOT be registered globally.

# Check for correct per-route composition: auth.Authorize BEFORE WhenEnabled on same route
grep -rnE '^\s*(app|f)\.(Get|Post|Put|Patch|Delete)\(.*auth\.Authorize\(.*WhenEnabled\(' internal/ --include="*.go"
# Expected: 1+ matches in routes.go — auth appears before WhenEnabled on protected routes.

Anti-Rationalization Table

Rationalization	Why It's WRONG	Required Action
"Auth middleware is already global, so order doesn't matter"	Global middleware ordering is implicit and fragile — a refactor can silently break it. Different routes may need different auth handlers.	MUST use explicit per-route composition: `auth, WhenEnabled(tenant), handler`
"Tenant resolution is fast, no harm running it first"	TM API calls are network round-trips (~50ms+). At scale, unauthorized traffic amplifies cost. Every unauthenticated request wastes a TM API call.	MUST authenticate before any TM API call
"We'll just register auth middleware before tenant in app.Use()"	Global ordering provides no guarantee per-route. Different routes may need different auth handlers. A single `app.Use()` reorder silently breaks security for all routes.	MUST compose auth+tenant per-route using WhenEnabled
"Only internal services call this endpoint, no DoS risk"	Internal networks are not trusted by default. Compromised services, misconfigured proxies, or lateral movement can generate unauthorized traffic.	MUST enforce auth-before-tenant regardless of network topology
"We already validate tokens at the API gateway"	Defense in depth. Gateway validation can be bypassed, misconfigured, or removed. Service-level auth is the last line of defense.	MUST validate auth at service level before tenant resolution

Multi-Tenant Message Queue Consumers

When to Use

CONDITIONAL: Only applies if your service:

Processes messages from a message broker (RabbitMQ, SQS, etc.)
Uses per-tenant message isolation (dedicated vhosts or queues per tenant)
Has 10+ tenants where startup time is a concern

If single-tenant or <10 tenants: Multi-tenant mode overhead may be unnecessary. Consider single-tenant architecture.

Problem: Startup Time Scales with Tenant Count

In multi-tenant services consuming messages, connecting to ALL tenant vhosts at startup causes:

Startup Time = N tenants × 500ms per connection
10 tenants  = 5 seconds
100 tenants = 50 seconds  ← Unacceptable for autoscaling/deployments

Symptoms:

Slow deployments (rolling updates wait for each pod to connect)
Poor autoscaling responsiveness (new pods take 30-60s to become ready)
Wasted resources (connections to inactive tenants)

Solution: On-Demand Consumer Initialization

Pattern: Decouple tenant discovery from consumer connection.

Startup (MultiTenantConsumer.Run):
  1. Discover tenant list (Redis/API) - lightweight, <1s
  2. Track discovered tenants in memory (knownTenants map)
  3. Start background sync loop (periodic re-discovery)
  4. Return immediately (startup complete)

On-Demand Consumer Spawn (internal):
  1. Background sync discovers new tenant
  2. Calls EnsureConsumerStarted(ctx, tenantID) internally
  3. Consumer spawns on-demand (first time: ~500ms)
  4. Connection cached for reuse
  5. Subsequent calls: fast path (<1ms)

Result: Startup time O(1) regardless of tenant count, resources scale with active tenants only.

Architecture Components

1. Tenant Discovery Service

Responsibility: Provide list of active tenants without connection overhead.

Implementation:

Primary: Cache (Redis SET: tenant-manager:tenants:active)
Fallback: HTTP API (GET /tenants/active?service={serviceName})

Response format (API):

[
  {"id": "tenant-001", "name": "Tenant A", "status": "active"},
  {"id": "tenant-002", "name": "Tenant B", "status": "active"}
]

Endpoint characteristics:

Returns minimal info (id, name, status only)
Supports optional filtering by service (query param: ?service={serviceName})

2. Consumer Manager (lib-commons)

Responsibility: Manage lifecycle of message consumers across tenants with on-demand initialization.

Key methods:

type MultiTenantConsumer struct {
    mu sync.RWMutex                      // Protects knownTenants and activeTenants maps
    knownTenants map[string]bool        // Discovered tenants
    activeTenants map[string]CancelFunc // Running consumers
    consumerLocks sync.Map               // Per-tenant mutexes
    retryState sync.Map                  // Failure tracking
}

// Discover tenants without connecting (non-blocking, <1s)
func (c *MultiTenantConsumer) Run(ctx context.Context) error

// Ensure consumer is active for tenant (idempotent, thread-safe, fire-and-forget)
func (c *MultiTenantConsumer) EnsureConsumerStarted(ctx context.Context, tenantID string)

// Stop consumer goroutine and clean up all state for a tenant.
// Cancels context, removes from activeTenants/knownTenants/retryState/consumerLocks.
// Prevents orphaned retry loops after tenant eviction.
func (c *MultiTenantConsumer) StopConsumer(tenantID string)

// Check if tenant has failed repeatedly
func (c *MultiTenantConsumer) IsDegraded(tenantID string) bool

// Get runtime statistics
func (c *MultiTenantConsumer) Stats() ConsumerStats

3. Service Bootstrap Trigger

Responsibility: Trigger consumer spawn for tenants during service initialization and via background sync.

Implementation pattern:

// In bootstrap/config.go
tmConsumer := consumer.NewMultiTenantConsumer(...)
if err := tmConsumer.Run(ctx); err != nil {
    logger.Errorf("tenant manager startup failed: %v", err)
    // Startup continues - consumer will retry tenant discovery in background
}

The MultiTenantConsumer.Run() method discovers tenants and starts a background sync loop that spawns consumers for newly discovered tenants. Individual tenant consumers are started on-demand via EnsureConsumerStarted() which is called internally by the consumer manager.

Implementation Steps

Step 1: Update Shared Library

Implement on-demand consumer initialization in your message queue consumer library:

Add state tracking:

knownTenants map[string]bool        // Discovered via API/cache
activeTenants map[string]CancelFunc // Actually running
consumerLocks sync.Map               // Prevent duplicate spawns

Non-blocking discovery:

func (c *Consumer) Run(ctx context.Context) error {
    // Discover tenants (timeout: 500ms, soft failure)
    c.discoverTenants(ctx)

    // Start background sync loop (for tenant add/remove)
    go c.runSyncLoop(ctx)

    return nil  // Return immediately
}

On-demand spawning with double-check locking:

func (c *Consumer) EnsureConsumerStarted(ctx context.Context, tenantID string) {
    // Fast path: check if already active (read lock)
    c.mu.RLock()
    if _, exists := c.activeTenants[tenantID]; exists {
        c.mu.RUnlock()
        return  // Already running
    }
    c.mu.RUnlock()

    // Slow path: acquire per-tenant mutex (prevent thundering herd)
    mutex := c.getPerTenantMutex(tenantID)
    mutex.Lock()
    defer mutex.Unlock()

    // Double-check after acquiring lock
    c.mu.RLock()
    if _, exists := c.activeTenants[tenantID]; exists {
        c.mu.RUnlock()
        return  // Another goroutine created it
    }
    c.mu.RUnlock()

    // Spawn consumer (fire-and-forget, errors logged internally)
    c.startTenantConsumer(ctx, tenantID)
}

Step 2: Add Tenant Discovery Endpoint

In your tenant management service:

// GET /tenants/active?service={serviceName}
func GetActiveTenants(c *fiber.Ctx) error {
    service := c.Query("service")

    // Query active tenants (filter by service if provided)
    tenants, err := repo.ListActiveTenants(service)
    if err != nil {
        return libHTTP.InternalServerError(c, "TENANT_LIST_FAILED", err)
    }

    // Return minimal info (no credentials)
    response := []TenantSummary{}
    for _, t := range tenants {
        response = append(response, TenantSummary{
            ID: t.ID,
            Name: t.Name,
            Status: t.Status,
        })
    }

    return libHTTP.OK(c, response)
}

// Register as PUBLIC endpoint (no auth)
app.Get("/tenants/active", GetActiveTenants)

Step 3: Wire Consumer in Service Bootstrap

In each service consuming messages:

// In bootstrap/config.go
func InitServers() {
    // Create multi-tenant consumer
    tmConsumer := consumer.NewMultiTenantConsumer(
        consumer.WithPG(pgManager),
        consumer.WithMB(mongoManager),
    )

    // Register message handlers
    tmConsumer.Register("queue-name", handler)

    // Start consumer (discovers tenants, starts background sync)
    if err := tmConsumer.Run(ctx); err != nil {
        logger.Errorf("tenant manager startup failed: %v", err)
        // Note: startup continues - consumer will retry tenant discovery in background
    }
}

Failure Resilience Pattern

Exponential Backoff:

func backoffDelay(retryCount int) time.Duration {
    delays := []time.Duration{5*time.Second, 10*time.Second, 20*time.Second, 40*time.Second}
    if retryCount >= len(delays) {
        return delays[len(delays)-1]  // Cap at 40s
    }
    return delays[retryCount]
}

Per-Tenant Retry State:

type retryState struct {
    count    int
    degraded bool  // True after 3 failures
}

retryState sync.Map  // Key: tenantID, Value: *retryState

Degraded Tenant Handling:

if consumer.IsDegraded(tenantID) {
    logger.Errorf("Tenant %s is degraded (3+ connection failures)", tenantID)
    return errors.New("tenant degraded")
}

Observability Pattern

Enhanced Stats API:

type ConsumerStats struct {
    ConnectionMode   string   `json:"connectionMode"`    // "on-demand"
    ActiveTenants    int      `json:"activeTenants"`     // Connected
    KnownTenants     int      `json:"knownTenants"`      // Discovered
    PendingTenants   []string `json:"pendingTenants"`    // Known but not active
    DegradedTenants  []string `json:"degradedTenants"`   // Failed 3+ times
}

Structured Logs:

connection_mode=on-demand at startup
on-demand consumer start for tenant: {id} when spawning
connecting to vhost: tenant={id} when connecting
tenant {id} marked degraded after max retries

Testing Strategy

Unit Tests:

Startup completes in <1s (0, 100, 500 tenants)
Concurrent EnsureConsumerStarted spawns exactly 1 consumer
Exponential backoff sequence (5s, 10s, 20s, 40s)
Degraded tenant detection after 3 failures

Integration Tests:

Discovery fallback (Redis → API)
On-demand connection with testcontainers
Tenant removal cleanup (<30s)

When to Use Multi-Tenant Consumer

Scenario	Recommended	Rationale
10+ tenants	✅ Yes	Startup time becomes significant with many tenants
<10 tenants	⚠️ Consider	Overhead may not justify complexity
Most tenants inactive	✅ Yes	Resources scale with active count only
All tenants active	✅ Yes	Still faster startup, resources scale proportionally
Frequent deployments	✅ Yes	Fast startup critical for CI/CD velocity
Latency-sensitive	✅ Yes*	*First request per tenant: +500ms (acceptable trade-off)

Cache Invalidation (MANDATORY)

The pmClient (tenant-manager HTTP client) has an internal HTTP response cache (1h TTL). When a tenant is added or removed, the cache MUST be invalidated via tenantClient.InvalidateConfig(ctx, tenantID, serviceName) to prevent:

On add: stale 404 or partial responses from before provisioning completed
On remove: stale 200 responses allowing evicted tenants to reconnect and re-establish infrastructure

Invalidation happens in BOTH callbacks:

Callback	When to Invalidate	Why
`OnTenantAdded`	Before `EnsureConsumerStarted`	Consumer needs fresh config to connect to newly provisioned vhost
`OnTenantRemoved`	After closing connections	Stale cached 200 would let evicted tenant pass middleware and trigger lazy-load reconnection

// InvalidateConfig signature
func (c *TenantClient) InvalidateConfig(ctx context.Context, tenantID, serviceName string) error

Without invalidation: A removed tenant's cached config (200 OK) survives for up to 1 hour, during which any HTTP request for that tenant passes middleware validation, triggers lazy-load, and re-establishes all infrastructure connections — effectively undoing the eviction.

Common Pitfalls

❌ Don't: Start consumers in discovery loop (defeats on-demand purpose) ✅ Do: Populate knownTenants only, defer connection to trigger

❌ Don't: Use global mutex for all tenants (contention) ✅ Do: Per-tenant mutex via sync.Map (fine-grained locking)

❌ Don't: Fail HTTP request if consumer spawn fails ✅ Do: Log warning, let background sync retry

❌ Don't: Forget to cleanup on tenant removal ✅ Do: Call StopConsumer first, then close connections, then invalidate cache

❌ Don't: Retry indefinitely on connection failure ✅ Do: Mark degraded after 3 failures, stop retrying

❌ Don't: Skip cache invalidation on tenant add/remove ✅ Do: Call tenantClient.InvalidateConfig in both OnTenantAdded and OnTenantRemoved

❌ Don't: Close connections before stopping consumer (causes retry storms) ✅ Do: Stop consumer goroutine FIRST, then close infrastructure connections

Single-Tenant vs Multi-Tenant Mode

// Support both single-tenant and multi-tenant deployments
if !config.MultiTenantEnabled {
    // Single-tenant: static RabbitMQ connection (no tenant isolation)
    consumer = initSingleTenantConsumer(...)
} else {
    // Multi-tenant: per-tenant vhosts with on-demand initialization
    consumer = initMultiTenantConsumer(...)
}

Note: Single-tenant uses a different consumer implementation without tenant isolation. Multi-tenant consumers use on-demand initialization for optimal startup performance.

M2M Credentials via Secret Manager

CONDITIONAL: Only implement if the service has targetServices declared (i.e., it calls other service APIs per tenant). Any service — plugin or product — that has targetServices needs M2M credential retrieval.

When This Applies

Service Type	`MULTI_TENANT_ENABLED`	Needs M2M?	Reason
Any service with `targetServices`	`true`	✅ YES	Service must authenticate with target service APIs per tenant
Any service with `targetServices`	`false`	❌ NO	Single-tenant — service uses existing static auth, no Secrets Manager calls
Any service without `targetServices`	any	❌ NO	No cross-service API calls requiring per-tenant credentials

⛔ Backward Compatibility: When MULTI_TENANT_ENABLED=false (the default), the service MUST continue working with its existing authentication mechanism — no AWS Secrets Manager calls, no M2M credential fetching. The Secret Manager path is activated only when multi-tenant mode is enabled. This follows the same conditional pattern as all other tenant-manager resources (PostgreSQL, MongoDB, Redis, S3, RabbitMQ).

How It Works

When MULTI_TENANT_ENABLED=true, each tenant has its own M2M credentials stored in AWS Secrets Manager. When a service needs to call a target service API (e.g., ledger, plugin-fees), it must:

Extract tenantOrgID from the JWT context (already available via tenant middleware)
Call secretsmanager.GetM2MCredentials() to fetch clientId + clientSecret for that tenant
Pass the credentials to the existing Plugin Access Manager integration (which handles JWT acquisition)

Note: The service already handles JWT token acquisition via Plugin Access Manager. This section only covers how to retrieve M2M credentials from AWS Secrets Manager — not how to exchange them for tokens.

Required lib-commons Package

import (
    secretsmanager "github.com/LerianStudio/lib-commons/v4/commons/secretsmanager"
)

Secret Path Convention

Credentials are stored in AWS Secrets Manager following this path:

tenants/{env}/{tenantOrgID}/{applicationName}/m2m/{targetService}/credentials

Segment	Source	Example
`env`	`ENV_NAME` env var	`staging`, `production`
`tenantOrgID`	JWT `owner` claim via `auth.GetTenantID(ctx)`	`org_01KHVKQQP6D2N4RDJK0ADEKQX1`
`applicationName`	Service's own name constant	`plugin-pix`, `midaz`, `ledger`
`targetService`	The target service being called	`ledger`, `midaz`, `plugin-fees`

Environment Variables (M2M)

In addition to the 13 canonical multi-tenant env vars, plugins MUST add:

Env Var	Description	Default	Required
`AWS_REGION`	AWS region for Secrets Manager	-	Yes (for services with targetServices)
`M2M_TARGET_SERVICE`	Target service name	-	Yes (for services with targetServices)

Implementation Pattern

1. Fetching M2M Credentials

package m2m

import (
    "context"
    "fmt"

    awsconfig "github.com/aws/aws-sdk-go-v2/config"
    awssm "github.com/aws/aws-sdk-go-v2/service/secretsmanager"
    secretsmanager "github.com/LerianStudio/lib-commons/v4/commons/secretsmanager"
)

// FetchCredentials retrieves M2M credentials from AWS Secrets Manager for a specific tenant.
func FetchCredentials(ctx context.Context, env, tenantOrgID, applicationName, targetService string) (*secretsmanager.M2MCredentials, error) {
    cfg, err := awsconfig.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, fmt.Errorf("loading AWS config: %w", err)
    }

    client := awssm.NewFromConfig(cfg)

    creds, err := secretsmanager.GetM2MCredentials(ctx, client, env, tenantOrgID, applicationName, targetService)
    if err != nil {
        return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err)
    }

    return creds, nil
}

2. Credential Caching (MANDATORY)

MUST cache credentials using a two-level cache architecture. Hitting AWS Secrets Manager on every request is expensive and adds latency.

Cache Architecture

Level	Store	TTL	Purpose
L1	In-memory (`sync.Map`)	Fixed 30s	Fast path, avoids Redis round-trip per request
L2	Redis/Valkey (distributed)	Service-defined (e.g., 300s)	Source of truth, shared across all pods

Fallback: If Redis is not available (dev, single-tenant), in-memory becomes the only level (current behavior preserved). Mode is auto-detected — no configuration needed.

Redis Key Structure

tenant:{tenantOrgID}:m2m:{targetService}:credentials

Key prefixing uses valkey.GetKeyContext(ctx, baseKey) from lib-commons, which automatically applies the tenant prefix (e.g., tenant:{tenantId}:m2m:{targetService}:credentials). This is the same pattern used by all other Redis operations in the codebase.

Cache-Bust on Auth Failure (401)

When the token exchange (client_credentials grant) returns 401:

Delete the entry from L2 (Redis) — propagates to all pods
Delete the entry from L1 (local) — immediate effect on current pod
Re-fetch from AWS Secrets Manager on next request

This eliminates the up-to-5-minute window of using revoked credentials across pods.

Implementation

package m2m

import (
    "context"
    "encoding/json"
    "fmt"
    "sync"
    "time"

    libRedis "github.com/LerianStudio/lib-commons/v4/commons/redis"
    secretsmanager "github.com/LerianStudio/lib-commons/v4/commons/secretsmanager"
    "github.com/LerianStudio/lib-commons/v4/commons/tenant-manager/valkey"
)

// l1CacheTTL is a fixed internal constant — not configurable via env var.
const l1CacheTTL = 30 * time.Second

type cachedCredentials struct {
    creds     *secretsmanager.M2MCredentials
    expiresAt time.Time
}

// M2MCredentialProvider handles per-tenant M2M credential retrieval with two-level caching.
// L1 (in-memory) provides fast path; L2 (Redis/Valkey via lib-commons) provides cross-pod consistency.
// Token acquisition is handled by Plugin Access Manager — this only provides credentials.
type M2MCredentialProvider struct {
    smClient        secretsmanager.SecretsManagerClient
    env             string
    applicationName string
    targetService   string
    credCacheTTL    time.Duration // L2 TTL (service-defined)

    credCache sync.Map // L1: map[tenantOrgID]*cachedCredentials

    // L2: lib-commons Redis connection (nil = local-only mode)
    redisConn *libRedis.Connection
}

// NewM2MCredentialProvider creates a credential provider with two-level cache.
// Pass nil for redisConn to use local-only mode (single-tenant or dev).
func NewM2MCredentialProvider(
    smClient secretsmanager.SecretsManagerClient,
    env, applicationName, targetService string,
    credCacheTTL time.Duration,
    redisConn *libRedis.Connection,
) *M2MCredentialProvider {
    return &M2MCredentialProvider{
        smClient:        smClient,
        env:             env,
        applicationName: applicationName,
        targetService:   targetService,
        credCacheTTL:    credCacheTTL,
        redisConn:       redisConn,
    }
}

// m2mRedisKey returns the tenant-prefixed Redis key for M2M credentials.
// Uses valkey.GetKeyContext when ctx has tenant info, otherwise builds manually.
func (p *M2MCredentialProvider) m2mRedisKey(ctx context.Context, tenantOrgID string) string {
    baseKey := fmt.Sprintf("m2m:%s:credentials", p.targetService)
    // Use valkey.GetKeyContext to apply tenant prefix consistently
    return valkey.GetKeyContext(ctx, baseKey)
}

// GetCredentials returns M2M credentials for the given tenant using two-level cache.
// Lookup order: L1 (memory) → L2 (Redis via lib-commons) → AWS Secrets Manager.
// The caller (Plugin Access Manager integration) handles token acquisition.
func (p *M2MCredentialProvider) GetCredentials(ctx context.Context, tenantOrgID string) (*secretsmanager.M2MCredentials, error) {
    // L1: Check in-memory cache (fast path)
    if cached, ok := p.credCache.Load(tenantOrgID); ok {
        cc := cached.(*cachedCredentials)
        if time.Now().Before(cc.expiresAt) {
            return cc.creds, nil
        }
    }

    // L2: Check distributed cache (Redis/Valkey via lib-commons)
    if p.redisConn != nil {
        rds, err := p.redisConn.GetConnection(ctx)
        if err == nil {
            key := p.m2mRedisKey(ctx, tenantOrgID)
            val, err := rds.Get(ctx, key).Bytes()
            if err == nil {
                var creds secretsmanager.M2MCredentials
                if json.Unmarshal(val, &creds) == nil {
                    // Populate L1 with short TTL
                    p.credCache.Store(tenantOrgID, &cachedCredentials{
                        creds:     &creds,
                        expiresAt: time.Now().Add(l1CacheTTL),
                    })
                    return &creds, nil
                }
            }
        }
    }

    // Source: Fetch from AWS Secrets Manager (authoritative source)
    creds, err := secretsmanager.GetM2MCredentials(ctx, p.smClient, p.env, tenantOrgID, p.applicationName, p.targetService)
    if err != nil {
        return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err)
    }

    // Store in L2 (distributed via lib-commons)
    if p.redisConn != nil {
        if rds, err := p.redisConn.GetConnection(ctx); err == nil {
            key := p.m2mRedisKey(ctx, tenantOrgID)
            if data, err := json.Marshal(creds); err == nil {
                _ = rds.Set(ctx, key, data, p.credCacheTTL).Err()
            }
        }
    }

    // Store in L1 (local)
    p.credCache.Store(tenantOrgID, &cachedCredentials{
        creds:     creds,
        expiresAt: time.Now().Add(l1CacheTTL),
    })

    return creds, nil
}

// InvalidateCredentials removes cached credentials for a tenant from both cache levels.
// Call this when a 401 is received during token exchange (credential revocation).
func (p *M2MCredentialProvider) InvalidateCredentials(ctx context.Context, tenantOrgID string) {
    // Delete from L1 (local — immediate effect)
    p.credCache.Delete(tenantOrgID)

    // Delete from L2 (distributed — propagates to all pods via lib-commons)
    if p.redisConn != nil {
        if rds, err := p.redisConn.GetConnection(ctx); err == nil {
            key := p.m2mRedisKey(ctx, tenantOrgID)
            _ = rds.Del(ctx, key).Err()
        }
    }
}

3. Single-Tenant vs Multi-Tenant: Conditional Flow

This is the core pattern. The service MUST work in both modes — the M2MCredentialProvider is nil in single-tenant mode.

Few-shot 1 — Bootstrap wiring (picks the right path at startup):

// In bootstrap/config.go or bootstrap/dependencies.go

var m2mProvider *m2m.M2MCredentialProvider // nil = single-tenant mode

if cfg.MultiTenantEnabled {
    // MULTI-TENANT: create credential provider that fetches from AWS Secrets Manager
    awsCfg, err := awsconfig.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, fmt.Errorf("failed to load AWS config for M2M: %w", err)
    }
    smClient := awssm.NewFromConfig(awsCfg)

    m2mProvider = m2m.NewM2MCredentialProvider(
        smClient,
        cfg.EnvName,
        constant.ApplicationName,
        cfg.M2MTargetService,
        time.Duration(cfg.M2MCredentialCacheTTLSec) * time.Second,
    )
}
// SINGLE-TENANT: m2mProvider stays nil — no AWS calls, no Secret Manager

// Both modes use the same client — it checks internally if m2mProvider is nil
productClient := product.NewClient(cfg.ProductURL, m2mProvider)

Few-shot 2 — Product client (handles both modes transparently):

// internal/adapters/product/client.go

type Client struct {
    baseURL     string
    m2mProvider *m2m.M2MCredentialProvider // nil in single-tenant mode
    httpClient  *http.Client
}

func NewClient(baseURL string, m2mProvider *m2m.M2MCredentialProvider) *Client {
    return &Client{
        baseURL:     baseURL,
        m2mProvider: m2mProvider,
        httpClient:  &http.Client{Timeout: 30 * time.Second},
    }
}

func (c *Client) CreateTransaction(ctx context.Context, input TransactionInput) (*TransactionOutput, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, c.baseURL+"/v1/transactions", marshal(input))
    if err != nil {
        return nil, err
    }

    if c.m2mProvider != nil {
        // MULTI-TENANT: fetch per-tenant credentials from Secret Manager
        tenantOrgID := auth.GetTenantID(ctx)
        creds, err := c.m2mProvider.GetCredentials(ctx, tenantOrgID)
        if err != nil {
            return nil, fmt.Errorf("fetching M2M credentials for tenant %s: %w", tenantOrgID, err)
        }
        req.SetBasicAuth(creds.ClientID, creds.ClientSecret)
    }
    // SINGLE-TENANT: no credentials injected — plugin uses existing auth
    // (e.g., static token from env var, already set in headers by middleware, etc.)

    resp, err := c.httpClient.Do(req)
    if err != nil {
        return nil, fmt.Errorf("calling product API: %w", err)
    }
    defer resp.Body.Close()

    // ... handle response
}

Few-shot 3 — Service layer (no branching needed, client handles it):

func (uc *ProcessPaymentUseCase) Execute(ctx context.Context, input PaymentInput) error {
    // Works in both modes:
    // - Single-tenant: client calls product API with existing static auth
    // - Multi-tenant: client fetches per-tenant creds from Secret Manager first
    resp, err := uc.ledgerClient.CreateTransaction(ctx, input.Transaction)
    if err != nil {
        return fmt.Errorf("creating transaction in ledger: %w", err)
    }

    return nil
}

The pattern: The conditional logic lives in the client/adapter layer, not in the service/use-case layer. The service layer calls the same method regardless of mode — it doesn't know or care whether credentials came from Secret Manager or static config.

Error Handling

The secretsmanager package provides sentinel errors for precise error handling:

import (
    "errors"
    secretsmanager "github.com/LerianStudio/lib-commons/v4/commons/secretsmanager"
)

creds, err := secretsmanager.GetM2MCredentials(ctx, client, env, tenantOrgID, appName, target)
if err != nil {
    switch {
    case errors.Is(err, secretsmanager.ErrM2MCredentialsNotFound):
        // Tenant not provisioned yet — return 503 or queue for retry
    case errors.Is(err, secretsmanager.ErrM2MVaultAccessDenied):
        // IAM permissions missing or token expired — alert ops
    case errors.Is(err, secretsmanager.ErrM2MInvalidCredentials):
        // Secret exists but clientId/clientSecret missing — alert ops
    default:
        // Infrastructure error — retry with backoff
    }
}

AWS IAM Permissions

The plugin's IAM role (or ECS task role / EKS service account) MUST have permission to read the tenant secrets. Minimal policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:*:*:secret:tenants/*/m2m/*/credentials-*"
    }
  ]
}

For tighter scoping, replace wildcards with specific values:

Wildcard	Scoped Example
First `*` (env)	`production` or `staging`
Second `*` (target)	`ledger` or `midaz`
Trailing `-*`	Required — AWS appends a random suffix to secret ARNs

MUST NOT grant secretsmanager:* — only GetSecretValue is needed.

Testing Guidance

The secretsmanager.SecretsManagerClient is an interface, so it can be mocked in unit tests without hitting AWS.

Mocking the client

type mockSMClient struct {
    getSecretValueFunc func(ctx context.Context, input *awssm.GetSecretValueInput, opts ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error)
}

func (m *mockSMClient) GetSecretValue(ctx context.Context, input *awssm.GetSecretValueInput, opts ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error) {
    return m.getSecretValueFunc(ctx, input, opts...)
}

Testing credential cache TTL

func TestCredentialCacheExpiry(t *testing.T) {
    callCount := 0
    mock := &mockSMClient{
        getSecretValueFunc: func(_ context.Context, _ *awssm.GetSecretValueInput, _ ...func(*awssm.Options)) (*awssm.GetSecretValueOutput, error) {
            callCount++
            secret := `{"clientId":"id","clientSecret":"secret"}`
            return &awssm.GetSecretValueOutput{SecretString: &secret}, nil
        },
    }

    provider := NewM2MCredentialProvider(mock, "test", "plugin-pix", "ledger", 1*time.Second)

    // First call — fetches from AWS
    _, err := provider.GetCredentials(context.Background(), "tenant-1")
    require.NoError(t, err)
    assert.Equal(t, 1, callCount)

    // Second call — served from cache
    _, err = provider.GetCredentials(context.Background(), "tenant-1")
    require.NoError(t, err)
    assert.Equal(t, 1, callCount) // still 1

    // Wait for cache expiry
    time.Sleep(1100 * time.Millisecond)

    // Third call — cache expired, fetches again
    _, err = provider.GetCredentials(context.Background(), "tenant-1")
    require.NoError(t, err)
    assert.Equal(t, 2, callCount)
}

Testing error scenarios

Test each sentinel error path (ErrM2MCredentialsNotFound, ErrM2MVaultAccessDenied, ErrM2MInvalidCredentials) by returning the corresponding error from the mock.

Observability & Metrics (MANDATORY)

Instrument M2MCredentialProvider to track credential retrieval health. These 6 metrics are mandatory — without them, diagnosing per-tenant M2M issues in production is not feasible.

Metric	Type	Where to Increment	Description
`m2m_credential_l1_cache_hits`	Counter	`GetCredentials` — L1 (memory) hit	Credential served from in-memory cache
`m2m_credential_l2_cache_hits`	Counter	`GetCredentials` — L2 (Redis) hit	Credential served from distributed cache
`m2m_credential_cache_misses`	Counter	`GetCredentials` — both L1 and L2 miss	Full miss, fetching from AWS Secrets Manager
`m2m_credential_fetch_errors`	Counter	`GetCredentials` — error return	AWS Secrets Manager call failed
`m2m_credential_fetch_duration_seconds`	Histogram	`GetCredentials` — around `GetM2MCredentials` call	Latency of AWS Secrets Manager requests
`m2m_credential_invalidations`	Counter	`InvalidateCredentials` — on 401	Credential cache-bust triggered by auth failure

Labels: tenant_org_id, target_service, environment.

MUST NOT include clientId or clientSecret in metric labels or log fields.

Security Considerations

MUST NOT log credentials — never log clientId or clientSecret values
MUST NOT store credentials in environment variables — always fetch from Secrets Manager at runtime
MUST use two-level cache — L1 (in-memory) for fast path, L2 (Redis) for cross-pod consistency
MUST handle credential rotation — cache TTL ensures stale credentials are refreshed automatically
MUST invalidate on 401 — call InvalidateCredentials() when token exchange returns 401 to propagate cache-bust across all pods

Anti-Rationalization

Rationalization	Why It's WRONG	Required Action
"This service doesn't need M2M"	Correct if it has no targetServices declared.	Skip this gate for services without targetServices
"We can hardcode credentials per tenant"	Hardcoded creds don't scale and are a security risk.	MUST use Secrets Manager
"Caching is optional, we'll add it later"	Every request hitting AWS adds ~50-100ms latency + cost.	MUST implement caching from day one
"We'll use env vars for client credentials"	Env vars are shared across tenants. M2M is per-tenant.	MUST use Secrets Manager per tenant
"Single-tenant plugins don't need this"	Correct if MULTI_TENANT_ENABLED=false.	Skip when single-tenant
"In-memory cache is good enough"	In-memory cache is per-pod — credential rotation or revocation won't propagate to other pods for up to 5 minutes.	MUST use distributed cache (Redis) as L2
"Cache-bust on 401 is overkill"	Without it, revoked credentials remain cached for the full TTL, causing repeated auth failures across pods.	MUST invalidate on 401

Service Authentication (MANDATORY)

Consumer services that call the Tenant Manager /settings endpoint MUST authenticate using an API key sent via the X-API-Key HTTP header. Without this header, the Tenant Manager rejects requests to protected endpoints.

How It Works

Key generation: API keys are generated per-service via the service catalog endpoint POST /services/:name/api-keys.
Key limit: Maximum 2 keys per environment per service, enabling zero-downtime rotation (create new key, roll out, revoke old key).
Header injection: The lib-commons Tenant Manager HTTP client sends the X-API-Key header automatically when configured with client.WithServiceAPIKey().
Consumer configuration: Consumer services set the MULTI_TENANT_SERVICE_API_KEY environment variable. The bootstrap code passes it to the client via WithServiceAPIKey.

Configuration

Add MULTI_TENANT_SERVICE_API_KEY to the Config struct (see Environment Variables):

MultiTenantServiceAPIKey string `env:"MULTI_TENANT_SERVICE_API_KEY"`

Wire it when creating the Tenant Manager HTTP client:

clientOpts = append(clientOpts,
    client.WithServiceAPIKey(cfg.MultiTenantServiceAPIKey),
)
tmClient, err := client.NewClient(cfg.MultiTenantURL, logger, clientOpts...)
if err != nil {
    return fmt.Errorf("creating tenant-manager client: %w", err)
}

Fail-fast: NewClient returns core.ErrServiceAPIKeyRequired if the API key is empty. The service refuses to start with a clear error instead of failing with runtime 401 errors on /settings calls.

Key Rotation

Zero-downtime rotation flow:

Generate a new API key: POST /services/:name/api-keys
Deploy the new key to the consumer service (MULTI_TENANT_SERVICE_API_KEY)
Verify the service authenticates successfully with the new key
Revoke the old key: DELETE /services/:name/api-keys/:keyId

The service catalog enforces a maximum of 2 active keys per environment, so both old and new keys work simultaneously during the rollout window.

Anti-Rationalization

Rationalization	Why It's WRONG	Required Action
"We don't need API key auth for internal services"	The `/settings` endpoint returns database credentials. Unauthenticated access is a security risk.	MUST configure `WithServiceAPIKey`
"We'll add the API key later"	Without authentication, the Tenant Manager rejects `/settings` requests. The service cannot resolve tenant connections.	MUST configure before enabling multi-tenant
"We can use a shared API key across services"	Each service MUST have its own API key for audit trail and independent revocation.	MUST generate per-service keys via service catalog

116 KiB Raw Blame History Unescape Escape

Go Standards - Multi-Tenant

Table of Contents

Multi-Tenant Patterns (CONDITIONAL)

HARD GATE: Canonical Model Compliance

Canonical File Map

Required lib-commons Version

When to Use Multi-Tenant Mode

Environment Variables

Configuration

Service Name Resolution

Manager Wiring

Tenant Manager Service API

TenantConfig Data Model

Connection Pool Management

JWT Tenant Extraction

Generic TenantMiddleware (Standard Pattern)

Multi-module middleware (TenantMiddleware with multi-module)

Multi-module service example

Simple single-module service example

Single-module vs Multi-module

Multi-Tenant Error Responses

Database Connection in Repositories

Redis Key Prefixing

S3/Object Storage Key Prefixing

RabbitMQ Multi-Tenant: Two-Layer Isolation Model

RabbitMQ Manager Initialization

RabbitMQ Multi-Tenant Producer

MongoDB Multi-Tenant Repository

Event-Driven Tenant Discovery

RabbitMQ Consumer Goroutine Startup

HTTP Client Transport

Conditional Initialization

Testing Multi-Tenant Code

Unit Tests with Mock Tenant Context

Testing Tenant Isolation

Integration Tests with Tenant Isolation

Testing Error Cases

Testing RabbitMQ Multi-Tenant Consumer

Testing Redis Key Prefixing

Error Handling

Tenant Isolation Verification (⚠️ CONDITIONAL)

Isolation Architecture

Why Tenant Isolation Verification Is MANDATORY

Detection Commands (MANDATORY)

Anti-Rationalization Table

Context Functions

Tenant ID Validation

Redis Key Prefixing for Lua Scripts

Multi-Tenant Metrics

Responsibility Split: lib-commons vs Service

Anti-Rationalization Table (General)

Final Step: Single-Tenant Backward Compatibility Validation (MANDATORY)

How the Backward Compatibility Mechanism Works

Validation Steps (execute in order)

Anti-Rationalization

Checklist

Route-Level Auth-Before-Tenant Ordering (MANDATORY)

Why This Matters

The WRONG Pattern (Anti-Pattern)

The CORRECT Pattern: WhenEnabled

Detection Commands (MANDATORY)

Anti-Rationalization Table

Multi-Tenant Message Queue Consumers

When to Use

Problem: Startup Time Scales with Tenant Count

Solution: On-Demand Consumer Initialization

Architecture Components

1. Tenant Discovery Service

2. Consumer Manager (lib-commons)

3. Service Bootstrap Trigger

Implementation Steps

Step 1: Update Shared Library

Step 2: Add Tenant Discovery Endpoint

Step 3: Wire Consumer in Service Bootstrap

Failure Resilience Pattern

Observability Pattern

Testing Strategy

When to Use Multi-Tenant Consumer

Cache Invalidation (MANDATORY)

116 KiB

Raw Blame History