LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00


LocalAI [bot] b4fdb41dcc fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)

* fix(distributed): cascade-clean stale node_models on drain and filter routing by healthy status

Stale node_models rows (state="loaded") were surviving past the healthy state of their owning node, causing /embeddings (and other inference paths) to dispatch to a backend whose process was gone or drained. The downstream symptom in a live cluster was pgvector rejecting inserts with "vector cannot have more than 16000 dimensions (SQLSTATE 54000)" because the misbehaving backend silently returned a malformed (oversized) tensor; the Models page showed the model as "running" without an associated node, like a stale entry, even though the node was no longer visible in the Nodes view.

Two changes here, plus a third in a follow-up commit:

- MarkDraining now cascade-deletes node_models rows for the affected node, mirroring MarkOffline. Drains are explicit operator actions (the box has been intentionally taken out of rotation), so clearing the rows stops the Models UI from misreporting and prevents the routing layer from picking those rows if scheduling logic is ever relaxed. In-flight requests already hold their gRPC client through Route() and finish normally; the only observable effect is a non-fatal IncrementInFlight warning, acceptable for a drain. MarkUnhealthy is deliberately left status-only: it fires from managers_distributed / reconciler on a single nats.ErrNoResponders with no retry, so a transient NATS hiccup must not nuke every loaded model and force a full reload on recovery.
- FindAndLockNodeWithModel's inner JOIN now filters on backend_nodes.status = healthy in addition to node_models.state = loaded. The previous version relied on the second node-fetch step to reject non-healthy nodes, but a concurrent reader could still pick the same stale row in the same window. Belt-and-braces.
- DistributedConfig.PerModelHealthCheck renamed to DisablePerModelHealthCheck and inverted at the call site so per-model gRPC probing is on by default. The probe (now made consecutive-miss aware in a follow-up commit) independently health-checks each model's gRPC address and removes stale node_models rows when the backend has crashed even though the worker's node-level heartbeat is still arriving.

Migration: the field had no CLI flag, env var binding, or YAML key in tree (only the bare struct field), so there is no user-facing migration. Anything constructing DistributedConfig in code needs to drop the assignment (default now does the right thing) or invert it.

Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed): require consecutive misses before per-model probe removes a row

The per-model gRPC probe used to remove a node_models row on a single failed health check. With the per-model probe now on by default, that made any 5-second gRPC blip (network jitter, a long-running request hogging the worker's gRPC server thread, brief GC pause) trigger a full reload of the affected model, which is too eager for production.

Require perModelMissThreshold (3) consecutive failed probes before removal. At the default 15s tick a model must be unreachable for ~45s before reap; a single successful probe in between resets the streak. Per-(node, model, replica) state tracked under a mutex on the monitor. If the removal call itself fails, the miss counter is left in place so the next tick retries rather than starting the streak over.

Tests:

- removes stale model via per-model health check after consecutive failures (replaces the single-shot expectation)
- preserves model row when an intermittent failure is followed by a success (covers the reset-on-success path and verifies the counter reset by failing twice more without crossing threshold)
- newTestHealthMonitor initializes the misses map so direct-construct test helpers don't nil-map-panic in the probe path

Assisted-by: Claude:claude-opus-4-7 go-vet go-test golangci-lint
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-05-13 21:57:50 +02:00
distributed_store.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
distributed_store_test.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
file_stager.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
file_stager_http.go	feat: track files being staged (#9275)	2026-04-08 14:33:58 +02:00
file_stager_s3.go	feat: track files being staged (#9275)	2026-04-08 14:33:58 +02:00
file_staging_client.go	feat: wire transcription for llama.cpp, add streaming support (#9353)	2026-04-14 16:13:40 +02:00
file_transfer_server.go	fix(distributed): worker container healthcheck always unhealthy	2026-04-27 13:51:57 +00:00
file_transfer_server_test.go	feat(distributed): Avoid resending models to backend nodes (#9193)	2026-03-31 16:28:13 +02:00
health.go	fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)	2026-05-13 21:57:50 +02:00
health_mock_test.go	fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)	2026-05-13 21:57:50 +02:00
health_test.go	fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)	2026-05-13 21:57:50 +02:00
inflight.go	feat(distributed): support multiple replicas of one model on the same node (#9583)	2026-04-27 21:20:05 +02:00
inflight_test.go	feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801)	2026-05-13 21:57:27 +02:00
interfaces.go	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00
managers_distributed.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
managers_distributed_test.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
model_router.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
model_router_test.go	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00
nodes_suite_test.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
reconciler.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
reconciler_test.go	feat(distributed): support multiple replicas of one model on the same node (#9583)	2026-04-27 21:20:05 +02:00
registry.go	fix(distributed): cascade-clean stale node_models rows + filter routing by healthy status (#9754)	2026-05-13 21:57:50 +02:00
registry_test.go	fix(distributed): round-robin replicas of the same model (#9695)	2026-05-06 19:40:54 +02:00
router.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
router_test.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
staging_keys.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
staging_keys_test.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
staging_progress.go	feat: track files being staged (#9275)	2026-04-08 14:33:58 +02:00
unloader.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00
unloader_test.go	feat(distributed): support multiple replicas of one model on the same node (#9583)	2026-04-27 21:20:05 +02:00
unloader_upgrade_test.go	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)	2026-05-08 16:24:54 +02:00