LocalAI/core/services/nodes
Ettore Di Giacinto 75fba9e03f
fix(distributed): scope Upgrade All to nodes that have the backend installed (#9678)
In distributed mode the React UI's "Upgrade All" button fanned every
detected outdated backend out to every healthy backend node, including
nodes that never had that backend installed. On heterogeneous clusters
this surfaced as platform errors (e.g. mac-mini-m4 asked to upgrade
cpu-insightface-development, which has no darwin/arm64 variant) and left
forever-retrying pending_backend_ops rows.

DistributedBackendManager.UpgradeBackend now queries ListBackends()
first, builds the target node-ID set from SystemBackend.Nodes, and only
fans out to those nodes — every per-node primitive
(adapter.InstallBackend, the pending-ops queue, BackendOpResult) is
unchanged. enqueueAndDrainBackendOp gains an optional targetNodeIDs
allowlist; Install/Delete keep their fan-to-everyone semantics by
passing nil. If no node reports the backend installed, UpgradeBackend
now returns a clear "not installed on any node" error instead of
producing a stuck queue.

Adds Ginkgo coverage for the smart fan-out: backend on a subset of
nodes goes only to those nodes; backend on no node returns the new
error and never sends a NATS install request.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 00:28:41 +02:00
distributed_store.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
distributed_store_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
file_stager.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
file_stager_http.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
file_stager_s3.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
file_staging_client.go feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
file_transfer_server.go fix(distributed): worker container healthcheck always unhealthy 2026-04-27 13:51:57 +00:00
file_transfer_server_test.go feat(distributed): Avoid resending models to backend nodes (#9193) 2026-03-31 16:28:13 +02:00
health.go fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam (#9657) 2026-05-04 19:09:16 +02:00
health_mock_test.go feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
health_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
inflight.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
inflight_test.go feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
interfaces.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
managers_distributed.go fix(distributed): scope Upgrade All to nodes that have the backend installed (#9678) 2026-05-06 00:28:41 +02:00
managers_distributed_test.go fix(distributed): scope Upgrade All to nodes that have the backend installed (#9678) 2026-05-06 00:28:41 +02:00
model_router.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
model_router_test.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
nodes_suite_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
reconciler.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
reconciler_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
registry.go fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups (#9652) 2026-05-04 09:42:14 +02:00
registry_test.go fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups (#9652) 2026-05-04 09:42:14 +02:00
router.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
router_test.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
staging_keys.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
staging_keys_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
staging_progress.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
unloader.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
unloader_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00