LocalAI/core/services/nodes
LocalAI [bot] e5d7b84216
fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)
* feat(messaging): add backend.upgrade NATS subject + payload types

Splits the slow force-reinstall path off backend.install so it can run on
its own subscription goroutine, eliminating head-of-line blocking between
routine model loads and full gallery upgrades.

Wire-level Force flag on BackendInstallRequest is kept for one release as
the rolling-update fallback target; doc note marks it deprecated.

Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed/worker): add per-backend mutex helper to backendSupervisor

Different backend names lock independently; same backend serializes. This
is the synchronization primitive used by the upcoming concurrent install
handler — without it, wrapping the NATS callback in a goroutine would
race the gallery directory when two requests target the same backend.
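
A minimal sketch of such a per-backend lock, assuming a map of mutexes guarded by one outer mutex. The names keyedLocks, lock and peakHolders are hypothetical; the real helper lives on backendSupervisor and may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLocks hands out one mutex per backend name (hypothetical type).
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the named backend's mutex is held and returns the
// function that releases it. The outer mutex only guards the map, so
// callers for different names never wait on each other's work.
func (k *keyedLocks) lock(name string) (unlock func()) {
	k.mu.Lock()
	m, ok := k.locks[name]
	if !ok {
		m = &sync.Mutex{}
		k.locks[name] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// peakHolders runs n goroutines that all lock the same name and reports
// the largest number that were inside the critical section at once.
func peakHolders(k *keyedLocks, name string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex // guards inside and peak
	inside, peak := 0, 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := k.lock(name)
			defer unlock()
			mu.Lock()
			inside++
			if inside > peak {
				peak = inside
			}
			mu.Unlock()
			mu.Lock()
			inside--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	k := newKeyedLocks()
	fmt.Println("peak holders for one backend:", peakHolders(k, "llama-cpp", 8))
}
```

The map in this sketch only grows; that is acceptable when the set of backend names is small, which is an assumption here rather than something the commit states.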

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed/worker): run backend.install handler in a goroutine

NATS subscriptions deliver messages serially on a single per-subscription
goroutine. With a synchronous install handler, a multi-minute gallery
download would head-of-line-block every other install request to the
same worker — manifesting upstream as a 5-minute "nats: timeout" on
unrelated routine model loads.

The body now runs in its own goroutine, with a per-backend mutex
(lockBackend) protecting the gallery directory from concurrent operations
on the same backend. Different backend names install in parallel.
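
The effect can be sketched without NATS at all. In the example below a plain loop stands in for the subscription's serial delivery, and installRequest, handleAll and the backend names are hypothetical. The handler body runs in a goroutine under a per-backend mutex, so the fast install completes while the slow one is still blocked.

```go
package main

import (
	"fmt"
	"sync"
)

// installRequest is a hypothetical stand-in for the wire payload.
type installRequest struct {
	Backend string
}

// handleAll mimics a subscription callback loop: messages arrive one at a
// time, but each body runs in its own goroutine so a slow install cannot
// delay delivery of the next message. A per-backend mutex still serializes
// work on the same backend name.
func handleAll(msgs []installRequest, install func(backend string), done chan<- string) {
	var locks sync.Map // backend name -> *sync.Mutex
	for _, m := range msgs {
		m := m // per-iteration copy for Go versions before 1.22
		go func() {
			v, _ := locks.LoadOrStore(m.Backend, &sync.Mutex{})
			mu := v.(*sync.Mutex)
			mu.Lock()
			defer mu.Unlock()
			install(m.Backend)
			done <- m.Backend
		}()
	}
}

func main() {
	gate := make(chan struct{}) // closed when the slow download "finishes"
	install := func(backend string) {
		if backend == "slow-backend" {
			<-gate
		}
	}
	done := make(chan string, 2)
	handleAll([]installRequest{{"slow-backend"}, {"fast-backend"}}, install, done)

	first := <-done // the fast install finishes while the slow one is blocked
	close(gate)
	second := <-done
	fmt.Println(first, second) // prints "fast-backend slow-backend"
}
```

With a synchronous handler the loop would not reach the second message until the first install returned, which is the head-of-line blocking described above.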

Backward-compat: req.Force=true is still honored here, so an older master
that hasn't been updated to send on backend.upgrade keeps working.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed/worker): subscribe to backend.upgrade as a separate path

Slow force-reinstall now lives on its own NATS subscription, so a
multi-minute gallery pull cannot head-of-line-block the routine
backend.install handler on the same worker. Same per-backend mutex
guards both — concurrent install + upgrade for the same backend
serialize at the gallery directory; different backends are independent.

upgradeBackend stops every live process for the backend, force-installs
from gallery, and re-registers. It does not start a new process — the
next backend.install will spawn one with the freshly-pulled binary.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(distributed): add UpgradeBackend on NodeCommandSender; drop Force from InstallBackend

Master now sends to backend.upgrade for force-reinstall, with a
nats.ErrNoResponders fallback to the legacy backend.install Force=true
path so a rolling update with a new master + an old worker still
converges. The Force parameter leaves the public Go API surface
entirely — only the internal fallback sets it on the wire.
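
A rough sketch of that fallback. The sentinel errors below are local stand-ins so the example runs with the standard library only; real code would compare against nats.ErrNoResponders. The request type, the upgradeBackend function shown here and the shortened subject strings are assumptions for illustration, not the actual NodeCommandSender code.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for illustration: real code would test against
// nats.ErrNoResponders from the nats.go client.
var (
	errNoResponders = errors.New("no responders")
	errTimeout      = errors.New("timeout")
)

// request is a hypothetical transport hook: it sends one request on the
// given subject and returns the transport or worker error, if any.
type request func(subject string, force bool) error

// upgradeBackend tries the new subject first and retries on the legacy
// install subject with Force=true only when nobody is subscribed.
func upgradeBackend(send request) error {
	err := send("backend.upgrade", false)
	if errors.Is(err, errNoResponders) {
		return send("backend.install", true)
	}
	return err
}

func main() {
	// An old worker: no backend.upgrade subscription, install still works.
	oldWorker := func(subject string, force bool) error {
		if subject == "backend.upgrade" {
			return errNoResponders
		}
		fmt.Printf("handled %s force=%v\n", subject, force)
		return nil
	}
	fmt.Println(upgradeBackend(oldWorker))

	// Any other failure is returned as-is, with no retry.
	failing := func(string, bool) error { return errTimeout }
	fmt.Println(upgradeBackend(failing))
}
```

Restricting the retry to the no-responders case keeps a genuine upgrade failure from silently triggering a second force-reinstall.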

InstallBackend timeout drops 5min -> 3min (most replies are sub-second
since the worker short-circuits on already-running or already-installed).
UpgradeBackend timeout is 15min, sized for real-world Jetson-on-WiFi
gallery pulls.

Updates the admin install HTTP endpoint
(core/http/endpoints/localai/nodes.go) to the new signature too.

router_test.go's fakeUnloader does not yet implement the new interface
shape; Task 3.2 updates it before the next package-level test run.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(distributed): update fakeUnloader for new NodeCommandSender shape

InstallBackend lost its force bool param (Force is not part of the public
Go API anymore — only the internal upgrade-fallback path sets it on the
wire). The interface gained an UpgradeBackend method. The fake records
install and upgrade calls in separate slices and provides an installHook
concurrency seam for upcoming singleflight tests.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(distributed): cover UpgradeBackend's new subject + rolling-update fallback

Task 3.1 changed the master to publish UpgradeBackend on the new
backend.upgrade subject; the existing UpgradeBackend tests scripted the
old install subject and so all 3 began failing as expected. Updates them
to script SubjectNodeBackendUpgrade with BackendUpgradeReply.

Adds two new specs for the rolling-update fallback:
  - ErrNoResponders on backend.upgrade triggers a backend.install
    Force=true retry on the same node.
  - Non-NoResponders errors propagate to the caller unchanged.

scriptedMessagingClient gains scriptNoResponders (real nats sentinel) and
scriptReplyMatching (predicate-matched canned reply, used to assert that
the fallback path actually sets Force=true on the install retry).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(distributed): coalesce concurrent identical backend.install via singleflight

Six simultaneous chat completions for the same not-yet-loaded model were
observed firing six independent NATS install requests, each serializing
through the worker's per-subscription goroutine and amplifying queue
depth. SmartRouter now wraps the NATS round-trip in a singleflight.Group
keyed by (nodeID, backend, modelID, replica): N concurrent identical
loads share one round-trip and one reply.

Distinct (modelID, replica) keys still fire independent calls, so
multi-replica scaling and multi-model fan-out are unaffected.
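
The coalescing idea, sketched with the standard library only. The coalescer type below is a simplified stand-in for singleflight.Group from golang.org/x/sync, not the SmartRouter code; loadKey's field names, the waiters counter and the roundTrips driver are assumptions added so the example is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// loadKey mirrors the (nodeID, backend, modelID, replica) tuple; the
// field names are assumptions.
type loadKey struct {
	NodeID, Backend, ModelID string
	Replica                  int
}

// call tracks one in-flight round-trip and the callers waiting on it.
type call struct {
	done    chan struct{}
	err     error
	waiters int
}

// coalescer is a stdlib-only stand-in for singleflight.Group.
type coalescer struct {
	mu       sync.Mutex
	inflight map[loadKey]*call
}

func newCoalescer() *coalescer {
	return &coalescer{inflight: make(map[loadKey]*call)}
}

// do runs fn once per key at a time; callers that arrive while a call for
// the same key is in flight wait for it and share its error.
func (c *coalescer) do(k loadKey, fn func() error) error {
	c.mu.Lock()
	if cl, ok := c.inflight[k]; ok {
		cl.waiters++
		c.mu.Unlock()
		<-cl.done
		return cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[k] = cl
	c.mu.Unlock()

	cl.err = fn()

	c.mu.Lock()
	delete(c.inflight, k)
	c.mu.Unlock()
	close(cl.done)
	return cl.err
}

// waiting reports how many callers have joined the in-flight call for k.
func (c *coalescer) waiting(k loadKey) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.inflight[k]; ok {
		return cl.waiters
	}
	return 0
}

// roundTrips issues n concurrent loads for k and returns how many times
// the simulated round-trip actually ran.
func roundTrips(c *coalescer, k loadKey, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex // guards trips
	trips := 0
	gate := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.do(k, func() error {
				mu.Lock()
				trips++
				mu.Unlock()
				<-gate // hold the round-trip open until everyone has joined
				return nil
			})
		}()
	}
	for c.waiting(k) < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(gate)
	wg.Wait()
	return trips
}

func main() {
	k := loadKey{NodeID: "node-1", Backend: "llama-cpp", ModelID: "example-model"}
	fmt.Println("round-trips for 6 identical loads:", roundTrips(newCoalescer(), k, 6))
}
```

Because Replica is part of the key, loads for different replicas of the same model map to different entries and are not merged.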

fakeUnloader gains a sync.Mutex around its recording slices to keep
concurrent test goroutines race-clean.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(e2e/distributed): drop force arg from InstallBackend test calls

Two e2e test call sites still passed the trailing force bool that was
removed from RemoteUnloaderAdapter.InstallBackend in 9bde76d7. Caught
by golangci-lint typecheck on the upgrade-split branch (master CI was
already green because these tests don't run in the standard test path).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(distributed): extract worker business logic to core/services/worker

core/cli/worker.go grew to 1212 lines after the backend.upgrade split.
The CLI package was carrying backendSupervisor, NATS lifecycle handlers,
gallery install/upgrade orchestration, S3 file staging, and registration
helpers — all distributed-worker business logic that doesn't belong in
the CLI surface.

Move it to a new core/services/worker package, mirroring the existing
core/services/{nodes,messaging,galleryop} pattern. core/cli/worker.go
shrinks to ~19 lines: a kong-tagged shim that embeds worker.Config and
delegates Run.

No behavior change. All symbols stay unexported except Config and Run.
The three worker-specific tests (addr/replica/concurrency) move with
the code via git mv so history follows them.

Files split as:
  worker.go        - Run entry point
  config.go        - Config struct (kong tags retained, kong not imported)
  supervisor.go    - backendProcess, backendSupervisor, process lifecycle
  install.go       - installBackend, upgradeBackend, findBackend, lockBackend
  lifecycle.go     - subscribeLifecycleEvents (verbatim, decomposition is
                     a follow-up commit)
  file_staging.go  - subscribeFileStaging, isPathAllowed
  registration.go  - advertiseAddr, registrationBody, heartbeatBody, etc.
  reply.go         - replyJSON
  process_helpers.go - readLastLinesFromFile

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(distributed/worker): decompose subscribeLifecycleEvents into per-event handlers

The 226-line subscribeLifecycleEvents method packed eight NATS subscriptions
inline. Each grew context-shaped doc comments mixed with subscription
plumbing, making it hard to read any one handler without scrolling past the
others. Extract each handler into its own method on *backendSupervisor; the
subscriber becomes a thin 8-line dispatcher.

No behavior change: each method body is byte-equivalent to its corresponding
inline goroutine + handler. Doc comments that were attached to the inline
SubscribeReply calls migrate to the new method godocs.

Adding the next NATS subject is now a 2-line patch to the dispatcher plus
one new method, instead of grafting onto a monolith.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-08 16:24:54 +02:00
distributed_store.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
distributed_store_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
file_stager.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
file_stager_http.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
file_stager_s3.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
file_staging_client.go feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
file_transfer_server.go fix(distributed): worker container healthcheck always unhealthy 2026-04-27 13:51:57 +00:00
file_transfer_server_test.go feat(distributed): Avoid resending models to backend nodes (#9193) 2026-03-31 16:28:13 +02:00
health.go fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam (#9657) 2026-05-04 19:09:16 +02:00
health_mock_test.go feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
health_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
inflight.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
inflight_test.go feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
interfaces.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
managers_distributed.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
managers_distributed_test.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
model_router.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
model_router_test.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
nodes_suite_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
reconciler.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
reconciler_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
registry.go fix(distributed): round-robin replicas of the same model (#9695) 2026-05-06 19:40:54 +02:00
registry_test.go fix(distributed): round-robin replicas of the same model (#9695) 2026-05-06 19:40:54 +02:00
router.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
router_test.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
staging_keys.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
staging_keys_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
staging_progress.go feat: track files being staged (#9275) 2026-04-08 14:33:58 +02:00
unloader.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
unloader_test.go feat(distributed): support multiple replicas of one model on the same node (#9583) 2026-04-27 21:20:05 +02:00
unloader_upgrade_test.go fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00