LocalAI/core/cli
Ettore Di Giacinto 3948b580d2 fix(distributed): worker stopBackend/isRunning resolve bare modelID to replica keys
PR #9583 changed the supervisor's process map key from `modelID` to
`modelID#replicaIndex`, but the NATS lifecycle handlers kept passing
the bare modelID:

* `backend.stop` (subscribeLifecycleEvents): `s.stopBackend(req.Backend)`
  → `s.processes["Qwen3.6-..."]` missed (actual key is "...#0") →
  silent no-op. Admin "Unload model" clicks released VRAM via
  model.unload but left the gRPC process alive on its old port.
  Subsequent chats hit installBackend, found the leftover process,
  reused its address — and the UI reported "no models loaded" while
  the model kept responding.

* `backend.delete` (subscribeLifecycleEvents): same map miss in
  `isRunning(req.Backend)` and `s.stopBackend(req.Backend)` — admin
  "Delete backend" deleted the binary while the process was still
  serving traffic.
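
The map miss described above can be reproduced in isolation. This is a minimal sketch, not the supervisor's actual code: `processKey`, `lookupBare`, the `map[string]int` (key to port) and the model name `my-model` are hypothetical stand-ins, and only the `modelID#replicaIndex` key format comes from the description above.

```go
package main

import "fmt"

// processKey is a hypothetical helper that builds a key in the
// "modelID#replicaIndex" format described in this commit message.
func processKey(modelID string, replica int) string {
	return fmt.Sprintf("%s#%d", modelID, replica)
}

// lookupBare mimics a handler that indexes the process map directly
// with whatever identifier it was given.
func lookupBare(processes map[string]int, id string) bool {
	_, ok := processes[id]
	return ok
}

func main() {
	// One running replica, keyed by the full process key (value: port).
	processes := map[string]int{processKey("my-model", 0): 50051}

	fmt.Println(lookupBare(processes, "my-model"))   // false: bare ID misses
	fmt.Println(lookupBare(processes, "my-model#0")) // true: full key hits
}
```

A Go map lookup on a missing key returns the zero value without an error, which is why the miss shows up as a silent no-op rather than a failure.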

Add `resolveProcessKeys(id)`: exact match if `id` is a full processKey
(stopAllBackends iterates the map and passes its own keys);
prefix-match if `id` is bare (NATS handlers); empty if `id` contains
`#` but doesn't match (no spurious fallback when the caller was
explicit). stopBackend and isRunning now call it; stopBackend gets a
new stopBackendExact helper for per-key cleanup.
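
The three resolution rules can be sketched as follows. This is an illustration under stated assumptions, not the code in worker.go: the real function is a method on the supervisor, whereas here the map is passed explicitly, the value type is a placeholder, and matching on `id + "#"` (rather than on `id` alone) is an assumption made so that a bare `m` does not also match `m2#0`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolveProcessKeys maps an identifier to the process keys it refers to.
// Hypothetical standalone version; the map value type is a placeholder.
func resolveProcessKeys(processes map[string]struct{}, id string) []string {
	// Rule 1: exact match when id is already a full process key.
	if _, ok := processes[id]; ok {
		return []string{id}
	}
	// Rule 3: an explicit "model#N" that does not exist resolves to
	// nothing; no fallback to other replicas.
	if strings.Contains(id, "#") {
		return nil
	}
	// Rule 2: a bare model ID matches every replica of that model.
	prefix := id + "#" // assumption: include the delimiter in the prefix
	var keys []string
	for k := range processes {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys
}

// isRunning reports whether any process resolves from id.
func isRunning(processes map[string]struct{}, id string) bool {
	return len(resolveProcessKeys(processes, id)) > 0
}

func main() {
	procs := map[string]struct{}{"m#0": {}, "m#1": {}, "m2#0": {}}
	fmt.Println(resolveProcessKeys(procs, "m"))   // [m#0 m#1]
	fmt.Println(resolveProcessKeys(procs, "m#1")) // [m#1]
	fmt.Println(resolveProcessKeys(procs, "m#7")) // []
	fmt.Println(isRunning(procs, "absent"))       // false
}
```

In this sketch a stop operation would loop over the returned keys and clean up each one individually, which is the role the commit assigns to stopBackendExact.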

TDD: regression test fails without the fix (resolveProcessKeys
doesn't exist; map lookup by bare name returns nothing). Tests pass
post-fix.

Reproduced live: the registry row count was 0 for the model the user
"Unloaded", yet chat was still served by the leftover worker process.
SmartRouter behavior is correct in itself — it falls through to
scheduleAndLoad when no row exists; the bug was that the leftover
process corrupted the install path.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:opus-4-7 [Edit] [Bash]
2026-04-27 21:43:15 +00:00
context feat: Merge repeated log lines in the terminal (#9141) 2026-03-26 22:16:13 +01:00
worker feat: improve CLI error messages with actionable guidance (#8880) 2026-04-21 11:53:26 +02:00
workerregistry feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
agent.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
agent_test.go feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186) 2026-03-31 08:28:56 +02:00
agent_worker.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
backends.go feat: backend versioning, upgrade detection and auto-upgrade (#9315) 2026-04-11 22:31:15 +02:00
cli.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
cli_suite_test.go feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186) 2026-03-31 08:28:56 +02:00
completion.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
completion_test.go feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186) 2026-03-31 08:28:56 +02:00
deprecations.go chore: Standardize CLI flag naming to kebab-case (M12) (#8912) 2026-03-09 22:15:39 +01:00
explorer.go chore(refactor): move logging to common package based on slog (#7668) 2025-12-21 19:33:13 +01:00
federated.go chore: Standardize CLI flag naming to kebab-case (M12) (#8912) 2026-03-09 22:15:39 +01:00
models.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
run.go feat: improve CLI error messages with actionable guidance (#8880) 2026-04-21 11:53:26 +02:00
soundgeneration.go fix: Remove debug print statement from soundgeneration.go (C2) (#8843) 2026-03-08 08:49:29 +01:00
transcript.go feat: improve CLI error messages with actionable guidance (#8880) 2026-04-21 11:53:26 +02:00
tts.go chore(refactor): move logging to common package based on slog (#7668) 2025-12-21 19:33:13 +01:00
util.go feat: improve CLI error messages with actionable guidance (#8880) 2026-04-21 11:53:26 +02:00
worker.go fix(distributed): worker stopBackend/isRunning resolve bare modelID to replica keys 2026-04-27 21:43:15 +00:00
worker_addr_test.go feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186) 2026-03-31 08:28:56 +02:00
worker_replica_test.go fix(distributed): worker stopBackend/isRunning resolve bare modelID to replica keys 2026-04-27 21:43:15 +00:00