LocalAI/core/services
Ettore Di Giacinto 83b384de97
feat: surface distributed backend management errors (#9552)
* fix(distributed): surface per-node backend op errors to OpStatus

DistributedBackendManager.{Install,Upgrade,Delete}Backend discarded the
per-node BackendOpResult from enqueueAndDrainBackendOp with `_, err :=`.
When workers replied Success=false (e.g. an OCI image with no arm64
variant on a Jetson host), the per-node Error string was recorded in
result.Nodes[].Error but never reached the top-level return value, so
OpStatus.Error stayed empty and the UI reported the install as
"completed" while the backend was nowhere on the cluster.

Add BackendOpResult.Err() that aggregates per-node Status=="error"
entries into a single error. Queued nodes (waiting for reconciler retry)
are deliberately not treated as failures. Wire the three callers and
DeleteBackendDetailed to call result.Err() so reply.Success=false
finally reaches OpStatus.Error → /api/backends/job/:uid → the UI.
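
A minimal sketch of the aggregation described above, assuming a result shape inferred from the commit text. Only `Nodes`, `Status`, `Error`, and `Err()` are named in the message; the `NodeName` field, the `"success"`/`"queued"` status values, and the error wording are hypothetical and may differ from the actual code in `core/services/nodes`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// NodeOpStatus is a hypothetical per-node entry. Status and Error come
// from the commit message; NodeName is an assumption for illustration.
type NodeOpStatus struct {
	NodeName string
	Status   string // assumed values: "success", "queued", "error"
	Error    string
}

// BackendOpResult mirrors the shape implied by result.Nodes[].Error.
type BackendOpResult struct {
	Nodes []NodeOpStatus
}

// Err folds every node with Status == "error" into a single error.
// Queued nodes (waiting for a reconciler retry) are skipped on purpose.
// It returns nil when no node reported an error.
func (r BackendOpResult) Err() error {
	var failures []string
	for _, n := range r.Nodes {
		if n.Status != "error" {
			continue
		}
		failures = append(failures, fmt.Sprintf("%s: %s", n.NodeName, n.Error))
	}
	if len(failures) == 0 {
		return nil
	}
	return errors.New("backend operation failed on " + strings.Join(failures, "; "))
}

func main() {
	result := BackendOpResult{Nodes: []NodeOpStatus{
		{NodeName: "x86-worker", Status: "success"},
		{NodeName: "jetson", Status: "error", Error: "no arm64 variant in OCI image"},
		{NodeName: "offline-worker", Status: "queued"},
	}}
	// A caller would copy this error into OpStatus.Error instead of
	// discarding the result with `_, err :=`.
	if err := result.Err(); err != nil {
		fmt.Println(err)
	}
}
```

Keeping the aggregation on the result type means Install, Upgrade, and Delete can share one rule for what counts as a failure.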

The Delete closures had a related bug: they discarded the reply with
`_` and only checked the NATS round-trip error, so reply.Success=false
was a silent success even with the new aggregation. Check both.
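
The two-step check can be sketched as follows. This is an illustration under assumptions, not the actual closure: `DeleteReply`, `requestFunc`, and `deleteOnNode` are hypothetical names, and the function value stands in for the NATS request/reply round trip so the example runs without a NATS client.

```go
package main

import (
	"errors"
	"fmt"
)

// DeleteReply is a hypothetical worker reply. Only the Success flag is
// named in the commit message; the Error field is an assumption.
type DeleteReply struct {
	Success bool
	Error   string
}

// requestFunc stands in for the NATS round trip to a single node.
type requestFunc func(nodeID, backend string) (*DeleteReply, error)

// deleteOnNode checks both failure channels: the transport error and
// the worker's own Success flag. Ignoring the reply would turn a
// Success=false answer into a silent success.
func deleteOnNode(request requestFunc, nodeID, backend string) error {
	reply, err := request(nodeID, backend)
	if err != nil {
		return fmt.Errorf("delete %q on node %s: transport: %w", backend, nodeID, err)
	}
	if reply == nil {
		return fmt.Errorf("delete %q on node %s: empty reply", backend, nodeID)
	}
	if !reply.Success {
		return fmt.Errorf("delete %q on node %s: %s", backend, nodeID, reply.Error)
	}
	return nil
}

func main() {
	// The round trip succeeds, but the worker refuses the delete.
	refused := func(nodeID, backend string) (*DeleteReply, error) {
		return &DeleteReply{Success: false, Error: "backend in use"}, nil
	}
	// The round trip itself fails.
	timedOut := func(nodeID, backend string) (*DeleteReply, error) {
		return nil, errors.New("request timeout")
	}
	fmt.Println(deleteOnNode(refused, "node-1", "llama-cpp"))
	fmt.Println(deleteOnNode(timedOut, "node-1", "llama-cpp"))
}
```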

Standalone mode (LocalBackendManager) already surfaces gallery errors
correctly through the same OpStatus.Error path; no change needed there.

Tests: 9 new Ginkgo specs covering all-success / all-fail with distinct
errors / mixed / all-queued / no-nodes for Install, Upgrade, Delete.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]

* feat(react-ui): per-node backend delete + clearer upgrade affordance

The Nodes page exposed a per-node "reinstall" button (fa-sync-alt,
tooltip "Reinstall backend") but no per-node delete, even though the
Go side has had POST /api/nodes/:id/backends/delete →
RemoteUnloaderAdapter.DeleteBackend → NATS-to-specific-node wired up
for a while. Sync icons read as "refresh data" — the action is
functionally an upgrade (re-pulls the gallery image), so the affordance
was misleading.

Per-node backend row now renders two icon buttons:

- Upgrade: btn-secondary btn-sm + fa-arrow-up, tooltip "Upgrade backend
  on this node". Names both action and scope to differentiate from the
  cluster-wide upgrade on the Backends page.
- Delete: btn-danger-ghost btn-sm + fa-trash, tooltip "Delete backend
  from this node". Matches the node-level destructive style at the row
  action column rather than the solid btn-danger of primary destructive
  pages, since this is a secondary action inside a busy row.

Delete goes through the existing ConfirmDialog (danger=true) with copy
that names the backend and the node explicitly — it's a non-recoverable
op on a specific scope. Reuses nodesApi.deleteBackend(id, backend) which
already existed in the API client.

Tests: 4 new Playwright specs covering upgrade clarity (icon + tooltip),
delete button presence, confirm dialog flow with POST body assertion,
and cancel-doesn't-POST.

Assisted-by: Claude:claude-opus-4-7 [Bash] [Edit] [Read] [Write]
2026-04-25 08:57:59 +02:00
| Name | Last commit | Date |
|---|---|---|
| advisorylock | feat(distributed): sync state with frontends, better backend management reporting (#9426) | 2026-04-19 17:55:53 +02:00 |
| agentpool | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| agents | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| dbutil | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| distributed | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| facerecognition | feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480) | 2026-04-22 21:55:41 +02:00 |
| finetune | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| galleryop | feat: add biometrics UI (#9524) | 2026-04-24 08:50:34 +02:00 |
| jobs | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| mcp | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| messaging | fix(distributed): pass ExternalURI through NATS backend install (#9446) | 2026-04-20 23:39:35 +02:00 |
| monitoring | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| nodes | feat: surface distributed backend management errors (#9552) | 2026-04-25 08:57:59 +02:00 |
| quantization | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| skills | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| storage | feat: track files being staged (#9275) | 2026-04-08 14:33:58 +02:00 |
| testutil | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| voicerecognition | feat: voice recognition (#9500) | 2026-04-23 12:07:14 +02:00 |