LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-04-21 13:27:21 +00:00

Author	SHA1	Message	Date
Russell Sim	02bb715c0a	fix(distributed): pass ExternalURI through NATS backend install (#9446 ) When installing a backend with a custom OCI URI in distributed mode, the URI was captured in ManagementOp.ExternalURI by the HTTP handler but never forwarded to workers. BackendInstallRequest had no URI field, so workers fell through to the gallery lookup and failed with "no backend found with name <custom-name>". Add URI/Name/Alias fields to BackendInstallRequest and thread them from ManagementOp through DistributedBackendManager.InstallBackend() and the RemoteUnloaderAdapter. On the worker side, route to InstallExternalBackend when URI is set instead of InstallBackendFromGallery. Update all remaining InstallBackend call sites (UpgradeBackend, reconciler pending-op drain, router auto-install) to pass empty strings for the new params. Assisted-by: Claude Code:claude-sonnet-4-6 Signed-off-by: Russell Sim <rsl@simopolis.xyz>	2026-04-20 23:39:35 +02:00
pjbrzozowski	ecf85fde9e	fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427 ) The API Traces tab in /app/traces always showed (0) traces despite requests being recorded. The /api/traces endpoint was registered in both localai.go and ui_api.go. The ui_api.go version wrapped the response as {"traces": [...]} instead of the flat []APIExchange array that both the React UI (Traces.jsx) and the legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go handler, Array.isArray(apiData) always returned false, making the API Traces tab permanently empty. Remove the duplicate endpoints from ui_api.go so only the correct flat-array version in localai.go is served. Also use mime.ParseMediaType for the Content-Type check in the trace middleware so requests with parameters (e.g. application/json; charset=utf-8) are still traced. Signed-off-by: Pawel Brzozowski <paul@ontux.net> Co-authored-by: Pawel Brzozowski <paul@ontux.net>	2026-04-20 18:44:49 +02:00
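The Content-Type part of the fix above can be illustrated with a minimal Go sketch. The helper name isJSONContentType is hypothetical and not taken from the LocalAI source; only mime.ParseMediaType comes from the commit message.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONContentType (hypothetical helper) reports whether a Content-Type
// header value denotes application/json, ignoring parameters such as charset.
func isJSONContentType(header string) bool {
	mediaType, _, err := mime.ParseMediaType(header)
	if err != nil {
		// Empty or malformed headers are treated as "not JSON".
		return false
	}
	return mediaType == "application/json"
}

func main() {
	fmt.Println(isJSONContentType("application/json"))
	fmt.Println(isJSONContentType("application/json; charset=utf-8"))
	fmt.Println(isJSONContentType("text/plain"))
}
```

A plain string comparison against "application/json" would reject the second header, which is the failure mode the commit describes.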
Sai Asish Y	6480715a16	fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438 ) GET /api/settings returns settings.ApiKeys as the merged env+runtime list via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and round-trips it back on POST /api/settings unchanged. UpdateSettingsEndpoint was then doing: appConfig.ApiKeys = append(envKeys, runtimeKeys...) where runtimeKeys already contained envKeys (because the UI got them from the merged GET). Every save therefore duplicated the env keys on top of the previous merge, and also wrote the duplicates to runtime_settings.json so the duplication survived restarts and compounded with each save. This is the user-visible behaviour in #9071: the Web UI shows the keys twice / three times after consecutive saves. Before we marshal the settings to disk or call ApplyRuntimeSettings, drop any incoming key that already appears in startupConfig.ApiKeys. The file on disk now stores only the genuinely runtime-added keys; the subsequent append(envKeys, runtimeKeys...) produces one copy of each env key, as intended. Behaviour is unchanged for users who never had env keys set. Fixes #9071 Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>	2026-04-20 10:36:54 +02:00
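The filtering step described in the commit above could look roughly like the following Go sketch. The function name stripEnvKeys and the sample key values are assumptions made for illustration; the actual change lives in UpdateSettingsEndpoint and may be structured differently.

```go
package main

import "fmt"

// stripEnvKeys (hypothetical helper) returns the incoming keys minus any key
// that is already supplied by the environment, preserving the incoming order.
func stripEnvKeys(incoming, envKeys []string) []string {
	env := make(map[string]struct{}, len(envKeys))
	for _, k := range envKeys {
		env[k] = struct{}{}
	}
	runtimeKeys := make([]string, 0, len(incoming))
	for _, k := range incoming {
		if _, fromEnv := env[k]; !fromEnv {
			runtimeKeys = append(runtimeKeys, k)
		}
	}
	return runtimeKeys
}

func main() {
	envKeys := []string{"env-1"}
	// The UI round-trips the merged list, so the env key comes back too.
	incoming := []string{"env-1", "runtime-1"}

	runtimeKeys := stripEnvKeys(incoming, envKeys)
	// Copy envKeys before appending so the original slice is never aliased.
	merged := append(append([]string{}, envKeys...), runtimeKeys...)
	fmt.Println(runtimeKeys, merged)
}
```

Because only the filtered list is persisted, repeated saves produce the same merged result instead of growing it.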
Ettore Di Giacinto	75a63f87d8	feat(distributed): sync state with frontends, better backend management reporting (#9426 ) * fix(distributed): detect backend upgrades across worker nodes Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally — they live on workers — so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst — a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see why it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed. * feat(ui): surface backend upgrades in the System page The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and a "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view. 
- Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles. * feat(ui): polish the Nodes page so it reads like a product The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. 
Action buttons gained labels — "Approve", "Resume", "Drain" — so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system — matches the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker. * feat(distributed): durable backend fan-out + state reconciliation Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions. 
The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: gRPC-HealthChecks addresses the DB thinks are loaded but hasn't seen touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate — the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter. * feat(ui): show cluster distribution of models in the System page When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually — the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model. 
Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle. * feat(ui): NodeDistributionChip — shared per-node attribution component Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. 
Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. Delete code removes ~40 lines of ad-hoc logic. * feat(ui): shared FilterBar across the System page tabs The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. 
Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.	2026-04-19 17:55:53 +02:00
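The retry schedule of drainPendingBackendOps described above (exponential backoff from 30s up to a 15m cap) can be sketched in Go as follows. The doubling factor, the constant names, and the function name nextRetryDelay are assumptions; the commit message only states the starting delay and the cap.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 30 * time.Second // first retry delay, per the commit message
	maxDelay  = 15 * time.Minute // cap, per the commit message
)

// nextRetryDelay (hypothetical) returns the wait before the next retry after
// the given number of failed attempts, assuming the delay doubles each time.
func nextRetryDelay(attempts int) time.Duration {
	if attempts < 1 {
		attempts = 1
	}
	delay := baseDelay
	for i := 1; i < attempts; i++ {
		delay *= 2
		if delay >= maxDelay {
			// Stop early so large attempt counts cannot overflow.
			return maxDelay
		}
	}
	return delay
}

func main() {
	for attempts := 1; attempts <= 7; attempts++ {
		fmt.Println(attempts, nextRetryDelay(attempts))
	}
}
```

Under the doubling assumption the cap is reached on the sixth failed attempt, after which every retry waits the full 15 minutes.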
Ettore Di Giacinto	054c4b4b45	feat(stable-diffusion.ggml): add support for video generation (#9420 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-19 09:26:33 +02:00
Ettore Di Giacinto	e463820566	fix(ui): fix dark-theme colors in chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-18 23:01:01 +00:00
Ettore Di Giacinto	7809c5f5d0	fix(vision): propagate mtmd media marker from backend via ModelMetadata (#9412 ) Upstream llama.cpp (PR #21962) switched the server-side mtmd media marker to a random per-server string and removed the legacy "<__media__>" backward-compat replacement in mtmd_tokenizer. The Go layer still emitted the hardcoded "<__media__>", so on the non-tokenizer-template path the prompt arrived with a marker mtmd did not recognize and tokenization failed with "number of bitmaps (1) does not match number of markers (0)". Report the active media marker via ModelMetadataResponse.media_marker and substitute the sentinel "<__media__>" with it right before the gRPC call, after the backend has been loaded and probed. Also skip the Go-side multimodal templating entirely when UseTokenizerTemplate is true — llama.cpp's oaicompat_chat_params_parse already injects its own marker and StringContent is unused in that path. Backends that do not expose the field keep the legacy "<__media__>" behavior.	2026-04-18 20:30:13 +02:00
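The substitution described in the commit above might be sketched like this in Go. The function name applyMediaMarker and the example marker string are hypothetical; only the sentinel "<__media__>" and the fallback to legacy behavior come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyMediaMarker is the sentinel the Go layer emits, per the commit message.
const legacyMediaMarker = "<__media__>"

// applyMediaMarker (hypothetical) swaps the sentinel for the marker reported
// by the backend. An empty marker means the backend does not expose the
// field, so the prompt keeps the legacy sentinel.
func applyMediaMarker(prompt, backendMarker string) string {
	if backendMarker == "" || backendMarker == legacyMediaMarker {
		return prompt
	}
	return strings.ReplaceAll(prompt, legacyMediaMarker, backendMarker)
}

func main() {
	prompt := "Describe this image: <__media__>"
	fmt.Println(applyMediaMarker(prompt, "<__media_ab12cd__>"))
	fmt.Println(applyMediaMarker(prompt, ""))
}
```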
dependabot[bot]	12b069f9bd	chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9376 ) chore(deps): bump dompurify Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [dompurify](https://github.com/cure53/DOMPurify). Updates `dompurify` from 3.3.2 to 3.4.0 - [Release notes](https://github.com/cure53/DOMPurify/releases) - [Commits](https://github.com/cure53/DOMPurify/compare/3.3.2...3.4.0) --- updated-dependencies: - dependency-name: dompurify dependency-version: 3.4.0 dependency-type: direct:production dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-17 09:06:32 +02:00
Ettore Di Giacinto	7c5d6162f7	fix(ui): rename model config files on save to prevent duplicates (#9388 ) Editing a model's YAML and changing the `name:` field previously wrote the new body to the original `<oldName>.yaml`. On reload the config loader indexed that file under the new name while the old key lingered in memory, producing two entries in the system UI that shared a single underlying file — deleting either removed both. Detect the rename in EditModelEndpoint and rename the on-disk `<name>.yaml` and `._gallery_<name>.yaml` to match, drop the stale in-memory key before the reload, and redirect the editor URL in the React UI so it tracks the new name. Reject conflicts (409) and names containing path separators (400). Fixes #9294	2026-04-17 08:12:48 +02:00
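The two rejection rules in the commit above (409 for conflicts, 400 for names containing path separators) can be sketched in Go as below. The function renameStatus and the in-memory set of existing names are illustrative assumptions and do not reflect how EditModelEndpoint actually tracks configs.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// renameStatus (hypothetical) returns the HTTP status a rename request would
// receive: 400 for names with path separators, 409 when the new name is
// already taken by another config, 200 otherwise.
func renameStatus(oldName, newName string, existing map[string]bool) int {
	if strings.ContainsAny(newName, `/\`) {
		return http.StatusBadRequest
	}
	if newName != oldName && existing[newName] {
		return http.StatusConflict
	}
	return http.StatusOK
}

func main() {
	existing := map[string]bool{"llama": true, "whisper": true}
	fmt.Println(renameStatus("llama", "llama-v2", existing))
	fmt.Println(renameStatus("llama", "whisper", existing))
	fmt.Println(renameStatus("llama", "../etc/passwd", existing))
}
```

Checking for separators before touching the filesystem keeps a renamed `<name>.yaml` inside the models directory.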
Ettore Di Giacinto	b4e30692a2	feat(backends): add sglang (#9359 ) * feat(backends): add sglang Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally; -march=native fails on CI runners without AVX-512 in /proc/cpuinfo. Force -march=sapphirerapids so the build always succeeds, matching sglang upstream's docker/xeon.Dockerfile recipe. The resulting binary still requires an AVX-512 capable CPU at runtime, so disable tests-sglang-grpc in test-extra.yml for the same reason tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang still work on hosts with the right SIMD baseline. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512 CXXFLAGS with -march=sapphirerapids was being overridden by add_compile_options(-march=native) in sglang's CPU CMakeLists.txt, since CMake appends those flags after CXXFLAGS. Sed-patch the CMakeLists.txt directly after cloning to replace -march=native. --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-16 22:40:56 +02:00
Ettore Di Giacinto	61d34ccb11	fix(ui): show also concrete backends in the backend list Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-16 17:44:25 +00:00
dependabot[bot]	ab326a9c61	chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates (#9373 ) Bumps the npm_and_yarn group with 6 updates in the /core/http/react-ui directory: \| Package \| From \| To \| \| --- \| --- \| --- \| \| [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) \| `6.4.1` \| `6.4.2` \| \| [@hono/node-server](https://github.com/honojs/node-server) \| `1.19.11` \| `1.19.14` \| \| [flatted](https://github.com/WebReflection/flatted) \| `3.3.4` \| `3.4.2` \| \| [hono](https://github.com/honojs/hono) \| `4.12.7` \| `4.12.14` \| \| [path-to-regexp](https://github.com/pillarjs/path-to-regexp) \| `8.3.0` \| `8.4.2` \| \| [picomatch](https://github.com/micromatch/picomatch) \| `4.0.3` \| `4.0.4` \| Updates `vite` from 6.4.1 to 6.4.2 - [Release notes](https://github.com/vitejs/vite/releases) - [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md) - [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite) Updates `@hono/node-server` from 1.19.11 to 1.19.14 - [Release notes](https://github.com/honojs/node-server/releases) - [Commits](https://github.com/honojs/node-server/compare/v1.19.11...v1.19.14) Updates `flatted` from 3.3.4 to 3.4.2 - [Commits](https://github.com/WebReflection/flatted/compare/v3.3.4...v3.4.2) Updates `hono` from 4.12.7 to 4.12.14 - [Release notes](https://github.com/honojs/hono/releases) - [Commits](https://github.com/honojs/hono/compare/v4.12.7...v4.12.14) Updates `path-to-regexp` from 8.3.0 to 8.4.2 - [Release notes](https://github.com/pillarjs/path-to-regexp/releases) - [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md) - [Commits](https://github.com/pillarjs/path-to-regexp/compare/v8.3.0...v8.4.2) Updates `picomatch` from 4.0.3 to 4.0.4 - [Release notes](https://github.com/micromatch/picomatch/releases) - [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md) - 
[Commits](https://github.com/micromatch/picomatch/compare/4.0.3...4.0.4) --- updated-dependencies: - dependency-name: vite dependency-version: 6.4.2 dependency-type: direct:development dependency-group: npm_and_yarn - dependency-name: "@hono/node-server" dependency-version: 1.19.14 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: flatted dependency-version: 3.4.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: hono dependency-version: 4.12.14 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: path-to-regexp dependency-version: 8.4.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: picomatch dependency-version: 4.0.4 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-16 08:23:03 +02:00
Ettore Di Giacinto	ad3c8c4832	fix(agents): handle embedding model dim changes on collection upload (#9365 ) Bumps LocalAGI to pick up the LocalRecall postgres backend fix that resizes the pgvector column when the configured embedding model returns vectors of a different dimensionality than the existing collection. Switching the agent pool's embedding model now triggers a transparent re-embed at startup instead of failing every subsequent upload with 'expected N dimensions, not M' (SQLSTATE 22000). Also surfaces a 409 with an actionable message in UploadToCollectionEndpoint as a safety net for the rare cases the upstream migration path doesn't cover (e.g. a model swapped at runtime), instead of the previous opaque 500.	2026-04-15 20:05:28 +02:00
Ettore Di Giacinto	410d100cc3	chore(ui): improve visibility of forms, color palette Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-14 21:53:03 +00:00
Ettore Di Giacinto	87e6de1989	feat: wire transcription for llama.cpp, add streaming support (#9353 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-14 16:13:40 +02:00
Ettore Di Giacinto	2865f0f8d3	feat(ux): backend management enhancement (#9325 ) * feat: add PreferDevelopmentBackends setting, expose isMeta/isDevelopment in API - Add PreferDevelopmentBackends config field, CLI flag, runtime setting - Add IsDevelopment() method to GalleryBackend - Use AvailableBackendsUnfiltered in UI API to show all backends - Expose isMeta, isDevelopment, preferDevelopmentBackends in backend API response * feat: upgrade banner with Upgrade All button, detect pre-existing backends - Add upgrade banner on Backends page showing count and Upgrade All button - Fix upgrade detection for backends installed before version tracking: flag as upgradeable when gallery has a version but installed has none - Fix OCI digest check to flag backends with no stored digest as upgradeable	2026-04-12 00:35:22 +02:00
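The upgrade-detection rule in the entry above (flag a backend as upgradeable when the gallery carries a version or digest but the installed copy recorded none) might be sketched in Go roughly as follows. The `installedBackend` and `galleryEntry` types and the `upgradeAvailable` function are hypothetical names chosen for illustration, not LocalAI's actual identifiers. Treating any version or digest mismatch as an upgrade is also an assumption; the real comparison may differ.

```go
package main

import "fmt"

// installedBackend is a hypothetical, simplified view of the metadata
// recorded at install time. Empty fields mean "nothing was recorded",
// e.g. the backend was installed before version tracking existed.
type installedBackend struct {
	Version string
	Digest  string
}

// galleryEntry is a hypothetical, simplified view of what the gallery
// (or the remote OCI registry) currently advertises.
type galleryEntry struct {
	Version string
	Digest  string
}

// upgradeAvailable reports whether the installed backend should be
// flagged as upgradeable.
func upgradeAvailable(in installedBackend, g galleryEntry) bool {
	if g.Version != "" {
		// Gallery has a version: upgradeable when the installed copy
		// has none, or (assumption) when the two versions differ.
		if in.Version == "" {
			return true
		}
		return in.Version != g.Version
	}
	if g.Digest != "" {
		// No version to compare: fall back to the OCI digest. A missing
		// stored digest is treated as upgradeable.
		if in.Digest == "" {
			return true
		}
		return in.Digest != g.Digest
	}
	// Nothing to compare against.
	return false
}

func main() {
	fmt.Println(upgradeAvailable(installedBackend{}, galleryEntry{Version: "2.0.0"}))
	fmt.Println(upgradeAvailable(installedBackend{Version: "2.0.0"}, galleryEntry{Version: "2.0.0"}))
	fmt.Println(upgradeAvailable(installedBackend{}, galleryEntry{Digest: "sha256:abc"}))
}
```

Checking the version before the digest keeps the common case cheap, since a digest lookup needs a registry round trip while a version string is already in the gallery index.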
Ettore Di Giacinto	8ab0744458	feat: backend versioning, upgrade detection and auto-upgrade (#9315 ) * feat: add backend versioning data model foundation Add Version, URI, and Digest fields to BackendMetadata for tracking installed backend versions and enabling upgrade detection. Add Version field to GalleryBackend. Add UpgradeAvailable/AvailableVersion fields to SystemBackend. Implement GetImageDigest() for lightweight OCI digest lookups via remote.Head. Record version, URI, and digest at install time in InstallBackend() and propagate version through meta backends. * feat: add backend upgrade detection and execution logic Add CheckBackendUpgrades() to compare installed backend versions/digests against gallery entries, and UpgradeBackend() to perform atomic upgrades with backup-based rollback on failure. Includes Agent A's data model changes (Version/URI/Digest fields, GetImageDigest). * feat: add AutoUpgradeBackends config and runtime settings Add configuration and runtime settings for backend auto-upgrade: - RuntimeSettings field for dynamic config via API/JSON - ApplicationConfig field, option func, and roundtrip conversion - CLI flag with LOCALAI_AUTO_UPGRADE_BACKENDS env var - Config file watcher support for runtime_settings.json - Tests for ToRuntimeSettings, ApplyRuntimeSettings, and roundtrip * feat(ui): add backend version display and upgrade support - Add upgrade check/trigger API endpoints to config and api module - Backends page: version badge, upgrade indicator, upgrade button - Manage page: version in metadata, context-aware upgrade/reinstall button - Settings page: auto-upgrade backends toggle * feat: add upgrade checker service, API endpoints, and CLI command - UpgradeChecker background service: checks every 6h, auto-upgrades when enabled - API endpoints: GET /backends/upgrades, POST /backends/upgrades/check, POST /backends/upgrade/:name - CLI: `localai backends upgrade` command, version display in `backends list` - BackendManager interface: add UpgradeBackend and CheckUpgrades methods - Wire upgrade op through GalleryService backend handler - Distributed mode: fan-out upgrade to worker nodes via NATS * fix: use advisory lock for upgrade checker in distributed mode In distributed mode with multiple frontend instances, use PostgreSQL advisory lock (KeyBackendUpgradeCheck) so only one instance runs periodic upgrade checks and auto-upgrades. Prevents duplicate upgrade operations across replicas. Standalone mode is unchanged (simple ticker loop). * test: add e2e tests for backend upgrade API - Test GET /api/backends/upgrades returns 200 (even with no upgrade checker) - Test POST /api/backends/upgrade/:name accepts request and returns job ID - Test full upgrade flow: trigger upgrade via API, wait for job completion, verify run.sh updated to v2 and metadata.json has version 2.0.0 - Test POST /api/backends/upgrades/check returns 200 - Fix nil check for applicationInstance in upgrade API routes	2026-04-11 22:31:15 +02:00
Ettore Di Giacinto	5c35e85fe2	feat: allow to pin models and skip from reaping (#9309 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-11 08:38:17 +02:00
Leigh Phillips	062e0d0d00	feat: Add toggle mechanism to enable/disable models from loading on demand (#9304 ) * feat: add toggle mechanism to enable/disable models from loading on demand Implements #9303 - Adds ability to disable models from being auto-loaded while keeping them in the collection. Backend changes: - Add Disabled field to ModelConfig struct with IsDisabled() getter - New ToggleModelEndpoint handler (PUT /models/toggle/:name/:action) - Request middleware returns 403 when disabled model is requested - Capabilities endpoint exposes disabled status Frontend changes: - Toggle switch in System > Models table Actions column - Visual indicators: dimmed row, red Disabled badge, muted icons - Tooltip describes toggle function on hover - Loading state while API call is in progress * fix: remove extra closing brace causing syntax error in request middleware * refactor: reorder Actions column - Stop button before toggle switch * refactor: migrate from toggle to toggle-state per PR review feedback	2026-04-10 18:17:41 +02:00
Ettore Di Giacinto	9748a1cbc6	fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299 ) When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients. Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached. Reasoning models are unaffected because their first token goes into reasoning_content, not content.	2026-04-10 08:45:47 +02:00
Ettore Di Giacinto	e1a6010874	fix(streaming): deduplicate tool call emissions during streaming (#9292 ) The Go-side incremental JSON parser was emitting the same tool call on every streaming token because it lacked the len > lastEmittedCount guard that the XML parser had. On top of that, the post-streaming default: case re-emitted all tool calls from index 0, duplicating everything. This produced duplicate delta.tool_calls events causing clients to accumulate arguments as "{args}{args}" — invalid JSON. Fixes: - JSON incremental parser: add len(jsonResults) > lastEmittedCount guard and loop from lastEmittedCount (matching the XML parser pattern) - Post-streaming default: case: skip i < lastEmittedCount entries that were already emitted during streaming - JSON parser: use blocking channel send (matching XML parser behavior)	2026-04-10 00:44:25 +02:00
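The `lastEmittedCount` guard described in the entry above can be illustrated with a minimal Go sketch. The `ToolCall` type and the `emitNew` helper are hypothetical stand-ins, not the actual LocalAI parser code. The sketch assumes the incremental parser returns the full list of tool calls parsed so far on every token.

```go
package main

import "fmt"

// ToolCall is a hypothetical stand-in for a parsed tool call.
type ToolCall struct {
	Name      string
	Arguments string
}

// emitNew returns only the tool calls that have not been emitted yet,
// together with the updated emitted count. When the parser has produced
// nothing new (len(results) <= lastEmittedCount), nothing is returned.
func emitNew(results []ToolCall, lastEmittedCount int) ([]ToolCall, int) {
	if len(results) <= lastEmittedCount {
		return nil, lastEmittedCount
	}
	return results[lastEmittedCount:], len(results)
}

func main() {
	weather := ToolCall{Name: "get_weather", Arguments: `{"city":"Rome"}`}
	clock := ToolCall{Name: "get_time", Arguments: `{}`}

	// Parser output after three successive streaming tokens: the same
	// single call is seen twice before a second call appears.
	snapshots := [][]ToolCall{
		{weather},
		{weather},
		{weather, clock},
	}

	var emitted []ToolCall
	last := 0
	for _, snapshot := range snapshots {
		var fresh []ToolCall
		fresh, last = emitNew(snapshot, last)
		emitted = append(emitted, fresh...)
	}

	// Without the guard, get_weather would have been emitted three times.
	fmt.Println(len(emitted)) // prints 2
}
```

The same slice offset covers the post-streaming pass: starting from `lastEmittedCount` instead of index 0 skips every entry that was already sent while streaming.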
Ettore Di Giacinto	706cf5d43c	feat(sam.cpp): add sam.cpp detection backend (#9288 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-09 21:49:11 +02:00
Ettore Di Giacinto	13a6ed709c	fix: thinking models with tools returning empty content (reasoning-only retry loop) (#9290 ) When clients like Nextcloud or Home Assistant send requests with tools to thinking models (e.g. Gemma 4 with <|channel>thought tags), the response was empty despite the backend producing valid content. Root cause: the C++ autoparser puts clean content in both the raw Response and ChatDeltas. The Go-side PrependThinkingTokenIfNeeded then prepends the thinking start token to the already-clean content, causing ExtractReasoning to classify the entire response as unclosed reasoning. This made cbRawResult empty, triggering a retry loop that never succeeds. Two fixes: - inference.go: check ChatDeltas for content/tool_calls regardless of whether Response is empty, so skipCallerRetry fires correctly - chat.go: when ChatDeltas have content but no tool calls, use that content directly instead of falling back to the empty cbRawResult	2026-04-09 18:30:31 +02:00
Ettore Di Giacinto	85be4ff03c	feat(api): add ollama compatibility (#9284 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-09 14:15:14 +02:00
Ettore Di Giacinto	39c6b3ed66	feat: track files being staged (#9275 ) This changeset makes it visible when files are being staged, so users are aware that the model "isn't ready yet" for requests. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-08 14:33:58 +02:00
Ettore Di Giacinto	0e9d1a6588	chore(ci): drop unnecessary test Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-08 12:19:54 +00:00
Richard Palethorpe	9ac1bdc587	feat(ui): Interactive model config editor with autocomplete (#9149 ) * feat(ui): Add dynamic model editor with autocomplete Signed-off-by: Richard Palethorpe <io@richiejp.com> * chore(docs): Add link to long-format installation video Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-07 14:42:23 +02:00
Ettore Di Giacinto	505c417fa7	fix(gpu): better detection for MacOS and Thor (#9263 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-07 00:39:07 +02:00
Ettore Di Giacinto	0f9d516a6c	fix(anthropic): do not emit empty tokens and fix SSE tool calls (#9258 ) This fixes Claude Code compatibility Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-07 00:38:21 +02:00
Ettore Di Giacinto	92f99b1ec3	fix(token): login via legacy api keys (#9249 ) We were not checking against the API keys when db == nil. This commit also cleans up now-unused middleware. Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-06 21:45:09 +02:00
Ettore Di Giacinto	773489eeb1	fix(chat): do not retry if we had chatdeltas or tooldeltas from backend (#9244 ) * fix(chat): do not retry if we had chatdeltas or tooldeltas from backend Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: use oai compat for llama.cpp Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: apply to non-streaming path too Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * map also other fields Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-06 10:52:23 +02:00
Ettore Di Giacinto	232e324a68	fix(autoparser): correctly pass logprobs (#9239 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-05 09:39:22 +02:00
Ettore Di Giacinto	53deeb1107	fix(reasoning): suppress partial tag tokens during autoparser warm-up The C++ PEG parser needs a few tokens to identify the reasoning format (e.g. "<|channel>thought\n" for Gemma 4). During this warm-up, the gRPC layer was sending raw partial tag tokens to Go, which leaked into the reasoning field. - Clear reply.message in gRPC when autoparser is active but has no diffs yet, matching llama.cpp server behavior of only emitting classified output - Prefer C++ autoparser chat deltas for reasoning/content in all streaming paths, falling back to Go-side extraction for backends without autoparser (e.g. vLLM) - Override non-streaming no-tools result with chat delta content when available - Guard PrependThinkingTokenIfNeeded against partial tag prefixes during streaming accumulation - Reorder default thinking tokens so <|channel>thought is checked before <|think|> (Gemma 4 templates contain both)	2026-04-04 20:45:57 +00:00
Ettore Di Giacinto	c5a840f6af	fix(reasoning): warm-up Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 20:25:24 +00:00
Ettore Di Giacinto	6d9d77d590	fix(reasoning): accumulate and strip reasoning tags from autoparser results (#9227 ) fix(reasoning): accumulate and strip reasoning tags from autoparser results Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 18:15:32 +02:00
Richard Palethorpe	557d0f0f04	feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-04-04 15:14:35 +02:00
Ettore Di Giacinto	b7e3589875	fix(anthropic): show null index when not present, default to 0 (#9225 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 15:13:17 +02:00
Ettore Di Giacinto	716ddd697b	feat(autoparser): prefer chat deltas from backends when emitted (#9224 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 12:12:08 +02:00
Ettore Di Giacinto	9f8821bba8	feat(gemma4): add thinking support (#9221 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 12:11:38 +02:00
Ettore Di Giacinto	84e51b68ef	fix(ui): pass staticApiKeyRequired to show login when only api key is configured (#9220 ) This fixes #9213 Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-04 12:11:22 +02:00
Ettore Di Giacinto	6c635e8353	feat: add resume endpoint to undrain nodes (#9197 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-01 18:21:43 +02:00
Ettore Di Giacinto	e587ecc485	chore(ui): allow to unload forcefully Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 17:20:53 +00:00
Ettore Di Giacinto	221ff0f28f	feat(ui): show cluster status in home in distributed mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 15:37:58 +00:00
Ettore Di Giacinto	16d5cb00bd	chore: css cleanups Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 16:37:38 +02:00
Richard Palethorpe	efdcbbe332	feat(api): Return 404 when model is not found except for model names in HF format (#9133 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-31 10:48:21 +02:00
Ettore Di Giacinto	b4fff9293d	chore: small ui improvements in the node page Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 08:41:40 +00:00
Ettore Di Giacinto	3db12eaa7a	fix(oauth/invite): do not register user (pending approval) without correct invite (#9189 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 08:29:07 +02:00
Ettore Di Giacinto	8862e3ce60	feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186 ) * always enable parallel requests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: move tests to ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(smart router): order by available vram Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-31 08:28:56 +02:00
Richard Palethorpe	c2f7d1c18b	feat(ui): Add media history to studio pages (e.g. past images) (#9151 ) Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-03-30 00:49:55 +02:00
Ettore Di Giacinto	59108fbe32	feat: add distributed mode (#9124 ) * feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood with health checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-03-30 00:47:27 +02:00

482 commits