LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-04-21 13:27:21 +00:00

Author	SHA1	Message	Date
Ettore Di Giacinto	8ab56e2ad3	feat(gallery): add wan i2v 720p (#9457) feat(gallery): add Wan 2.1 I2V 14B 720P + pin all wan ggufs by sha256 Adds a new entry for the native-720p image-to-video sibling of the 480p I2V model (wan-2.1-i2v-14b-480p-ggml). The 720p I2V model is trained purely as image-to-video (there is no first-last-frame interpolation path), so motion is freer than repurposing the FLF2V 720P variant as an i2v. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h auxiliary files as the existing 480p I2V and 720p FLF2V entries, so no new aux downloads are introduced. Also pins the main diffusion gguf by sha256 for the new entry and for the three existing wan entries that were previously missing a hash (wan-2.1-t2v-1.3b-ggml, wan-2.1-i2v-14b-480p-ggml, wan-2.1-flf2v-14b-720p-ggml). Hashes were fetched from HuggingFace's x-linked-etag header per .agents/adding-gallery-models.md. Assisted-by: Claude:claude-opus-4-7	2026-04-20 23:34:11 +02:00
pjbrzozowski	ecf85fde9e	fix(api): remove duplicate /api/traces endpoint that broke React UI (#9427) The API Traces tab in /app/traces always showed (0) traces despite requests being recorded. The /api/traces endpoint was registered in both localai.go and ui_api.go. The ui_api.go version wrapped the response as {"traces": [...]} instead of the flat []APIExchange array that both the React UI (Traces.jsx) and the legacy Alpine.js UI (traces.html) expect. Because Echo matched the ui_api.go handler, Array.isArray(apiData) always returned false, making the API Traces tab permanently empty. Remove the duplicate endpoints from ui_api.go so only the correct flat-array version in localai.go is served. Also use mime.ParseMediaType for the Content-Type check in the trace middleware so requests with parameters (e.g. application/json; charset=utf-8) are still traced. Signed-off-by: Pawel Brzozowski <paul@ontux.net> Co-authored-by: Pawel Brzozowski <paul@ontux.net>	2026-04-20 18:44:49 +02:00
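The Content-Type fix can be illustrated with Go's standard mime.ParseMediaType. This function strips parameters such as charset and lowercases the media type before the comparison. The sketch assumes the trace middleware only needs to recognise application/json. isJSONRequest is a hypothetical name, not the function in LocalAI's middleware.

```go
package main

import (
	"fmt"
	"mime"
)

// isJSONRequest reports whether a Content-Type header denotes JSON,
// ignoring parameters such as "; charset=utf-8". A raw string comparison
// against "application/json" would reject those requests.
func isJSONRequest(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Empty or malformed headers are treated as "not JSON".
		return false
	}
	return mediaType == "application/json"
}

func main() {
	for _, ct := range []string{
		"application/json",
		"application/json; charset=utf-8",
		"text/plain",
		"",
	} {
		fmt.Printf("%q -> %v\n", ct, isJSONRequest(ct))
	}
}
```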
Sai Asish Y	6480715a16	fix(settings): strip env-supplied ApiKeys from the request before persisting (#9438) GET /api/settings returns settings.ApiKeys as the merged env+runtime list via ApplicationConfig.ToRuntimeSettings(). The WebUI displays that list and round-trips it back on POST /api/settings unchanged. UpdateSettingsEndpoint was then doing: appConfig.ApiKeys = append(envKeys, runtimeKeys...) where runtimeKeys already contained envKeys (because the UI got them from the merged GET). Every save therefore duplicated the env keys on top of the previous merge, and also wrote the duplicates to runtime_settings.json so the duplication survived restarts and compounded with each save. This is the user-visible behaviour in #9071: the Web UI shows the keys twice / three times after consecutive saves. Before we marshal the settings to disk or call ApplyRuntimeSettings, drop any incoming key that already appears in startupConfig.ApiKeys. The file on disk now stores only the genuinely runtime-added keys; the subsequent append(envKeys, runtimeKeys...) produces one copy of each env key, as intended. Behaviour is unchanged for users who never had env keys set. Fixes #9071 Co-authored-by: SAY-5 <SAY-5@users.noreply.github.com>	2026-04-20 10:36:54 +02:00
Ettore Di Giacinto	f683231811	feat(gallery): add Wan 2.1 FLF2V 14B 720P (#9440) First-last-frame-to-video variant of the 14B Wan family. Accepts a start and end reference image and, unlike the pure i2v path, runs both through clip_vision, so the final frame lands on the end image both in pixel and semantic space. Right pick for seamless loops (start_image == end_image) and narrative A→B cuts. Shares the same VAE, umt5_xxl text encoder, and clip_vision_h as the I2V 14B entry. Options block mirrors i2v's full-list-in-override style so the template merge doesn't drop fields. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 10:34:36 +02:00
LocalAI [bot]	960757f0e8	chore(model gallery): 🤖 add 1 new models via gallery agent (#9436) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-20 08:48:47 +02:00
Ettore Di Giacinto	865fd552f5	docs(agents): adopt kernel's AI coding assistants policy Align LocalAI with the Linux kernel project's policy for AI-assisted contributions (https://docs.kernel.org/process/coding-assistants.html). - Add .agents/ai-coding-assistants.md with the full policy adapted to LocalAI's MIT license: no Signed-off-by or Co-Authored-By from AI, attribute AI involvement via an Assisted-by: trailer, human submitter owns the contribution. - Surface the rules at the entry points: AGENTS.md (and its CLAUDE.md symlink) and CONTRIBUTING.md. - Publish a user-facing reference page at docs/content/reference/ai-coding-assistants.md and link it from the references index. Assisted-by: Claude:claude-opus-4-7	2026-04-19 22:50:54 +00:00
LocalAI [bot]	cb77a5a4b9	chore(model gallery): 🤖 add 1 new models via gallery agent (#9425) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-20 00:42:44 +02:00
Ettore Di Giacinto	60633c4dd5	fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux (#9435) gen_video's ffmpeg subprocess was relying on the filename extension to choose the output container. Distributed LocalAI hands the backend a staging path (e.g. /staging/localai-output-NNN.tmp) that is renamed to .mp4 only after the backend returns, so ffmpeg saw a .tmp extension and bailed with "Unable to choose an output format". Inference had already completed and the frames were piped in, producing the cryptic "video inference failed (code 1)" at the API layer. Pass -f mp4 explicitly so the container is selected by flag instead of by filename suffix. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 00:41:54 +02:00
Ettore Di Giacinto	9e44944cc1	fix(i2v): Add new options to the model configuration Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-20 00:27:05 +02:00
Ettore Di Giacinto	372eb08dcf	fix(gallery): allow uninstalling orphaned meta backends + force reinstall (#9434) Two interrelated bugs that combined to make a meta backend impossible to uninstall once its concrete had been removed from disk (partial install, earlier crash, manual cleanup). 1. DeleteBackendFromSystem returned "meta backend %q not found" and bailed out early when the concrete directory didn't exist, preventing the orphaned meta dir from ever being removed. Treat a missing concrete as idempotent success: log a warning and continue to remove the orphan meta. 2. InstallBackendFromGallery's "already installed, skip" short-circuit only checked that the name was known (`backends.Exists(name)`); an orphaned meta whose RunFile points at a missing concrete still satisfies that check, so every reinstall returned nil without doing anything. Afterwards the worker's findBackend returned empty and we kept looping with "backend %q not found after install attempt". Require the entry to be actually runnable (run.sh stat-able, not a directory) before skipping. New helper isBackendRunnable centralises the runnability test so both the install guard and future callers stay in sync. Tests cover the orphaned-meta delete path and the non-runnable short-circuit case.	2026-04-20 00:10:19 +02:00
LocalAI [bot]	28091d626e	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `00ba208a5c036eee72d4a631b4f57c126095cb03` (#9430) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-20 00:01:48 +02:00
LocalAI [bot]	cae79d9107	feat(swagger): update swagger (#9431) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 23:39:50 +02:00
LocalAI [bot]	babbbc6ec8	chore: ⬆️ Update ggml-org/llama.cpp to `4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad` (#9429) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 23:39:19 +02:00
LocalAI [bot]	3804497186	chore: ⬆️ Update leejet/stable-diffusion.cpp to `44cca3d626d301e2215d5e243277e8f0e65bfa78` (#9428) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 23:39:07 +02:00
Ettore Di Giacinto	fda1c553a1	fix(distributed): stop queue loops on agent nodes + dead-letter cap (#9433) pending_backend_ops rows targeting agent-type workers looped forever: the reconciler fan-out hit a NATS subject the worker doesn't subscribe to, returned ErrNoResponders, we marked the node unhealthy, and the health monitor flipped it back to healthy on the next heartbeat. Next tick, same row, same failure. Three related fixes: 1. enqueueAndDrainBackendOp skips nodes whose NodeType != backend. Agent workers handle agent NATS subjects, not backend.install / delete / list, so enqueueing for them guarantees an infinite retry loop. Silent skip is correct: they aren't consumers of these ops. 2. Reconciler drain mirrors enqueueAndDrainBackendOp's behavior on nats.ErrNoResponders: mark the node unhealthy before recording the failure, so subsequent ListDuePendingBackendOps (filters by status=healthy) stops picking the row until the node actually recovers. Matches the synchronous fan-out path. 3. Dead-letter cap at maxPendingBackendOpAttempts (10). After ~1h of exponential backoff the row is a poison message; further retries just thrash NATS. Row is deleted and logged at ERROR so it stays visible without being retried forever. Plus a one-shot startup cleanup in NewNodeRegistry: drop queue rows that target agent-type nodes, non-existent nodes, or carry an empty backend name. Guarded by the same schema-migration advisory lock so only one instance performs it. The guards above prevent new rows of this shape; this closes the migration gap for existing ones. Tests: the prune migration (valid row stays, agent + empty-name rows drop) on top of existing upsert / backoff coverage.	2026-04-19 23:38:43 +02:00
Ettore Di Giacinto	b27de08fff	chore(gallery): fixup wan Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-19 21:31:22 +00:00
Ettore Di Giacinto	510f791ccc	feat(gallery): add stablediffusion-ggml-development meta backend	2026-04-19 20:16:33 +00:00
Ettore Di Giacinto	369c50a41c	fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423) * fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc The upstream PR #21203 (server: respect the ignore_eos flag) has been merged into the TheTom/llama-cpp-turboquant feature/turboquant-kv-cache branch. With the fix now in-tree, 0001-server-respect-the-ignore-eos-flag.patch no longer applies (git apply sees its additions already present) and the nightly turboquant bump fails. Retire the patch and bump the pin to the first fork revision that carries the merged fix (tag feature-turboquant-kv-cache-b8967-627ebbc). This matches the contract in apply-patches.sh: drop patches once the fork catches up. * fix(turboquant): patch out get_media_marker() call in grpc-server copy CI turboquant docker build was failing with: grpc-server.cpp:2825:40: error: use of undeclared identifier 'get_media_marker' The call was added by `7809c5f5` (PR #9412) to propagate the random per-server mtmd media marker that upstream landed in ggml-org/llama.cpp#21962. The TheTom/llama-cpp-turboquant fork branched before that PR, so its server-common.cpp has no such symbol. Extend patch-grpc-server.sh to substitute get_media_marker() with the legacy "<__media__>" literal in the build-time grpc-server.cpp copy under turboquant-<flavor>-build/. The fork's mtmd_default_marker() returns exactly that string, and the Go layer falls back to the same sentinel when media_marker is empty, so behavior on the turboquant path is unchanged. Only the build-time copy is patched; the shared source under backend/cpp/llama-cpp/ keeps compiling against vanilla upstream. Verified by running `make docker-build-turboquant` locally end-to-end: all five flavors (avx, avx2, avx512, fallback, grpc+rpc-server) now compile past the previous failure and the image tags successfully.	2026-04-19 21:05:21 +02:00
Ettore Di Giacinto	75a63f87d8	feat(distributed): sync state with frontends, better backend management reporting (#9426) * fix(distributed): detect backend upgrades across worker nodes Before this change `DistributedBackendManager.CheckUpgrades` delegated to the local manager, which read backends from the frontend filesystem. In distributed deployments the frontend has no backends installed locally (they live on workers), so the upgrade-detection loop never ran and the UI silently never surfaced upgrades even when the gallery advertised newer versions or digests. Worker-side: NATS backend.list reply now carries Version, URI and Digest for each installed backend (read from metadata.json). Frontend-side: DistributedBackendManager.ListBackends aggregates per-node refs (name, status, version, digest) instead of deduping, and CheckUpgrades feeds that aggregation into gallery.CheckUpgradesAgainst, a new entrypoint factored out of CheckBackendUpgrades so both paths share the same core logic. Cluster drift policy: when per-node version/digest tuples disagree, the backend is flagged upgradeable regardless of whether any single node matches the gallery, and UpgradeInfo.NodeDrift enumerates the outliers so operators can see why it is out of sync. The next upgrade-all realigns the cluster. Tests cover: drift detection, unanimous-match (no upgrade), and the empty-installed-version path that the old distributed code silently missed. * feat(ui): surface backend upgrades in the System page The System page (Manage.jsx) only showed updates as a tiny inline arrow, so operators routinely missed them. Port the Backend Gallery's upgrade UX so System speaks the same visual language: - Yellow banner at the top of the Backends tab when upgrades are pending, with an "Upgrade all" button (serial fan-out, matches the gallery) and an "Updates only" filter toggle. - Warning pill (↑ N) next to the tab label so the count is glanceable even when the banner is scrolled out of view.
- Per-row labeled "Upgrade to vX.Y" button (replaces the icon-only button that silently flipped semantics between Reinstall and Upgrade), plus an "Update available" badge in the new Version column. - New columns: Version (with upgrade + drift chips), Nodes (per-node attribution badges for distributed mode, degrading to a compact "on N nodes · M offline" chip above three nodes), Installed (relative time). - System backends render a "Protected" chip instead of a bare "—" so rows still align and the reason is obvious. - Delete uses the softer btn-danger-ghost so rows don't scream red; the ConfirmDialog still owns the "are you sure". The upgrade checker also needed the same per-worker fix as the previous commit: NewUpgradeChecker now takes a BackendManager getter so its periodic runs call the distributed CheckUpgrades (which asks workers) instead of the empty frontend filesystem. Without this the /api/backends/ upgrades endpoint stayed empty in distributed mode even with the protocol change in place. New CSS primitives — .upgrade-banner, .tab-pill, .badge-row, .cell-stack, .cell-mono, .cell-muted, .row-actions, .btn-danger-ghost — all live in App.css so other pages can adopt them without duplicating styles. * feat(ui): polish the Nodes page so it reads like a product The Nodes page was the biggest visual liability in distributed mode. Rework the main dashboard surfaces in place without changing behavior: StatCards: uniform height (96px min), left accent bar colored by the metric's semantic (success/warning/error/primary), icon lives in a 36x36 soft-tinted chip top-right, value is left-aligned and large. Grid auto-fills so the row doesn't collapse on narrow viewports. This replaces the previous thin-bordered boxes with inconsistent heights. Table rows: expandable rows now show a chevron cue on the left (rotates on expand) so users know rows open. Status cell became a dedicated chip with an LED-style halo dot instead of a bare bullet. 
Action buttons gained labels ("Approve", "Resume", "Drain") so the icons aren't doing all the semantic work; the destructive remove action uses the softer btn-danger-ghost variant so rows don't scream red, with the ConfirmDialog still owning the real "are you sure". Applied cell-mono/cell-muted utility classes so label chips and addresses share one spacing/font grammar instead of re-declaring inline styles everywhere. Expanded drawer: empty states for Loaded Models and Installed Backends now render as a proper drawer-empty card (dashed border, icon, one-line hint) instead of a plain muted string that read like broken formatting. Tabs: three inline-styled buttons became the shared .tab class so they inherit focus ring, hover state, and the rest of the design system, matching the System page. "Add more workers" toggle turned into a .nodes-add-worker dashed-border button labelled "Register a new worker" (action voice) instead of a chevron + muted link that operators kept mistaking for broken text. New shared CSS primitives carry over to other pages: .stat-grid + .stat-card, .row-chevron, .node-status, .drawer-empty, .nodes-add-worker. * feat(distributed): durable backend fan-out + state reconciliation Two connected problems handled together: 1) Backend delete/install/upgrade used to silently skip non-healthy nodes, so a delete during an outage left a zombie on the offline node once it returned. The fan-out now records intent in a new pending_backend_ops table before attempting the NATS round-trip. Currently-healthy nodes get an immediate attempt; everyone else is queued. Unique index on (node_id, backend, op) means reissuing the same operation refreshes next_retry_at instead of stacking duplicates. 2) Loaded-model state could drift from reality: a worker that OOM'd, got killed, or restarted a backend process would leave a node_models row claiming the model was still loaded, feeding ghost entries into the /api/nodes/models listing and the router's scheduling decisions.
The existing ReplicaReconciler gains two new passes that run under a fresh KeyStateReconciler advisory lock (non-blocking, so one wedged frontend doesn't freeze the cluster): - drainPendingBackendOps: retries queued ops whose next_retry_at has passed on currently-healthy nodes. Success deletes the row; failure bumps attempts and pushes next_retry_at out with exponential backoff (30s → 15m cap). ErrNoResponders also marks the node unhealthy. - probeLoadedModels: runs gRPC health checks against addresses that the DB thinks are loaded but that haven't been touched in the last probeStaleAfter (2m). Unreachable addresses are removed from the registry. A pluggable ModelProber lets tests substitute a fake without standing up gRPC. DistributedBackendManager exposes DeleteBackendDetailed so the HTTP handler can surface per-node outcomes ("2 succeeded, 1 queued") to the UI in a follow-up commit; the existing DeleteBackend still returns error-only for callers that don't care about node breakdown. Multi-frontend safety: the state pass uses advisorylock.TryWithLockCtx on a new key so N frontends coordinate, the same pattern the health monitor and replica reconciler already rely on. Single-node mode runs both passes inline (adapter is nil, state drain is a no-op). Tests cover the upsert semantics, backoff math, the probe removing an unreachable model but keeping a reachable one, and filtering by probeStaleAfter. * feat(ui): show cluster distribution of models in the System page When a frontend restarted in distributed mode, models that workers had already loaded weren't visible until the operator clicked into each node manually, because the /api/models/capabilities endpoint only knew about configs on the frontend's filesystem, not the registry-backed truth. /api/models/capabilities now joins in ListAllLoadedModels() when the registry is active, returning loaded_on[] with node id/name/state/status for each model.
Models that live in the registry but lack a local config (the actual ghosts, not recovered from the frontend's file cache) still surface with source="registry-only" so operators can see and persist them; without that emission they'd be invisible to this frontend. Manage → Models replaces the old Running/Idle pill with a distribution cell that lists the first three nodes the model is loaded on as chips colored by state (green loaded, blue loading, amber anything else). On wider clusters the remaining count collapses into a +N chip with a title-attribute breakdown. Disabled / single-node behavior unchanged. Adopted models get an extra "Adopted" ghost-icon chip with hover copy explaining what it means and how to make it permanent. Distributed mode also enables a 10s auto-refresh and a "Last synced Xs ago" indicator next to the Update button so ghost rows drop off within one reconcile tick after their owning process dies. Non-distributed mode is untouched — no polling, no cell-stack, same old Running/Idle. * feat(ui): NodeDistributionChip — shared per-node attribution component Large clusters were going to break the Manage → Backends Nodes column: the old inline logic rendered every node as a badge and would shred the layout at >10 workers, plus the Manage → Models distribution cell had copy-pasted its own slightly-different version. NodeDistributionChip handles any cluster size with two render modes: - small (≤3 nodes): inline chips of node names, colored by health. - large: a single "on N nodes · M offline · K drift" summary chip; clicking opens a Popover with a per-node table (name, status, version, digest for backends; name, status, state for models). Drift counting mirrors the backend's summarizeNodeDrift so the UI number matches UpgradeInfo.NodeDrift. Digests are truncated to the docker-style 12-char form with the full value preserved in the title. 
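The two render modes and the digest shortening come down to a little labeling logic. The real component is React; the Go sketch below only models that logic, and NodeRef, chipLabel and shortDigest are hypothetical names. It assumes digests look like "sha256:<hex>" and leaves out drift counting, since the commit defines drift only by reference to summarizeNodeDrift.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeRef is a hypothetical per-node entry: a display name plus health.
type NodeRef struct {
	Name    string
	Healthy bool
}

const smallModeMax = 3 // "small (≤3 nodes)" per the commit

// chipLabel returns inline node names in small mode, or one summary string in
// large mode. The summary format here is illustrative.
func chipLabel(nodes []NodeRef) string {
	if len(nodes) <= smallModeMax {
		names := make([]string, 0, len(nodes))
		for _, n := range nodes {
			names = append(names, n.Name)
		}
		return strings.Join(names, ", ")
	}
	offline := 0
	for _, n := range nodes {
		if !n.Healthy {
			offline++
		}
	}
	return fmt.Sprintf("on %d nodes · %d offline", len(nodes), offline)
}

// shortDigest mimics the docker-style 12-character short form; the caller
// keeps the full digest for the title attribute.
func shortDigest(digest string) string {
	d := strings.TrimPrefix(digest, "sha256:")
	if len(d) > 12 {
		return d[:12]
	}
	return d
}

func main() {
	fmt.Println(chipLabel([]NodeRef{{"a", true}, {"b", false}}))
	fmt.Println(chipLabel([]NodeRef{{"a", true}, {"b", false}, {"c", true}, {"d", false}}))
	fmt.Println(shortDigest("sha256:0123456789abcdef0123"))
}
```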
Popover is a new general-purpose primitive: fixed positioning anchored to the trigger, flips above when there's no room below, closes on outside-click or Escape, returns focus to the trigger. Uses .card as its surface so theming is inherited. Also useful for a future labels-editor popup and the user menu. Manage.jsx drops its duplicated inline Nodes-column + loaded_on cell and uses the shared chip with context="backends" / "models" respectively. This removes ~40 lines of ad-hoc logic. * feat(ui): shared FilterBar across the System page tabs The Backends gallery had a nice search + chip + toggle strip; the System page had nothing, so the two surfaces felt like different apps. Lift the pattern into a reusable FilterBar and wire both System tabs through it. New component core/http/react-ui/src/components/FilterBar.jsx renders a search input, a role="tablist" chip row (aria-selected for a11y), and optional toggles / right slot. Chips support an optional `count` which the System page uses to show "User 3", "Updates 1" etc. System Models tab: search by id or backend; chips for All/Running/Idle/Disabled/Pinned plus a conditional Distributed chip in distributed mode. "Last synced" + Update button live in the right slot. System Backends tab: search by name/alias/meta-backend-for; chips for All/User/System/Meta plus conditional Updates / Offline-nodes chips when relevant. The old ad-hoc "Updates only" toggle from the upgrade banner folded into the Updates chip — one source of truth for that filter. The Offline chip only appears in distributed mode when at least one backend has an unhealthy node, so the chip row stays quiet on healthy clusters. Filter state persists in URL query params (mq/mf/bq/bf) so deep links and tab switches keep the operator's filter context instead of resetting every time. 
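Persisting filter state in the query string is what lets a deep link restore the operator's view. A Go sketch with net/url shows the round trip. The parameter names mq/mf/bq/bf come from the commit, but their exact meaning (search text vs. active chip for the Models and Backends tabs) and the FilterState type are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
)

// FilterState is a hypothetical view of the System page filters.
type FilterState struct {
	ModelsQuery    string // assumed to map to "mq"
	ModelsFilter   string // assumed to map to "mf" (active chip)
	BackendsQuery  string // assumed to map to "bq"
	BackendsFilter string // assumed to map to "bf" (active chip)
}

// Encode writes only non-empty fields so default views keep a clean URL.
func (f FilterState) Encode() string {
	v := url.Values{}
	set := func(k, val string) {
		if val != "" {
			v.Set(k, val)
		}
	}
	set("mq", f.ModelsQuery)
	set("mf", f.ModelsFilter)
	set("bq", f.BackendsQuery)
	set("bf", f.BackendsFilter)
	return v.Encode() // url.Values.Encode sorts keys
}

// DecodeFilterState restores the state from a raw query string.
func DecodeFilterState(raw string) (FilterState, error) {
	v, err := url.ParseQuery(raw)
	if err != nil {
		return FilterState{}, err
	}
	return FilterState{
		ModelsQuery:    v.Get("mq"),
		ModelsFilter:   v.Get("mf"),
		BackendsQuery:  v.Get("bq"),
		BackendsFilter: v.Get("bf"),
	}, nil
}

func main() {
	s := FilterState{ModelsQuery: "qwen", BackendsFilter: "updates"}
	q := s.Encode()
	back, _ := DecodeFilterState(q)
	fmt.Println(q, back == s)
}
```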
Also adds an "Adopted" distribution path: when a model in /api/models/capabilities carries source="registry-only" (discovered on a worker but not configured locally), the Models tab shows a ghost chip labelled "Adopted" with hover copy explaining how to persist it — this is what closes the loop on the ghost-model story end-to-end.	2026-04-19 17:55:53 +02:00
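The join behind /api/models/capabilities can be sketched as merging locally configured model names with the registry's loaded entries. Anything loaded on a worker but missing locally is tagged source="registry-only". The LoadedModel and ModelCapability types, the mergeCapabilities helper and the "local" source value are hypothetical; only the registry-only tag and the loaded_on idea come from the commit.

```go
package main

import (
	"fmt"
	"sort"
)

// LoadedModel is a hypothetical registry row: a model and the node running it.
type LoadedModel struct {
	Model  string
	NodeID string
	State  string // e.g. "loaded", "loading"
}

// ModelCapability is a hypothetical response entry.
type ModelCapability struct {
	ID       string
	Source   string   // "local" (assumed value) or "registry-only"
	LoadedOn []string // node ids
}

// mergeCapabilities attaches loaded_on node lists to local configs and emits
// registry-only entries for models with no local config.
func mergeCapabilities(localConfigs []string, loaded []LoadedModel) []ModelCapability {
	byID := make(map[string]*ModelCapability)
	for _, id := range localConfigs {
		byID[id] = &ModelCapability{ID: id, Source: "local"}
	}
	for _, lm := range loaded {
		c, ok := byID[lm.Model]
		if !ok {
			c = &ModelCapability{ID: lm.Model, Source: "registry-only"}
			byID[lm.Model] = c
		}
		c.LoadedOn = append(c.LoadedOn, lm.NodeID)
	}
	out := make([]ModelCapability, 0, len(byID))
	for _, c := range byID {
		out = append(out, *c)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	caps := mergeCapabilities(
		[]string{"qwen3-0.6b"},
		[]LoadedModel{
			{Model: "qwen3-0.6b", NodeID: "worker-1", State: "loaded"},
			{Model: "ghost-model", NodeID: "worker-2", State: "loaded"},
		},
	)
	for _, c := range caps {
		fmt.Println(c.ID, c.Source, c.LoadedOn)
	}
}
```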
Ettore Di Giacinto	9cd8d7951f	fix(kokoros): implement audio_transcription_stream trait stub (#9422 ) The backend.proto was updated to add AudioTranscriptionStream RPC, but the Rust KokorosService was never updated to match the regenerated tonic trait, breaking compilation with E0046. Stubs the new streaming method as unimplemented, matching the pattern used for the other streaming RPCs Kokoros does not support.	2026-04-19 13:29:58 +02:00
LocalAI [bot]	884bfb84c9	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `8befd92ea5f702494ea9813fe42a52fb015db5fe` (#9418 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 09:27:11 +02:00
LocalAI [bot]	e94a9a8f10	chore: ⬆️ Update leejet/stable-diffusion.cpp to `7d33d4b2ddeafa672761a5880ec33bdff452504d` (#9417 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-19 09:26:58 +02:00
Ettore Di Giacinto	054c4b4b45	feat(stable-diffusion.ggml): add support for video generation (#9420 ) Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-19 09:26:33 +02:00
LocalAI [bot]	6e49dba27c	chore: ⬆️ Update ggml-org/llama.cpp to `4f02d4733934179386cbc15b3454be26237940bb` (#9415 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-19 09:26:05 +02:00
Ettore Di Giacinto	e463820566	fix(ui): fix dark-theme colors in chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-18 23:01:01 +00:00
Keith Mattix II	8839a71c87	fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg (#9410 ) Add gfx1151 (AMD Strix Halo / Ryzen AI MAX) to the default AMDGPU_TARGETS list in the llama-cpp backend Makefile. ROCm 7.2.1 ships with gfx1151 Tensile libraries, so this architecture should be included in default builds. Also expose AMDGPU_TARGETS as an ARG/ENV in Dockerfile.llama-cpp so that users building for non-default GPU architectures can override the target list via --build-arg AMDGPU_TARGETS=<arch>. Previously, passing -DAMDGPU_TARGETS=<arch> through CMAKE_ARGS was silently overridden by the Makefile's own append of the default target list. Fixes #9374 Signed-off-by: Keith Mattix <keithmattix2@gmail.com> Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>	2026-04-18 20:39:40 +02:00
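The AMDGPU_TARGETS bug is an ordering problem: when the same cache variable is passed twice with -D, the later definition takes effect, so a default list appended after the user's CMAKE_ARGS silently wins. The Go sketch below models that rule. effectiveDefine is a hypothetical helper, and gfx1100 stands in for the default list, which the commit does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// effectiveDefine returns the value name would end up with, assuming a later
// -Dname=value overrides any earlier definition of the same variable.
func effectiveDefine(args []string, name string) (string, bool) {
	val, found := "", false
	for _, a := range args {
		if !strings.HasPrefix(a, "-D") {
			continue
		}
		key, v, ok := strings.Cut(strings.TrimPrefix(a, "-D"), "=")
		if !ok {
			continue
		}
		key, _, _ = strings.Cut(key, ":") // drop an optional :TYPE suffix
		if key == name {
			val, found = v, true
		}
	}
	return val, found
}

func main() {
	user := []string{"-DAMDGPU_TARGETS=gfx1151"}
	// Illustrative old ordering: a default list appended after the user's args.
	clobbered := append(append([]string{}, user...), "-DAMDGPU_TARGETS=gfx1100")
	v, _ := effectiveDefine(clobbered, "AMDGPU_TARGETS")
	fmt.Println("with appended default:", v) // the user's gfx1151 is lost
	v, _ = effectiveDefine(user, "AMDGPU_TARGETS")
	fmt.Println("without the append:", v)
}
```

Exposing AMDGPU_TARGETS as a build arg, as the commit does, sidesteps the problem by changing the list itself rather than adding a competing -D.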
Ettore Di Giacinto	117f6430b8	fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413 ) The shared grpc-server CMakeLists hardcoded `llama-common`, the post-rename target name in upstream llama.cpp. The turboquant fork branched before that rename and still exposes the helpers library as `common`, so the name silently degraded to a plain `-llama-common` link flag, the PUBLIC include directory was never propagated, and tools/server/server-task.h failed to find common.h during turboquant-<flavor> builds.	2026-04-18 20:30:28 +02:00
Ettore Di Giacinto	7809c5f5d0	fix(vision): propagate mtmd media marker from backend via ModelMetadata (#9412 ) Upstream llama.cpp (PR #21962) switched the server-side mtmd media marker to a random per-server string and removed the legacy "<__media__>" backward-compat replacement in mtmd_tokenizer. The Go layer still emitted the hardcoded "<__media__>", so on the non-tokenizer-template path the prompt arrived with a marker mtmd did not recognize and tokenization failed with "number of bitmaps (1) does not match number of markers (0)". Report the active media marker via ModelMetadataResponse.media_marker and substitute the sentinel "<__media__>" with it right before the gRPC call, after the backend has been loaded and probed. Also skip the Go-side multimodal templating entirely when UseTokenizerTemplate is true — llama.cpp's oaicompat_chat_params_parse already injects its own marker and StringContent is unused in that path. Backends that do not expose the field keep the legacy "<__media__>" behavior.	2026-04-18 20:30:13 +02:00
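The substitution step can be sketched as a plain string replacement performed just before the gRPC call, keeping the legacy marker when the backend reports none. substituteMediaMarker is a hypothetical name; only the "<__media__>" sentinel and the media_marker field come from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// legacyMediaMarker is the sentinel the Go layer emits while templating.
const legacyMediaMarker = "<__media__>"

// substituteMediaMarker swaps every sentinel for the marker the backend
// reported via ModelMetadataResponse.media_marker. An empty report keeps the
// legacy marker, matching backends that do not expose the field.
func substituteMediaMarker(prompt, backendMarker string) string {
	if backendMarker == "" || backendMarker == legacyMediaMarker {
		return prompt
	}
	return strings.ReplaceAll(prompt, legacyMediaMarker, backendMarker)
}

func main() {
	p := "Describe this image: <__media__>"
	fmt.Println(substituteMediaMarker(p, "<__media_8f3a__>")) // marker value is made up
	fmt.Println(substituteMediaMarker(p, ""))
}
```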
LocalAI [bot]	ad742738cb	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `52efa12fdae390d1dca6ecd7ca00010fe51f651e` (#9404 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-18 09:21:32 +02:00
LocalAI [bot]	86c673fd94	chore: ⬆️ Update ggml-org/whisper.cpp to `166c20b473d5f4d04052e699f992f625ea2a2fdd` (#9403 ) ⬆️ Update ggml-org/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-18 00:42:32 +02:00
Ettore Di Giacinto	c49feb546f	fix(llama-cpp): rename linked target common -> llama-common (#9408 ) Upstream llama.cpp (45cac7ca) renamed the CMake library target `common` to `llama-common`. Linking the old name caused `target_include_directories(... PUBLIC .)` from the common/ dir to not propagate, so `#include "common.h"` failed when building grpc-server.	2026-04-18 00:42:05 +02:00
LocalAI [bot]	844b0b760b	chore(model gallery): 🤖 add 1 new models via gallery agent (#9400 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 17:56:41 +02:00
LocalAI [bot]	55c05211d3	chore(model gallery): 🤖 add 1 new models via gallery agent (#9399 ) chore(model gallery): 🤖 add new models via gallery agent Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 16:10:02 +02:00
Ettore Di Giacinto	a90a8cf1d0	fix(ci): switch gallery-agent to sigs.k8s.io/yaml (#9397 ) The gallery-agent lives under .github/, which Go tooling treats as a hidden directory and excludes from './...' expansion. That means 'go mod tidy' (run on every dependabot dependency bump) repeatedly strips github.com/ghodss/yaml from go.mod/go.sum, breaking 'go run ./.github/gallery-agent' with a missing go.sum entry error. Switch to sigs.k8s.io/yaml — API-compatible with ghodss/yaml and already pulled in as a transitive dependency via non-hidden packages, so tidy can no longer remove it.	2026-04-17 10:10:42 +02:00
dependabot[bot]	12b069f9bd	chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory (#9376 ) chore(deps): bump dompurify Bumps the npm_and_yarn group with 1 update in the /core/http/react-ui directory: [dompurify](https://github.com/cure53/DOMPurify). Updates `dompurify` from 3.3.2 to 3.4.0 - [Release notes](https://github.com/cure53/DOMPurify/releases) - [Commits](https://github.com/cure53/DOMPurify/compare/3.3.2...3.4.0) --- updated-dependencies: - dependency-name: dompurify dependency-version: 3.4.0 dependency-type: direct:production dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-17 09:06:32 +02:00
github-actions[bot]	48e87db400	chore: bump inference defaults from unsloth (#9396 ) Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 09:05:55 +02:00
LocalAI [bot]	7dbd9c056a	chore: ⬆️ Update ggml-org/llama.cpp to `4fbdabdc61c04d1262b581e1b8c0c3b119f688ff` (#9381 ) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 08:13:04 +02:00
Ettore Di Giacinto	7c5d6162f7	fix(ui): rename model config files on save to prevent duplicates (#9388 ) Editing a model's YAML and changing the `name:` field previously wrote the new body to the original `<oldName>.yaml`. On reload the config loader indexed that file under the new name while the old key lingered in memory, producing two entries in the system UI that shared a single underlying file — deleting either removed both. Detect the rename in EditModelEndpoint and rename the on-disk `<name>.yaml` and `._gallery_<name>.yaml` to match, drop the stale in-memory key before the reload, and redirect the editor URL in the React UI so it tracks the new name. Reject conflicts (409) and names containing path separators (400). Fixes #9294	2026-04-17 08:12:48 +02:00
Ettore Di Giacinto	5837b14888	chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d' (#9385 ) chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d` Drop 0002-ggml-rpc-bump-op-count-to-97.patch; the fork now has GGML_OP_COUNT == 97 and RPC_PROTO_PATCH_VERSION 2 upstream. Fetch all tags in backend/cpp/llama-cpp/Makefile so tag-only commits (the new turboquant pin is reachable only through the tag feature-turboquant-kv-cache-b8821-45f8a06) can be checked out.	2026-04-17 08:12:21 +02:00
LocalAI [bot]	b6a68e5df4	chore: ⬆️ Update leejet/stable-diffusion.cpp to `a564fdf642780d1df123f1c413b19961375b8346` (#9383 ) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 08:11:55 +02:00
LocalAI [bot]	c6dfb4acaf	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `eaf83865a132f66e8f49efe0e78491625942f068` (#9382 ) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-17 08:11:41 +02:00
LocalAI [bot]	ec5935421c	chore(model-gallery): ⬆️ update checksum (#9384 ) ⬆️ Checksum updates in gallery/index.yaml Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 22:41:52 +02:00
Ettore Di Giacinto	a0cbc46be9	refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer (#9380 ) Drop the 295-line vendor/llama.py fork in favor of `tinygrad.apps.llm`, which now provides the Transformer blocks, GGUF loader (incl. Q4/Q6/Q8 quantization), KV-cache and generate loop we were maintaining ourselves. What changed: - New vendor/appsllm_adapter.py (~90 LOC) — HF -> GGUF-native state-dict keymap, Transformer kwargs builder, `_embed_hidden` helper, and a hard rejection of qkv_bias models (Qwen2 / 2.5 are no longer supported; the apps.llm Transformer ties `bias=False` on Q/K/V projections). - backend.py routes both safetensors and GGUF paths through apps.llm.Transformer. Generation now delegates to its (greedy-only) `generate()`; Temperature / TopK / TopP / RepetitionPenalty are still accepted on the wire but ignored — documented in the module docstring. - Jinja chat render now passes `enable_thinking=False` so Qwen3's reasoning preamble doesn't eat the tool-call token budget on small models. - Embedding path uses `_embed_hidden` (block stack + output_norm) rather than the custom `embed()` method we were carrying on the vendored Transformer. - test.py gains TestAppsLLMAdapter covering the keymap rename, tied embedding fallback, unknown-key skipping, and qkv_bias rejection. - Makefile fixtures move from Qwen/Qwen2.5-0.5B-Instruct to Qwen/Qwen3-0.6B (apps.llm-compatible) and tool_parser from qwen3_xml to hermes (the HF chat template emits hermes-style JSON tool calls). Verified with the docker-backed targets: test-extra-backend-tinygrad 5/5 PASS test-extra-backend-tinygrad-embeddings 3/3 PASS test-extra-backend-tinygrad-whisper 4/4 PASS test-extra-backend-tinygrad-sd 3/3 PASS	2026-04-16 22:41:18 +02:00
Ettore Di Giacinto	b4e30692a2	feat(backends): add sglang (#9359 ) * feat(backends): add sglang Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally; -march=native fails on CI runners without AVX-512 in /proc/cpuinfo. Force -march=sapphirerapids so the build always succeeds, matching sglang upstream's docker/xeon.Dockerfile recipe. The resulting binary still requires an AVX-512 capable CPU at runtime, so disable tests-sglang-grpc in test-extra.yml for the same reason tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang still work on hosts with the right SIMD baseline. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512 CXXFLAGS with -march=sapphirerapids was being overridden by add_compile_options(-march=native) in sglang's CPU CMakeLists.txt, since CMake appends those flags after CXXFLAGS. Sed-patch the CMakeLists.txt directly after cloning to replace -march=native. --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-16 22:40:56 +02:00
Ettore Di Giacinto	61d34ccb11	fix(ui): also show concrete backends in the backend list Signed-off-by: Ettore Di Giacinto <mudler@localai.io>	2026-04-16 17:44:25 +00:00
LocalAI [bot]	7f88a3ba30	chore: ⬆️ Update leejet/stable-diffusion.cpp to `c41c5ded7af85e01b7fe442ff7950c720706d53a` (#9366) ⬆️ Update leejet/stable-diffusion.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 09:04:33 +02:00
Matt Van Horn	c4f309388e	fix(gallery): correct gemma-4 model URIs returning 404 (#9379) The gemma-4-26b-a4b-it, gemma-4-e2b-it, and gemma-4-e4b-it gallery entries pointed at files that do not exist on HuggingFace, so LocalAI fails with 404 when users try to install them. Two issues per entry: - mmproj filename uses the 'f16' quantization suffix, but ggml-org publishes the mmproj projectors as 'bf16'. - The e2b and e4b URIs hardcode lowercase 'e2b'/'e4b' in the filename component. HuggingFace file paths are case-sensitive and the real files use uppercase 'E2B'/'E4B'. Updated filename, uri, sha256, and the top-level 'mmproj' and 'parameters.model' references so every entry points at a real file and the declared hashes match the content. Verified each URI resolves (HTTP 302) and each sha256 matches the 'x-linked-etag' header returned by HuggingFace. Signed-off-by: Matt Van Horn <mvanhorn@gmail.com>	2026-04-16 08:51:20 +02:00
dependabot[bot]	ab326a9c61	chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates (#9373) Bumps the npm_and_yarn group with 6 updates in the /core/http/react-ui directory: | Package | From | To | | --- | --- | --- | | [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) | `6.4.1` | `6.4.2` | | [@hono/node-server](https://github.com/honojs/node-server) | `1.19.11` | `1.19.14` | | [flatted](https://github.com/WebReflection/flatted) | `3.3.4` | `3.4.2` | | [hono](https://github.com/honojs/hono) | `4.12.7` | `4.12.14` | | [path-to-regexp](https://github.com/pillarjs/path-to-regexp) | `8.3.0` | `8.4.2` | | [picomatch](https://github.com/micromatch/picomatch) | `4.0.3` | `4.0.4` | Updates `vite` from 6.4.1 to 6.4.2 - [Release notes](https://github.com/vitejs/vite/releases) - [Changelog](https://github.com/vitejs/vite/blob/v6.4.2/packages/vite/CHANGELOG.md) - [Commits](https://github.com/vitejs/vite/commits/v6.4.2/packages/vite) Updates `@hono/node-server` from 1.19.11 to 1.19.14 - [Release notes](https://github.com/honojs/node-server/releases) - [Commits](https://github.com/honojs/node-server/compare/v1.19.11...v1.19.14) Updates `flatted` from 3.3.4 to 3.4.2 - [Commits](https://github.com/WebReflection/flatted/compare/v3.3.4...v3.4.2) Updates `hono` from 4.12.7 to 4.12.14 - [Release notes](https://github.com/honojs/hono/releases) - [Commits](https://github.com/honojs/hono/compare/v4.12.7...v4.12.14) Updates `path-to-regexp` from 8.3.0 to 8.4.2 - [Release notes](https://github.com/pillarjs/path-to-regexp/releases) - [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md) - [Commits](https://github.com/pillarjs/path-to-regexp/compare/v8.3.0...v8.4.2) Updates `picomatch` from 4.0.3 to 4.0.4 - [Release notes](https://github.com/micromatch/picomatch/releases) - [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md) - [Commits](https://github.com/micromatch/picomatch/compare/4.0.3...4.0.4) --- updated-dependencies: - dependency-name: vite dependency-version: 6.4.2 dependency-type: direct:development dependency-group: npm_and_yarn - dependency-name: "@hono/node-server" dependency-version: 1.19.14 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: flatted dependency-version: 3.4.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: hono dependency-version: 4.12.14 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: path-to-regexp dependency-version: 8.4.2 dependency-type: indirect dependency-group: npm_and_yarn - dependency-name: picomatch dependency-version: 4.0.4 dependency-type: indirect dependency-group: npm_and_yarn ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-16 08:23:03 +02:00
LocalAI [bot]	df2d25cee5	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `1163af96cf6bb4a4b819f998f84c153a49768b99` (#9368) ⬆️ Update ikawrakow/ik_llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 01:13:08 +02:00
LocalAI [bot]	96cd561d9d	chore: ⬆️ Update ggml-org/llama.cpp to `b3d758750a268bf93f084ccfa3060fb9a203192a` (#9370) ⬆️ Update ggml-org/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>	2026-04-16 01:12:39 +02:00

6083 commits