mirror of
https://github.com/mudler/LocalAI
synced 2026-05-24 09:28:23 +00:00
13 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
61bf34ea2f
|
fix(traces): cap captured body size to keep admin Traces UI responsive (#9946)
Some checks are pending
Tests extras backends / tests-rerankers (push) Blocked by required conditions
Tests extras backends / tests-diffusers (push) Blocked by required conditions
Tests extras backends / tests-coqui (push) Blocked by required conditions
Tests extras backends / tests-moonshine (push) Blocked by required conditions
Tests extras backends / tests-pocket-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-tts (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-qwen3-tts-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-localvqe-grpc-transform (push) Blocked by required conditions
Tests extras backends / tests-insightface-grpc (push) Blocked by required conditions
tests-aio / tests-aio (push) Waiting to run
E2E Backend Tests / tests-e2e-backend (1.25.x) (push) Waiting to run
UI E2E Tests / tests-ui-e2e (1.26.x) (push) Waiting to run
Tests extras backends / tests-qwen-asr (push) Blocked by required conditions
Tests extras backends / tests-nemo (push) Blocked by required conditions
Tests extras backends / tests-voxcpm (push) Blocked by required conditions
Tests extras backends / tests-liquid-audio (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-quantization (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-smoke (push) Waiting to run
Tests extras backends / tests-sherpa-onnx-realtime (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-voxtral (push) Blocked by required conditions
Tests extras backends / tests-kokoros (push) Blocked by required conditions
Tests extras backends / tests-speaker-recognition-grpc (push) Blocked by required conditions
tests / tests-linux (1.26.x) (push) Waiting to run
tests / tests-apple (1.26.x) (push) Waiting to run
The trace middleware buffered the full request and response bodies for every JSON exchange. With a chatty agent-pool RAG workload, /embeddings responses (large vector arrays) accumulated to tens of MB in the in-memory buffer; the admin Traces page would then download and parse 40+ MB on every load and on every 5s auto-refresh, locking the UI in a loading state. Add LOCALAI_TRACING_MAX_BODY_BYTES (default 64 KiB) that caps each captured body. The full payload still flows through to the real client; only the trace copy is bounded. Exchanges record body_truncated and original body_bytes so the dashboard can show that truncation happened. The cap is configurable via env, CLI, and runtime_settings.json. Also unblock recovery: the Traces page now keeps the Clear button enabled while loading, since "buffer too large to render" is exactly when the user needs to clear it. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
b1a99436c7
|
feat(branding): admin-configurable instance name, tagline, and assets (#9635)
Adds a whitelabeling feature so an operator can replace the LocalAI
instance name, tagline, square logo, horizontal logo, and favicon from
the admin Settings page. Defaults fall back to the bundled assets so
existing installs are unaffected.
The public GET /api/branding endpoint is reachable pre-auth so the
login screen can render the configured branding before sign-in.
Mutating routes (POST/DELETE /api/branding/asset/:kind) remain
admin-only. Text fields (instance_name, instance_tagline) ride the
existing /api/settings flow; binary assets get a dedicated multipart
upload route that persists files under DynamicConfigsDir/branding/.
To prevent the Settings page's stale local state from clobbering an
upload on save, UpdateSettingsEndpoint preserves whatever the on-disk
asset filename fields are regardless of the body — /api/branding/asset/*
are the sole writers for those fields.
The MCP catalog gains get_branding and set_branding tools (text fields
only; file upload stays UI-only) plus a configure_branding skill prompt.
While wiring this up, the same restart-loss class of bug surfaced for
several existing fields whose RuntimeSettings entries were never read
by the startup loader. Fix loadRuntimeSettingsFromFile() to load:
- branding (instance_name, instance_tagline, *_file basenames)
- auto_upgrade_backends, prefer_development_backends
- localai_assistant_enabled
- open_responses_store_ttl
- the 7 existing AgentPool fields (enabled, default/embedding model,
chunking sizes, enable_logs, collection_db_path)
Also exposes 3 new AgentPool runtime settings (vector_engine,
database_url, agent_hub_url) via /api/settings + the Settings UI, with
the same load-on-startup wiring. The file watcher's manual-edit path
is intentionally not changed — the in-process API endpoints already
update appConfig directly, so the watcher is redundant for supported
flows and a separate refactor for everything else.
15 TDD specs cover the loader behaviour (1 branding + 11 adjacent + 3
new agent-pool); 2 specs cover the persistence helpers and the
clobber-prevention contract.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
bcef72b9c1
|
feat: localai assistant chat modality (#9602)
* fix(tests): inline model_test fixtures after tests/models_fixtures removal
The previous reorg removed tests/models_fixtures/ but core/config/model_test.go
still read CONFIG_FILE/MODELS_PATH env vars pointing into that directory, so
`make test` failed with "open : no such file or directory" on the readConfigFile
spec (the suite ran with --fail-fast and bailed before openresponses_test).
Inline the YAMLs (config/embeddings/grpc/rwkv/whisper) directly into the test
file, materialise them into a per-test tmpdir via BeforeEach, and drop the
env-var lookups. The test no longer depends on Makefile plumbing.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7 [Edit] [Write] [Bash]
* refactor(modeladmin): extract model-admin helpers into a service package
Lift the bodies of EditModelEndpoint, PatchConfigEndpoint,
ToggleStateModelEndpoint, TogglePinnedModelEndpoint and
VRAMEstimateEndpoint into core/services/modeladmin so the same logic can
be called by non-HTTP clients (notably the in-process MCP server that
backs the LocalAI Assistant chat modality, landing in a follow-up commit).
The HTTP handlers shrink to thin shells that parse echo inputs, call the
matching helper, map typed errors (ErrNotFound, ErrConflict,
ErrPathNotTrusted, ErrBadAction, ...) to the existing HTTP status codes,
and render the existing response shapes. No REST-surface behaviour change;
the existing localai endpoint tests cover the regression net.
Adds focused unit tests for each helper against tmp-dir-backed
ModelConfigLoader fixtures (deep-merge patch, rename + conflict, path
separator guard, toggle/pin enable/disable, sync callback).
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(assistant): LocalAI Assistant chat modality with in-memory MCP server
Adds a chat modality, admin-only, that wires the chat session to an
in-memory MCP server exposing LocalAI's own admin/management surface as
tools. An admin can install models, manage backends, edit configs and
check status by chatting; the LLM calls tools like gallery_search,
install_model, import_model_uri, list_installed_models, edit_model_config
and surfaces the results.
Same Go package powers two modes:
pkg/mcp/localaitools/
NewServer(client, opts) builds an MCP server that registers the
19-tool admin catalog. The LocalAIClient interface has two impls:
- inproc.Client — calls services directly (no HTTP loopback,
no synthetic admin API key). Used in-process by the chat handler.
- httpapi.Client — calls the LocalAI REST API. Used by the new
`local-ai mcp-server --target=…` subcommand to control a remote
LocalAI from a stdio MCP host.
Tools and their embedded skill prompts are agnostic to which client
backs them. Skill prompts are markdown files under prompts/, embedded
via go:embed and assembled into the system prompt at server init.
Wiring:
- core/http/endpoints/mcp/localai_assistant.go — process-wide holder
that spins up the in-memory MCP server once at Application start
using paired net.Pipe transports, then reuses LocalToolExecutor
(no fork) for every chat request that opts in.
- core/http/endpoints/openai/chat.go — small branch ahead of the
existing MCP block: when metadata.localai_assistant=true,
defense-in-depth admin check + executor swap + system-prompt
injection. All downstream tool dispatch is unchanged.
- core/http/auth/{permissions,features}.go — adds
FeatureLocalAIAssistant; gating happens at the chat handler entry
plus admin-only `/api/settings`.
- core/cli/{run.go,cli.go,mcp_server.go} —
LOCALAI_DISABLE_ASSISTANT flag (runtime-toggleable via Settings, no
restart), plus `local-ai mcp-server` stdio subcommand.
- core/config/runtime_settings.go — `localai_assistant_enabled`
runtime setting; the chat handler reads `DisableLocalAIAssistant`
live at request entry.
UI:
- Home.jsx — prominent self-explanatory CTA card on first run
("Manage LocalAI by chatting"); collapses to a compact
"Manage by chat" button in the quick-links row once used,
persisted via localStorage.
- Chat.jsx — admin-only "Manage" toggle in the chat header,
"Manage mode" badge, dedicated empty-state copy, starter chips.
- Settings.jsx — "LocalAI Assistant" section with the runtime
enable toggle.
- useChat.js — `localaiAssistant` flag on the chat schema; injects
`metadata.localai_assistant=true` on requests when active.
Distributed mode: the in-memory MCP server lives only on the head node;
inproc.Client wraps already-distributed-aware services so installs
propagate to workers via the existing GalleryService machinery.
Documentation: `.agents/localai-assistant-mcp.md` is the contributor
contract — when adding an admin REST endpoint, also add a LocalAIClient
method, an inproc + httpapi impl, a tool registration, and a skill
prompt update; the AGENTS.md index links to it.
Out of scope (follow-ups): per-tool RBAC granularity for non-admin
read-only access; streaming mcp_tool_progress for long installs;
React Vitest rig for the UI changes.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(assistant): extract tool/capability/MiB/server-name constants
The MCP tool surface, capability tag set, server-name default, and the
chat-handler metadata key were repeated as bare string literals across
seven files. Renaming any one required hand-editing every call site and
risked code/test/prompt drift.
This pulls them into typed constants:
- pkg/mcp/localaitools/tools.go — Tool* constants for the 19 MCP tools,
plus DefaultServerName.
- pkg/mcp/localaitools/capability.go — typed Capability + constants for
the capability tag set the LLM passes to list_installed_models. The
type rides through LocalAIClient.ListInstalledModels and replaces the
triplet of "embed"/"embedding"/"embeddings" with the single
CapabilityEmbeddings.
- pkg/mcp/localaitools/inproc/client.go — bytesPerMiB constant for the
VRAMEstimate byte→MB conversion.
- core/http/endpoints/mcp/tools.go — MetadataKeyLocalAIAssistant for the
"localai_assistant" request-metadata key consumed by the chat handler.
Tool registrations, the test catalog, the dispatch table, the validation
fixtures, and the fake/stub clients all reference the constants. The
embedded skill prompts under prompts/ keep their bare strings (go:embed
markdown can't import Go constants); the existing TestPromptsContain
SafetyAnchors guards the alignment.
No behaviour change. All tests pass with -race.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(modeladmin): typed Action for ToggleState/TogglePinned
The toggle/pin verbs were bare strings everywhere — handler signatures,
service implementations, MCP tool args, the fake/stub clients, the
inproc and httpapi LocalAIClient impls, plus 4 test files. A typo in
any caller silently fell through to the runtime "must be 'enable' or
'disable'" check.
Introduce core/services/modeladmin.Action (string alias) with
ActionEnable, ActionDisable, ActionPin, ActionUnpin and a small Valid
helper. The compiler now catches mismatches at every boundary; renames
ripple through one source of truth.
LocalAIClient.ToggleModelState/Pinned signatures change to take
modeladmin.Action. The package is brand-new and unreleased so this is
a free public-API tightening.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(assistant): respect ctx cancellation on gallery channel sends
InstallModel, DeleteModel, ImportModelURI, InstallBackend and
UpgradeBackend all pushed onto galleryop channels with bare sends. If the
worker was paused or the buffer full, the chat-handler goroutine blocked
forever — the LLM kept polling and the request leaked.
Wrap the five sends in a sendModelOp/sendBackendOp helper that selects
on ctx.Done() so a cancelled chat completion surfaces context.Canceled
back to the LLM instead of hanging.
Adds inproc/client_test.go with a pre-cancelled-ctx regression test on
InstallModel; the helpers are shared so the same guarantee covers the
other four call sites.
Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(assistant): graceful shutdown for in-memory holder and stdio CLI
Two related leaks:
- Application.start() built the LocalAIAssistantHolder but never wired
Close() into the graceful-termination chain — the in-memory MCP
transport pair stayed alive until process exit, and the goroutines
behind net.Pipe() didn't drain. Hook into the existing
signals.RegisterGracefulTerminationHandler chain (same pattern as
core/http/endpoints/mcp/tools.go:770).
- core/cli/mcp_server.go ran srv.Run with context.Background(); a
Ctrl-C from the host (Claude Desktop, mcphost, npx inspector) or a
SIGTERM from process supervision left the stdio loop reading from a
closed pipe. Switch to signal.NotifyContext to surface the signal
through ctx and let srv.Run drain.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(assistant): typed HTTPError + propagate prompt walk error
The httpapi client detected "no such job" by substring-matching on the
error string ("404", "could not find") — brittle to status-code
formatting changes and to LocalAI fixing /models/jobs/:uuid to return a
proper 404. Replace with a typed *HTTPError whose Is() method honours
errors.Is(err, ErrHTTPNotFound). The 500-with-"could not find" branch
stays as a transitional fallback documented in Is().
Same change covers ListNodes' 404 fallback for the /api/nodes endpoint.
Adds httptest tests for both 404 and the legacy 500 path, plus a
direct errors.Is exposure test so external callers (the standalone
stdio CLI host) can match without re-string-parsing.
Also tightens prompts.SystemPrompt: panic when fs.WalkDir on the
embedded FS fails. The only realistic cause is a build-time //go:embed
misconfiguration; serving an empty system prompt to the LLM is much
worse than crashing init. TestSystemPromptIncludesAllEmbeddedFiles
catches regressions in CI.
Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(modeladmin): atomic writes for model config files
The five sites that wrote model YAML used os.WriteFile, which opens
with O_TRUNC|O_WRONLY|O_CREATE. A crash mid-write left the destination
truncated and the model unloadable until manual repair. Pre-existing
behaviour inherited from the original endpoint handlers — fix once now
that there's a single helper.
Adds writeFileAtomic: writes to a sibling temp file, chmods, syncs via
Close(), then os.Rename. Same-directory temp keeps the rename atomic on
the same filesystem; cleanup runs on every error path so stray temps
don't accumulate. No new dependency.
Applied to:
- ConfigService.PatchConfig
- ConfigService.EditYAML (both rename and in-place branches)
- mutateYAMLBoolFlag (drives ToggleState + TogglePinned)
atomic_test.go covers the happy path plus a read-only-dir failure case
that asserts the original file is preserved (skipped on Windows where
the chmod trick is POSIX-specific).
Assisted-by: Claude:claude-opus-4-7 [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(assistant): prune dead code, mark stub, document conventions
Three small cleanups landing together:
- Drop the unused errNotImplemented sentinel from inproc/client.go.
All five methods that used to return it are wired to modeladmin
helpers since the Phase B commit; the package var is dead.
- Annotate httpapi.Client.GetModelConfig as a known stub. LocalAI's
/models/edit/:name returns rendered HTML, not JSON, so the standalone
CLI's get_model_config tool surfaces a clear error to the LLM. A
future JSON-only /api/models/config-yaml/:name endpoint is tracked in
the agent contract; FIXME points at it.
- Extend `.agents/localai-assistant-mcp.md` with a "Code conventions"
section that documents the audit-driven rules: tool/Capability/Action
constants, errors.Is over substring matching, ctx-aware channel
sends, atomic writes, and graceful shutdown. Refresh the file map so
it lists tools.go and capability.go and drops the removed
tools_bootstrap.go.
The tools_models.go diff is a comment-only change explaining why the
ModelName empty-string check stays at the tool layer (consistency
across LocalAIClient implementations, since the SDK schema validator
only enforces presence, not non-empty).
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(assistant): convert test files to ginkgo + gomega
The repo convention (per core/http/endpoints/localai/*_test.go,
core/gallery/**, etc.) is Ginkgo v2 with Gomega assertions. The tests I
introduced for the assistant feature used vanilla testing.T, which made
them stand out and stripped the BDD structure the rest of the suite
relies on.
Convert every test file in the assistant scope to Ginkgo:
pkg/mcp/localaitools/
dto_test.go — Describe("DTOs round-trip through JSON")
prompts_test.go — Describe("SystemPrompt assembler")
server_test.go — Describe("Server tool catalog"),
Describe("Tool dispatch"),
Describe("Tool error surfacing"),
Describe("Argument validation"),
Describe("Concurrent tool calls")
parity_test.go — Describe("LocalAIClient parity"),
hosts the suite's single RunSpecs (the file
is package localaitools_test so it can
import httpapi without an import cycle;
Ginkgo aggregates Describes from both the
internal and external test packages into
one run).
httpapi/client_test.go — Describe("httpapi.Client against the
LocalAI admin REST surface"),
Describe("ErrHTTPNotFound"),
Describe("Bearer token")
inproc/client_test.go — Describe("inproc.Client cancellation")
core/services/modeladmin/
config_test.go — Describe("ConfigService") with sub-Describes
for GetConfig, PatchConfig, EditYAML
state_test.go — Describe("ConfigService.ToggleState")
pinned_test.go — Describe("ConfigService.TogglePinned")
atomic_test.go — Describe("writeFileAtomic")
core/http/endpoints/mcp/
localai_assistant_test.go — Describe("LocalAIAssistantHolder")
Each package gets a `*_suite_test.go` with the standard
`RegisterFailHandler(Fail) + RunSpecs(t, "...")` boilerplate. Helpers
that previously took *testing.T (newTestService, writeModelYAML,
readMap, sortedStrings, sortGalleries, etc.) drop the *T receiver and
use Gomega Expectations directly. tmp dirs come from GinkgoT().TempDir().
No semantic change to test coverage — every original assertion has a
direct Gomega counterpart. All suites pass with -race.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test+docs(assistant): drift detector for Tool ↔ REST route mapping
Honest gap from the audit: the parity_test.go suite only checks four
methods, and uses the same httpapi.Client for both sides — it asserts
stability of the DTO shapes, not equivalence between in-process and
HTTP. If a contributor adds an admin REST endpoint without an MCP tool,
or a tool without a matching httpapi route, both surfaces silently
diverge.
Add a coverage test plus stronger docs:
- pkg/mcp/localaitools/coverage_test.go introduces a hand-maintained
toolToHTTPRoute map: every Tool* constant must list the REST endpoint
the httpapi.Client hits (or "(none)" with a documented reason). Two
Ginkgo specs assert the map and the published catalog stay in sync —
one fails when a Tool is added without a route entry, the other fails
when a route entry references a tool that no longer exists. Verified
by removing the ToolDeleteModel entry locally; the test fired with a
clear message pointing the contributor at the file.
Deliberate non-test: we don't enumerate live admin REST routes from
here. Walking the route registry requires booting Application;
parsing core/http/routes/localai.go is brittle. The "new admin REST
endpoint → MCP tool" direction stays a PR checklist item — see below.
- AGENTS.md gets a new Quick Reference bullet that calls out the rule
and points at the test by name.
- .agents/api-endpoints-and-auth.md tightens the existing "Companion:
MCP admin tool surface" subsection from "if useful, consider..." to
"MUST be considered, with three concrete outcomes (tool added,
deliberately skipped with documented reason, or forgot — which
breaks the contract)". Adds a checklist item at the bottom of the
file's authoritative checklist.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(assistant): drop duplicate DTOs, surface canonical types
Audit feedback: localaitools/dto.go reinvented several types that already
existed in the codebase. Replace the duplicates with the canonical types
so the LLM-visible wire format stays aligned with the rest of LocalAI by
construction (no parallel structs to keep in sync).
Removed (and the canonical type now used by the LocalAIClient interface):
localaitools.Gallery → config.Gallery
localaitools.GalleryModelHit → gallery.Metadata
localaitools.VRAMEstimate → vram.EstimateResult
Tightened scope:
localaitools.Backend → kept, but reduced to {Name, Installed}.
ListKnownBackends now returns
[]schema.KnownBackend (the canonical
type already used by REST /backends/known).
Kept with documented rationale:
localaitools.JobStatus — galleryop.OpStatus has Error error which
marshals to "{}". JobStatus is the
JSON-friendly mirror.
localaitools.Node — nodes.BackendNode carries gorm internals
+ token hash; we expose only the
LLM-relevant fields.
ImportModelURIRequest/Response — schema.ImportModelRequest and
GalleryResponse are wire-shaped, mine
are LLM-shaped (BackendPreference flat,
AmbiguousBackend exposed).
Side wins:
- Drop bytesPerMiB; vram.EstimateResult already carries human-readable
display strings (size_display, vram_display) the LLM uses directly.
- Drop the handler-private vramEstimateRequest in
core/http/endpoints/localai/vram.go and bind directly into
modeladmin.VRAMRequest (now JSON-tagged).
Both clients pass through these types now where possible (e.g.
ListGalleries in inproc.Client is a one-liner returning
AppConfig.Galleries; httpapi.Client.GallerySearch decodes straight into
[]gallery.Metadata).
All tests green with -race.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(assistant): extract REST route paths into named constants
httpapi.Client had 18 bare-string path sites scattered across methods.
Pull them into pkg/mcp/localaitools/httpapi/routes.go: static paths as
package-private constants, dynamic paths as small builders that handle
url.PathEscape on segment values.
No behaviour change. Drops the now-unused net/url import from client.go
since path escaping moved into routes.go alongside the path it applies to.
Local-only by design: the server-side registrations in
core/http/routes/localai.go remain bare strings. Sharing constants across
the pkg/ ↔ core/ boundary would invert the layering today; the existing
Tool↔REST drift-detector in coverage_test.go is the safety net for that
direction.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
* docs(assistant): align with shipped UI and dropped bootstrap env vars
The LocalAI Assistant doc still described the older iteration:
- The in-chat toggle was renamed from "Admin" to "Manage" (the badge is
now "Manage mode" and the home page exposes a "Manage by chat" CTA).
- LOCALAI_ASSISTANT_BOOTSTRAP_MODEL / --localai-assistant-bootstrap-model
and the bootstrap_default_model tool were removed — admins pick a model
from the existing selector instead, no env-var configuration required.
- The shipped tool catalog includes import_model_uri but didn't appear in
the doc; bootstrap_default_model appeared but no longer exists.
- The Settings → LocalAI Assistant runtime toggle wasn't mentioned as the
preferred way to disable without restart.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|
|
2865f0f8d3
|
feat(ux): backend management enhancement (#9325)
* feat: add PreferDevelopmentBackends setting, expose isMeta/isDevelopment in API - Add PreferDevelopmentBackends config field, CLI flag, runtime setting - Add IsDevelopment() method to GalleryBackend - Use AvailableBackendsUnfiltered in UI API to show all backends - Expose isMeta, isDevelopment, preferDevelopmentBackends in backend API response * feat: upgrade banner with Upgrade All button, detect pre-existing backends - Add upgrade banner on Backends page showing count and Upgrade All button - Fix upgrade detection for backends installed before version tracking: flag as upgradeable when gallery has a version but installed has none - Fix OCI digest check to flag backends with no stored digest as upgradeable |
||
|
|
8ab0744458
|
feat: backend versioning, upgrade detection and auto-upgrade (#9315)
* feat: add backend versioning data model foundation Add Version, URI, and Digest fields to BackendMetadata for tracking installed backend versions and enabling upgrade detection. Add Version field to GalleryBackend. Add UpgradeAvailable/AvailableVersion fields to SystemBackend. Implement GetImageDigest() for lightweight OCI digest lookups via remote.Head. Record version, URI, and digest at install time in InstallBackend() and propagate version through meta backends. * feat: add backend upgrade detection and execution logic Add CheckBackendUpgrades() to compare installed backend versions/digests against gallery entries, and UpgradeBackend() to perform atomic upgrades with backup-based rollback on failure. Includes Agent A's data model changes (Version/URI/Digest fields, GetImageDigest). * feat: add AutoUpgradeBackends config and runtime settings Add configuration and runtime settings for backend auto-upgrade: - RuntimeSettings field for dynamic config via API/JSON - ApplicationConfig field, option func, and roundtrip conversion - CLI flag with LOCALAI_AUTO_UPGRADE_BACKENDS env var - Config file watcher support for runtime_settings.json - Tests for ToRuntimeSettings, ApplyRuntimeSettings, and roundtrip * feat(ui): add backend version display and upgrade support - Add upgrade check/trigger API endpoints to config and api module - Backends page: version badge, upgrade indicator, upgrade button - Manage page: version in metadata, context-aware upgrade/reinstall button - Settings page: auto-upgrade backends toggle * feat: add upgrade checker service, API endpoints, and CLI command - UpgradeChecker background service: checks every 6h, auto-upgrades when enabled - API endpoints: GET /backends/upgrades, POST /backends/upgrades/check, POST /backends/upgrade/:name - CLI: `localai backends upgrade` command, version display in `backends list` - BackendManager interface: add UpgradeBackend and CheckUpgrades methods - Wire upgrade op through GalleryService backend handler - Distributed mode: fan-out upgrade to worker nodes via NATS * fix: use advisory lock for upgrade checker in distributed mode In distributed mode with multiple frontend instances, use PostgreSQL advisory lock (KeyBackendUpgradeCheck) so only one instance runs periodic upgrade checks and auto-upgrades. Prevents duplicate upgrade operations across replicas. Standalone mode is unchanged (simple ticker loop). * test: add e2e tests for backend upgrade API - Test GET /api/backends/upgrades returns 200 (even with no upgrade checker) - Test POST /api/backends/upgrade/:name accepts request and returns job ID - Test full upgrade flow: trigger upgrade via API, wait for job completion, verify run.sh updated to v2 and metadata.json has version 2.0.0 - Test POST /api/backends/upgrades/check returns 200 - Fix nil check for applicationInstance in upgrade API routes |
||
|
|
8862e3ce60
|
feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186)
* always enable parallel requests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore: move tests to ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * chore(smart router): order by available vram Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
59108fbe32
|
feat: add distributed mode (#9124)
* feat: add distributed mode (experimental) Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix data races, mutexes, transactions Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix events and tool stream in agent chat Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * use ginkgo Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(cron): compute correctly time boundaries avoiding re-triggering Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not flood of healthy checks Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * do not list obvious backends as text backends Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tests fixups Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactoring and consolidation Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Drop redundant healthcheck Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * enhancements, refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
35d509d8e7
|
feat(ui): Per model backend logs and various fixes (#9028)
* feat(gallery): Switch to expandable box instead of pop-over and display model files Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(ui, backends): Add individual backend logging Signed-off-by: Richard Palethorpe <io@richiejp.com> * fix(ui): Set the context settings from the model config Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com> |
||
|
|
ac48867b7d
|
feat: add agentic management (#8820)
* feat: add standalone and agentic functionalities Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * expose agents via responses api Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
3387bfaee0
|
feat(api): add support for open responses specification (#8063)
* feat: openresponses Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add ttl settings, fix tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix: register cors middleware by default Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * satisfy schema Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Logitbias and logprobs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Add grammar Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * SSE compliance Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * tool JSON conversion Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * support background mode Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * swagger Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * drop code. This is handled in the handler Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Small refactorings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * background mode for MCP Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
99b5c5f156
|
feat(api): Allow tracing of requests and responses (#7609)
* feat(api): Allow tracing of requests and responses Signed-off-by: Richard Palethorpe <io@richiejp.com> * feat(traces): Add traces UI Signed-off-by: Richard Palethorpe <io@richiejp.com> --------- Signed-off-by: Richard Palethorpe <io@richiejp.com> |
||
|
|
c844b7ac58
|
feat: disable force eviction (#7725)
* feat: allow to set forcing backends eviction while requests are in flight Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * feat: try to make the request sit and retry if eviction couldn't be done Otherwise calls that in order to pass would need to shutdown other backends would just fail. In this way instead we make the request sit and retry eviction until it succeeds. The thresholds can be configured by the user. Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * add tests Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * expose settings to CLI Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Update docs Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |
||
|
|
50f9c9a058
|
feat(watchdog): add Memory resource reclaimer (#7583)
* feat(watchdog): add GPU reclaimer Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Handle vram calculation for unified memory devices Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * Support RAM eviction, set watchdog interval from runtime settings Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> |