LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto	bbcaebc1ef	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00

* feat(concurrency-groups): per-model exclusive groups for backend loading

Adds `concurrency_groups: [...]` to model YAML configs. Two models that share a group cannot be loaded concurrently on the same node: loading one evicts the others, reusing the existing pinned/busy/retry policy from LRU eviction.

Layered design:
- Watchdog (pkg/model): per-node correctness floor. On every Load(), evict any loaded model that shares a group with the requested one. Pinned skips surface NeedMore so the loader retries (and ultimately logs a clear warning), instead of silently allowing the rule to be violated.
- Distributed scheduler (core/services/nodes): soft anti-affinity hint. scheduleNewModel prefers nodes that don't already host a same-group model, falling back to eviction only if every candidate has a conflict. Composes with NodeSelector at the same point in the candidate pipeline.

Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy models running on different nodes is fine.

The ConfigLoader is wired into SmartRouter via a small ConcurrencyConflictResolver interface so the nodes package keeps a narrow surface on core/config.

Refactors the inner LRU eviction body into a shared collectEvictionsLocked helper and the loader retry loop into retryEnforce(fn, maxRetries, interval), so both LRU and group enforcement share busy/pinned/retry semantics.

Closes #9659.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(watchdog): sync pinned + concurrency_groups at startup

The startup-time watchdog setup lives in initializeWatchdog (startup.go), not in startWatchdog (watchdog.go). The latter is only invoked from the runtime-settings RestartWatchdog path. As a result, neither SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot, so `pinned: true` and `concurrency_groups: [...]` only became effective after a settings-driven watchdog restart.

Fix by adding both sync calls to initializeWatchdog.

Confirmed end-to-end: loading model A in group "heavy", then C with no group (coexists), then B in group "heavy" now correctly evicts A and leaves [B, C].

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec

CI lint runs new-from-merge-base, so the pre-existing `defer os.Remove(tmp.Name())` lines are baseline-grandfathered but the one introduced by the concurrency_groups YAML round-trip test is held to errcheck. Wrap the remove in a closure that discards the error.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
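The per-node rule described in the commit message above (loading a model evicts any loaded model that shares a group with it, while pinned models are skipped and reported back so the caller can retry) can be illustrated with a minimal Go sketch. This is not LocalAI's code: the `model` and `node` types, `sharesGroup`, `load`, and the `needMore` return value are hypothetical names chosen for illustration, and the choice not to load the requested model while a pinned conflict remains is an assumption of the sketch.

```go
package main

import "fmt"

// model is a hypothetical stand-in for a loaded backend; not LocalAI's type.
type model struct {
	name   string
	groups []string // values of concurrency_groups from the model YAML
	pinned bool     // pinned models are never evicted in this sketch
}

// sharesGroup reports whether the two group lists have an element in common.
func sharesGroup(a, b []string) bool {
	seen := make(map[string]struct{}, len(a))
	for _, g := range a {
		seen[g] = struct{}{}
	}
	for _, g := range b {
		if _, ok := seen[g]; ok {
			return true
		}
	}
	return false
}

// node holds the models loaded on one node, in load order.
type node struct {
	loaded []model
}

// load evicts every unpinned model that shares a group with m, then appends m.
// If a pinned model conflicts, m is not loaded and needMore is true so the
// caller can retry later (assumed behavior for this sketch).
func (n *node) load(m model) (evicted []string, needMore bool) {
	keep := make([]model, 0, len(n.loaded)+1)
	present := false
	for _, l := range n.loaded {
		switch {
		case l.name == m.name:
			present = true // already loaded: keep the existing entry
			keep = append(keep, l)
		case !sharesGroup(l.groups, m.groups):
			keep = append(keep, l) // no conflict: the models coexist
		case l.pinned:
			needMore = true // conflict, but pinned: skip and report
			keep = append(keep, l)
		default:
			evicted = append(evicted, l.name) // conflict: evict
		}
	}
	if !needMore && !present {
		keep = append(keep, m)
	}
	n.loaded = keep
	return evicted, needMore
}

// names lists the loaded model names in load order.
func (n *node) names() []string {
	out := make([]string, 0, len(n.loaded))
	for _, l := range n.loaded {
		out = append(out, l.name)
	}
	return out
}

func main() {
	// Replays the scenario from the commit message: A (heavy), C (no group), B (heavy).
	n := &node{}
	n.load(model{name: "A", groups: []string{"heavy"}})
	n.load(model{name: "C"})
	evicted, needMore := n.load(model{name: "B", groups: []string{"heavy"}})
	fmt.Println(evicted, needMore, n.names())
}
```

Running the sketch replays the end-to-end check from the second commit: loading B evicts A, and C, which has no group, stays loaded alongside B. A real implementation would also need locking and the busy/retry handling that the commit message attributes to `collectEvictionsLocked` and `retryEnforce`, which this sketch leaves out.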
audio	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
concurrency	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
downloader	feat: add biometrics UI (#9524)	2026-04-24 08:50:34 +02:00
functions	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
grpc	feat: add LocalVQE backend and audio transformations UI (#9640)	2026-05-04 22:07:11 +02:00
huggingface-api	fix(importer): emit all shards for multi-part GGUF models (#9513)	2026-04-23 15:00:02 +02:00
mcp/localaitools	feat(branding): admin-configurable instance name, tagline, and assets (#9635)	2026-05-02 15:51:36 +02:00
model	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00
oci	feat: backend versioning, upgrade detection and auto-upgrade (#9315)	2026-04-11 22:31:15 +02:00
reasoning	fix(reasoning): suppress partial tag tokens during autoparser warm-up	2026-04-04 20:45:57 +00:00
sanitize	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
signals	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
sound	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
store	chore: fix go.mod module (#2635)	2024-06-23 08:24:36 +00:00
system	feat(importer): expand importer flow to almost all backends (#9466)	2026-04-22 22:42:37 +02:00
utils	feat: Add Sherpa ONNX backend for ASR and TTS (#8523)	2026-04-24 14:40:06 +02:00
vram	feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)	2026-04-04 15:14:35 +02:00
xio	feat(ui): allow to cancel ops (#7264)	2025-11-13 18:41:47 +01:00
xsync	chore: fix go.mod module (#2635)	2024-06-23 08:24:36 +00:00
xsysinfo	fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)	2026-04-24 22:02:23 +02:00