LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto bbcaebc1ef feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)		2026-05-05 08:42:50 +02:00

* feat(concurrency-groups): per-model exclusive groups for backend loading

Adds `concurrency_groups: [...]` to model YAML configs. Two models that share a group cannot be loaded concurrently on the same node: loading one evicts the others, reusing the existing pinned/busy/retry policy from LRU eviction.

Layered design:
- Watchdog (pkg/model): per-node correctness floor. On every Load(), evict any loaded model that shares a group with the requested one. Pinned skips surface NeedMore so the loader retries (and ultimately logs a clear warning), instead of silently allowing the rule to be violated.
- Distributed scheduler (core/services/nodes): soft anti-affinity hint. scheduleNewModel prefers nodes that don't already host a same-group model, falling back to eviction only if every candidate has a conflict. Composes with NodeSelector at the same point in the candidate pipeline.

Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy models running on different nodes is fine.

The ConfigLoader is wired into SmartRouter via a small ConcurrencyConflictResolver interface so the nodes package keeps a narrow surface on core/config.

Refactors the inner LRU eviction body into a shared collectEvictionsLocked helper and the loader retry loop into retryEnforce(fn, maxRetries, interval), so both LRU and group enforcement share busy/pinned/retry semantics.

Closes #9659.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(watchdog): sync pinned + concurrency_groups at startup

The startup-time watchdog setup lives in initializeWatchdog (startup.go), not in startWatchdog (watchdog.go). The latter is only invoked from the runtime-settings RestartWatchdog path. As a result, neither SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot, so `pinned: true` and `concurrency_groups: [...]` only became effective after a settings-driven watchdog restart.

Fix by adding both sync calls to initializeWatchdog. Confirmed end-to-end: loading model A in group "heavy", then C with no group (coexists), then B in group "heavy" now correctly evicts A and leaves [B, C].

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec

CI lint runs new-from-merge-base, so the pre-existing `defer os.Remove(tmp.Name())` lines are baseline-grandfathered but the one introduced by the concurrency_groups YAML round-trip test is held to errcheck. Wrap the remove in a closure that discards the error.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
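The per-node rule described in the commit message above can be illustrated with a small, self-contained Go sketch. It is not LocalAI's implementation: `Registry`, `Register`, `Load`, and `Loaded` are hypothetical names invented for this example, and the handling of pinned conflicts (report `needMore` and do not load) is an assumption about one plausible reading of the "Pinned skips surface NeedMore" behaviour. Busy-model handling and retries are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// Registry is a hypothetical single-node view of loaded models.
// It only illustrates the group-exclusion rule; it is not LocalAI's type.
type Registry struct {
	groups map[string][]string // model name -> concurrency groups
	pinned map[string]bool     // models that must never be evicted
	loaded map[string]bool     // models currently loaded on this node
}

func NewRegistry() *Registry {
	return &Registry{
		groups: map[string][]string{},
		pinned: map[string]bool{},
		loaded: map[string]bool{},
	}
}

// Register records a model's pinned flag and concurrency groups.
func (r *Registry) Register(name string, pinned bool, groups ...string) {
	r.groups[name] = groups
	r.pinned[name] = pinned
}

// sharesGroup reports whether models a and b have a group in common.
func (r *Registry) sharesGroup(a, b string) bool {
	seen := map[string]bool{}
	for _, g := range r.groups[a] {
		seen[g] = true
	}
	for _, g := range r.groups[b] {
		if seen[g] {
			return true
		}
	}
	return false
}

// Load evicts every loaded, unpinned model that shares a group with name.
// If a pinned model conflicts, it returns needMore=true and does not load
// name (assumption: the caller would retry and eventually log a warning).
func (r *Registry) Load(name string) (evicted []string, needMore bool) {
	for other := range r.loaded {
		if other == name || !r.sharesGroup(name, other) {
			continue
		}
		if r.pinned[other] {
			needMore = true // conflict cannot be resolved by eviction
			continue
		}
		evicted = append(evicted, other)
	}
	for _, e := range evicted {
		delete(r.loaded, e)
	}
	sort.Strings(evicted) // deterministic order for callers and logs
	if !needMore {
		r.loaded[name] = true
	}
	return evicted, needMore
}

// Loaded returns the names of loaded models in sorted order.
func (r *Registry) Loaded() []string {
	names := make([]string, 0, len(r.loaded))
	for n := range r.loaded {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	r := NewRegistry()
	r.Register("A", false, "heavy")
	r.Register("B", false, "heavy")
	r.Register("C", false) // no group: coexists with everything

	r.Load("A")
	r.Load("C")
	evicted, needMore := r.Load("B")
	fmt.Printf("evicted=%v needMore=%v loaded=%v\n", evicted, needMore, r.Loaded())
	// prints: evicted=[A] needMore=false loaded=[B C]
}
```

The `main` function replays the end-to-end scenario from the second fix: A and B share the group "heavy", C has no group, and loading B after A and C evicts only A.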
..
advisorylock	feat(distributed): sync state with frontends, better backend management reporting (#9426)	2026-04-19 17:55:53 +02:00
agentpool	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
agents	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
dbutil	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
distributed	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
facerecognition	feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)	2026-04-22 21:55:41 +02:00
finetune	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
galleryop	feat(distributed): per-node backend installation from the gallery	2026-04-26 22:05:18 +00:00
jobs	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
mcp	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
messaging	feat(distributed): support multiple replicas of one model on the same node (#9583)	2026-04-27 21:20:05 +02:00
modeladmin	feat: localai assistant chat modality (#9602)	2026-04-28 19:29:27 +02:00
monitoring	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
nodes	feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)	2026-05-05 08:42:50 +02:00
quantization	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
skills	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
storage	feat: track files being staged (#9275)	2026-04-08 14:33:58 +02:00
testutil	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
voicerecognition	feat: voice recognition (#9500)	2026-04-23 12:07:14 +02:00