LocalAI/pkg/model
Ettore Di Giacinto bbcaebc1ef
feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)
* feat(concurrency-groups): per-model exclusive groups for backend loading

Adds `concurrency_groups: [...]` to model YAML configs. Two models that share
a group cannot be loaded concurrently on the same node — loading one evicts
the others, reusing the existing pinned/busy/retry policy from LRU eviction.

Layered design:
- Watchdog (pkg/model): per-node correctness floor — on every Load(), evict
  any loaded model that shares a group with the requested one. Pinned skips
  surface NeedMore so the loader retries (and ultimately logs a clear
  warning), instead of silently allowing the rule to be violated.
- Distributed scheduler (core/services/nodes): soft anti-affinity hint —
  scheduleNewModel prefers nodes that don't already host a same-group
  model, falling back to eviction only if every candidate has a conflict.
  Composes with NodeSelector at the same point in the candidate pipeline.

Per-node, not cluster-wide: VRAM is a node-local resource, and two heavy
models running on different nodes is fine. The ConfigLoader is wired into
SmartRouter via a small ConcurrencyConflictResolver interface so the nodes
package keeps a narrow surface on core/config.

Refactors the inner LRU eviction body into a shared collectEvictionsLocked
helper and the loader retry loop into retryEnforce(fn, maxRetries, interval),
so both LRU and group enforcement share busy/pinned/retry semantics.

Closes #9659.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(watchdog): sync pinned + concurrency_groups at startup

The startup-time watchdog setup lives in initializeWatchdog (startup.go),
not in startWatchdog (watchdog.go). The latter is only invoked from the
runtime-settings RestartWatchdog path. As a result, neither
SyncPinnedModelsToWatchdog nor SyncModelGroupsToWatchdog ran at boot,
so `pinned: true` and `concurrency_groups: [...]` only became effective
after a settings-driven watchdog restart.

Fix by adding both sync calls to initializeWatchdog. Confirmed end-to-end:
loading model A in group "heavy", then C with no group (coexists),
then B in group "heavy" now correctly evicts A and leaves [B, C].

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(test): satisfy errcheck on new os.Remove in concurrency_groups spec

CI lint runs new-from-merge-base, so the existing pre-existing
`defer os.Remove(tmp.Name())` lines are baseline-grandfathered but the
one introduced by the concurrency_groups YAML round-trip test is held
to errcheck. Wrap the remove in a closure that discards the error.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-05 08:42:50 +02:00
..
backend_log_store.go feat: react chat redesign (#9616) 2026-04-29 22:33:26 +02:00
backend_log_store_test.go feat: react chat redesign (#9616) 2026-04-29 22:33:26 +02:00
connection_errors.go fix(nodes): better detection if nodes goes down or model is not available (#9274) 2026-04-08 12:11:02 +02:00
connection_evicting_client.go feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
filters.go fix: make sure to close on errors (#7521) 2025-12-11 14:03:20 +01:00
initializers.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
initializers_retry_test.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
loader.go fix(model-loader): also skip .ckpt, .zip, and .tag files when scanning models 2026-04-26 19:37:53 +00:00
loader_options.go feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
loader_test.go fix(nodes): better detection if nodes goes down or model is not available (#9274) 2026-04-08 12:11:02 +02:00
model.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
model_suite_test.go tests: add template tests (#2063) 2024-04-18 10:57:24 +02:00
process.go feat: Log backend exit code (#9581) 2026-04-27 14:19:18 +02:00
store.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
store_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
watchdog.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00
watchdog_options.go feat: disable force eviction (#7725) 2025-12-25 14:26:18 +01:00
watchdog_options_test.go chore: fixup tests with defaults from constants 2025-12-16 21:26:55 +00:00
watchdog_test.go feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) 2026-05-05 08:42:50 +02:00