Mirror of https://github.com/mudler/LocalAI, synced 2026-05-24 09:28:23 +00:00
* refactor(distributed): extract PickBestReplica from FindAndLockNodeWithModel

  Lifts the replica-selection policy (in_flight ASC, last_used ASC, available_vram DESC) out of the SQL ORDER BY into a pure Go function in the new replicapicker.go. The SQL clause keeps its FOR UPDATE atomicity and remains the production path used by SmartRouter; PickBestReplica is the canonical implementation that the future per-frontend rotating replica cache (TODO referenced from pkg/model) will call against an in-memory snapshot without paying a DB round-trip per inference.

  A new registry_test mirror spec seeds a multi-tier scenario and asserts both layers pick the same replica, so any future tweak to either side fails the test until the other side is updated.

  No behavior change.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: Claude:claude-opus-4-7 [Claude Code]

* fix(distributed): route per inference request and cache probeHealth

  Two related fixes that together restore load balancing across loaded replicas of the same model.

  1. ModelLoader.Load and LoadModel bypass the local *Model cache when modelRouter is set. The cached *Model wraps an InFlightTrackingClient bound to a single (nodeID, replicaIndex). Reusing it pinned every subsequent request to whichever node won the very first pick, so FindAndLockNodeWithModel's round-robin never got a chance to run even after the reconciler scaled the model out to a second node. In distributed mode SmartRouter.Route now runs per request, and PickBestReplica picks the least-loaded replica each time. SmartRouter has its own coalescing (advisory DB lock for first-time loads + singleflight on backend.install RPC), so concurrent first requests for a not-yet-loaded model still produce a single worker-side install.

  2. SmartRouter.probeHealth memoizes successful gRPC HealthCheck results in a new probeCache (probe_cache.go) with a 30s TTL. With per-request routing every inference call hits probeHealth, and llama.cpp-style backends serialize HealthCheck behind active Predict, so a burst of incoming requests stalled on the probe to a node already mid-stream, tripping the 2s timeout and falling through to the install path. singleflight collapses N concurrent first-time probes for the same (node, addr) into one round-trip, failed probes invalidate the entry so the staleness-recovery path still triggers, and the TTL matches pkg/model/model.go's healthCheckTTL so the single-process and distributed paths share a staleness budget. The background HealthMonitor still reaps actually-dead backends within ~45s.

  The bypass introduces one short FindAndLockNodeWithModel transaction per inference. A TODO in pkg/model/loader.go documents the future per-modelID rotating-replica cache that would reuse PickBestReplica against an in-memory snapshot and skip the DB round-trip for hot paths.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Assisted-by: Claude:claude-opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
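
The selection policy named in the first commit (in_flight ASC, last_used ASC, available_vram DESC) can be illustrated as a pure comparison over an in-memory snapshot. The following is a minimal sketch, not the contents of replicapicker.go: the `Replica` struct, its field names, and the lowercase `pickBestReplica` signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Replica is a hypothetical snapshot row. The field names are assumptions
// for this sketch, not the actual LocalAI registry schema.
type Replica struct {
	NodeID        string
	ReplicaIndex  int
	InFlight      int
	LastUsed      time.Time
	AvailableVRAM uint64
}

// pickBestReplica returns the candidate that sorts first under
// in_flight ASC, last_used ASC, available_vram DESC.
// The second return value is false when there are no candidates.
func pickBestReplica(candidates []Replica) (Replica, bool) {
	if len(candidates) == 0 {
		return Replica{}, false
	}
	// Sort a copy so the caller's snapshot is left untouched.
	sorted := make([]Replica, len(candidates))
	copy(sorted, candidates)
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.InFlight != b.InFlight {
			return a.InFlight < b.InFlight // fewest in-flight requests first
		}
		if !a.LastUsed.Equal(b.LastUsed) {
			return a.LastUsed.Before(b.LastUsed) // least recently used first
		}
		return a.AvailableVRAM > b.AvailableVRAM // most free VRAM first
	})
	return sorted[0], true
}

func main() {
	now := time.Now()
	best, ok := pickBestReplica([]Replica{
		{NodeID: "node-a", InFlight: 2, LastUsed: now.Add(-time.Minute), AvailableVRAM: 24 << 30},
		{NodeID: "node-b", InFlight: 0, LastUsed: now, AvailableVRAM: 8 << 30},
		{NodeID: "node-c", InFlight: 0, LastUsed: now.Add(-time.Hour), AvailableVRAM: 8 << 30},
	})
	fmt.Println(best.NodeID, ok) // prints "node-c true"
}
```

Here node-b and node-c tie on in-flight count, so the older last_used timestamp decides. Keeping the comparison free of I/O is what would let a test compare it against the SQL ORDER BY, as the commit message describes.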
| File | | |
|---|---|---|
| distributed_store.go | ||
| distributed_store_test.go | ||
| file_stager.go | ||
| file_stager_http.go | ||
| file_stager_s3.go | ||
| file_staging_client.go | ||
| file_transfer_server.go | ||
| file_transfer_server_test.go | ||
| health.go | ||
| health_mock_test.go | ||
| health_test.go | ||
| inflight.go | ||
| inflight_test.go | ||
| install_progress_publisher.go | ||
| install_progress_publisher_test.go | ||
| interfaces.go | ||
| managers_distributed.go | ||
| managers_distributed_test.go | ||
| model_router.go | ||
| model_router_test.go | ||
| nodes_suite_test.go | ||
| probe_cache.go | ||
| probe_cache_test.go | ||
| reconciler.go | ||
| reconciler_test.go | ||
| registry.go | ||
| registry_test.go | ||
| replicapicker.go | ||
| replicapicker_test.go | ||
| router.go | ||
| router_test.go | ||
| staging_keys.go | ||
| staging_keys_test.go | ||
| staging_progress.go | ||
| unloader.go | ||
| unloader_test.go | ||
| unloader_upgrade_test.go | ||
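
The probe cache behavior described in the commit message for probe_cache.go (successful health checks memoized for a TTL, concurrent probes for the same key collapsed into one, failures invalidating the entry) might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual file: the type and method names are hypothetical, and a small hand-rolled in-flight map stands in for the singleflight package so the example runs with the standard library only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errUnhealthy is a placeholder error used by this sketch.
var errUnhealthy = errors.New("backend unhealthy")

// call tracks one probe that is currently running for a key.
type call struct {
	done chan struct{}
	err  error
}

// probeCache is a hypothetical TTL cache of successful probes.
type probeCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	healthy  map[string]time.Time // key -> time of last successful probe
	inflight map[string]*call     // key -> probe currently in progress
}

func newProbeCache(ttl time.Duration) *probeCache {
	return &probeCache{
		ttl:      ttl,
		healthy:  make(map[string]time.Time),
		inflight: make(map[string]*call),
	}
}

// check returns nil immediately when key has a fresh successful probe.
// Otherwise it runs probe once, sharing the result with concurrent callers.
func (c *probeCache) check(key string, probe func() error) error {
	c.mu.Lock()
	if t, ok := c.healthy[key]; ok && time.Since(t) < c.ttl {
		c.mu.Unlock()
		return nil // cache hit: skip the round-trip
	}
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-cl.done // another caller is already probing; wait for it
		return cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.err = probe()

	c.mu.Lock()
	delete(c.inflight, key)
	if cl.err == nil {
		c.healthy[key] = time.Now()
	} else {
		delete(c.healthy, key) // failures are never cached
	}
	c.mu.Unlock()
	close(cl.done)
	return cl.err
}

func main() {
	cache := newProbeCache(30 * time.Second) // 30s TTL, as in the commit message
	probes := 0
	probe := func() error { probes++; return nil }

	key := "node-1|10.0.0.1:50051" // hypothetical (node, addr) key
	_ = cache.check(key, probe)
	_ = cache.check(key, probe)
	fmt.Println("probes issued:", probes) // prints "probes issued: 1"
}
```

Because only successes are stored, a failed probe is retried on the next request, which matches the commit message's point that the staleness-recovery path must still trigger.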