LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

History

LocalAI [bot] 22ff86d64f fix(distributed): round-robin replicas of the same model (#9695)	2026-05-06 19:40:54 +02:00

FindAndLockNodeWithModel previously ordered candidate replicas by in_flight ASC, available_vram DESC. The primary key is correct, but the tiebreaker meant that whenever in_flight tied (the common case at low to moderate concurrency, where requests don't overlap) the node with the largest available VRAM won every pick. With autoscaling placing replicas of the same model on multiple nodes, the fattest GPU node ended up taking nearly all the load while the others sat idle.

Insert last_used ASC between the two existing tiers. last_used is already refreshed inside the same transaction that increments in_flight (and by TouchNodeModel on cache hits in the router), so the "oldest-used" replica naturally rotates through the candidate set: strict round-robin without a schema change. available_vram DESC is demoted to a final tiebreaker for cold starts where last_used is identical across replicas.

Placement queries (FindNodeWithVRAM, FindLeastLoadedNode, and the *FromSet variants) have the same fattest-GPU bias on tiebreakers but are higher-cost to fix consistently. Deferred to a follow-up so the routing fix can land first; for the user-observed symptom, routing was the dominant cause anyway.

Test: registry_test.go adds a focused spec that loads three replicas on three nodes with 24/16/8 GB VRAM and asserts each is picked at least twice across 9 in_flight-tied calls.

Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash] [Grep]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
advisorylock	feat(distributed): sync state with frontends, better backend management reporting (#9426)	2026-04-19 17:55:53 +02:00
agentpool	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
agents	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
dbutil	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
distributed	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
facerecognition	feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)	2026-04-22 21:55:41 +02:00
finetune	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
galleryop	feat(distributed): per-node backend installation from the gallery	2026-04-26 22:05:18 +00:00
jobs	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
mcp	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
messaging	feat(distributed): support multiple replicas of one model on the same node (#9583)	2026-04-27 21:20:05 +02:00
modeladmin	feat(gallery): Speed up load times and clean gallery entries (#9211)	2026-05-06 14:51:38 +02:00
monitoring	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
nodes	fix(distributed): round-robin replicas of the same model (#9695)	2026-05-06 19:40:54 +02:00
quantization	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
skills	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
storage	feat: track files being staged (#9275)	2026-04-08 14:33:58 +02:00
testutil	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
voicerecognition	feat: voice recognition (#9500)	2026-04-23 12:07:14 +02:00