LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 21eace40ec	2026-04-25 14:02:57 +02:00

feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)

Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor. The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

grpc	fix: speedup `git submodule update` with `--single-branch` (#2847)	2024-07-13 22:32:25 +02:00
ik-llama-cpp	chore: ⬆️ Update ikawrakow/ik_llama.cpp to `cb58a561f0c49f68b6d125cdfda037ed80433821` (#9549)	2026-04-25 08:59:48 +02:00
llama-cpp	feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)	2026-04-25 14:02:57 +02:00
turboquant	fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423)	2026-04-19 21:05:21 +02:00