LocalAI/backend/cpp
Ettore Di Giacinto 21eace40ec
feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 14:02:57 +02:00
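The option handling described in the commit above can be sketched in C++ as follows. This is a minimal illustration, not the code that lives in `llama-cpp/`. The `key:value` option string format, the `SplitMode` enum, the `BackendSettings` struct, and the functions `parse_split_mode`, `parse_split_option` and `check_tensor_requirements` are all hypothetical. The sketch shows `sm` accepted as an alias for `split_mode`, the four allowed values `none|layer|row|tensor`, and the runtime preconditions the commit lists for `tensor`. The requirement for a llama.cpp build that includes ggml-org/llama.cpp#19378 is a build-time condition, so it is not modeled here.

```cpp
#include <optional>
#include <string>

// Hypothetical enum mirroring the four values the commit says are accepted.
enum class SplitMode { None, Layer, Row, Tensor };

// Hypothetical subset of backend settings that matter for tensor mode.
struct BackendSettings {
    bool flash_attention = false;     // FlashAttention enabled?
    bool kv_cache_quantized = false;  // true if the KV cache uses a quantized type
    int context_size = 0;             // 0 means "not set manually"
};

// Map an option value to a SplitMode; returns nullopt for unknown values.
std::optional<SplitMode> parse_split_mode(const std::string& value) {
    if (value == "none") return SplitMode::None;
    if (value == "layer") return SplitMode::Layer;
    if (value == "row") return SplitMode::Row;
    if (value == "tensor") return SplitMode::Tensor;
    return std::nullopt;
}

// Parse an assumed "key:value" option string, accepting both
// "split_mode" and its alias "sm". Any other key yields nullopt.
std::optional<SplitMode> parse_split_option(const std::string& opt) {
    const auto pos = opt.find(':');
    if (pos == std::string::npos) return std::nullopt;
    const std::string key = opt.substr(0, pos);
    if (key != "split_mode" && key != "sm") return std::nullopt;
    return parse_split_mode(opt.substr(pos + 1));
}

// Check the runtime preconditions for tensor mode.
// Returns an empty string when usable, otherwise a human-readable reason.
std::string check_tensor_requirements(SplitMode mode, const BackendSettings& s) {
    if (mode != SplitMode::Tensor) return "";  // other modes have no extra checks here
    if (!s.flash_attention) return "tensor split mode requires FlashAttention";
    if (s.kv_cache_quantized) return "tensor split mode requires an unquantized KV cache";
    if (s.context_size <= 0) return "tensor split mode requires a manually set context size";
    return "";
}
```

Returning a reason string instead of silently falling back keeps a misconfiguration visible. Whether the real backend rejects the request, warns, or falls back to another split mode is not stated in the commit.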
grpc fix: speedup git submodule update with --single-branch (#2847) 2024-07-13 22:32:25 +02:00
ik-llama-cpp chore: ⬆️ Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 (#9549) 2026-04-25 08:59:48 +02:00
llama-cpp feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560) 2026-04-25 14:02:57 +02:00
turboquant fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc (#9423) 2026-04-19 21:05:21 +02:00