LocalAI/docs/content
Ettore Di Giacinto 21eace40ec
feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-25 14:02:57 +02:00
advanced feat(backend): add turboquant llama.cpp-fork backend (#9355) 2026-04-15 01:25:04 +02:00
features feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560) 2026-04-25 14:02:57 +02:00
getting-started feat(rocm): bump to 7.x (#9323) 2026-04-12 08:51:30 +02:00
installation feat(ui): Interactive model config editor with autocomplete (#9149) 2026-04-07 14:42:23 +02:00
reference fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545) 2026-04-24 22:02:23 +02:00
_index.md chore(docs): center video 2025-12-08 16:59:11 +01:00
faq.md feat: docs revamp (#7313) 2025-11-19 22:21:20 +01:00
integrations.md fix(anthropic): do not emit empty tokens and fix SSE tool calls (#9258) 2026-04-07 00:38:21 +02:00
overview.md feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
whats-new.md feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480) 2026-04-22 21:55:41 +02:00