Mirror of https://github.com/mudler/LocalAI (synced 2026-05-24 09:28:23 +00:00)
Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor. The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

| Name |
|---|
| advanced |
| features |
| getting-started |
| installation |
| reference |
| _index.md |
| faq.md |
| integrations.md |
| overview.md |
| whats-new.md |
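
The commit message above describes an allowlisted option, split_mode (alias sm), with four accepted values. As a rough illustration of what such an allowlist check amounts to, here is a minimal Python sketch. It is not LocalAI's actual implementation: the function name `parse_split_mode` and the `key:value` option string format are assumptions made for this example only.

```python
# Hypothetical sketch of an allowlist check for the split_mode option.
# Not taken from LocalAI; the "key:value" string format is an assumption.

ALLOWED_SPLIT_MODES = {"none", "layer", "row", "tensor"}
SPLIT_MODE_KEYS = {"split_mode", "sm"}  # "sm" is the alias named in the commit message


def parse_split_mode(options):
    """Return the split mode found in a list of "key:value" strings, or None.

    Unrelated options are ignored. If the option appears more than once,
    the last occurrence wins. An unsupported value raises ValueError.
    """
    mode = None
    for opt in options:
        key, sep, value = opt.partition(":")
        if not sep or key.strip() not in SPLIT_MODE_KEYS:
            continue  # not a split_mode option
        value = value.strip()
        if value not in ALLOWED_SPLIT_MODES:
            raise ValueError(f"unsupported split_mode: {value!r}")
        mode = value
    return mode
```

Under these assumptions, `parse_split_mode(["sm:tensor"])` returns `"tensor"`, while a value outside none|layer|row|tensor is rejected instead of being passed through to the backend. The additional requirements for the tensor value (a llama.cpp build containing the PR, FlashAttention enabled, KV-cache quantization disabled, a manually set context size) are not checked in this sketch.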