mirror of
https://github.com/mudler/LocalAI
synced 2026-05-24 09:28:23 +00:00
Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor. The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

| File |
|---|
| CMakeLists.txt |
| grpc-server.cpp |
| Makefile |
| package.sh |
| prepare.sh |
| run.sh |
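
The allowlist check described in the commit message can be sketched as below. This is a minimal illustration, not the code in grpc-server.cpp: the `SplitMode` enum, both function names, the `key:value` option format, and the use of a positive context size to stand for "manually set" are all assumptions made for the example. Only the option names (split_mode, sm), the four accepted values, and the three prerequisites for tensor come from the commit message.

```cpp
#include <optional>
#include <string>

// Hypothetical enum for illustration only; it is not llama.cpp's own type.
enum class SplitMode { None, Layer, Row, Tensor };

// Parses one backend option, assumed here to look like "key:value".
// Returns the split mode if the key is split_mode (or its alias sm) and the
// value is one of none|layer|row|tensor; otherwise returns std::nullopt.
std::optional<SplitMode> parse_split_mode(const std::string& option) {
    const std::size_t colon = option.find(':');
    if (colon == std::string::npos) {
        return std::nullopt;  // no separator, so not a key:value option
    }
    const std::string key = option.substr(0, colon);
    const std::string value = option.substr(colon + 1);

    if (key != "split_mode" && key != "sm") {
        return std::nullopt;  // some other option, not handled here
    }
    if (value == "none")   return SplitMode::None;
    if (value == "layer")  return SplitMode::Layer;
    if (value == "row")    return SplitMode::Row;
    if (value == "tensor") return SplitMode::Tensor;
    return std::nullopt;  // value is not on the allowlist
}

// Checks the prerequisites the commit message lists for the tensor value:
// FlashAttention enabled, KV-cache quantization disabled, and a manually set
// context size (modeled here, as an assumption, by a positive value).
bool tensor_prerequisites_met(bool flash_attention,
                              bool kv_cache_quantized,
                              int context_size) {
    return flash_attention && !kv_cache_quantized && context_size > 0;
}
```

Under these assumptions, `parse_split_mode("sm:tensor")` yields `SplitMode::Tensor`, while an unlisted value such as `sm:bogus` yields an empty optional. Whether the llama.cpp build actually contains the tensor-parallelism PR cannot be checked this way and has to be verified separately.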