LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 21eace40ec	2026-04-25 14:02:57 +02:00

feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)

Adds split_mode (alias sm) to the llama.cpp backend options allowlist, accepting none|layer|row|tensor. The tensor value targets the experimental backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and requires a llama.cpp build that includes that PR, FlashAttention enabled, KV-cache quantization disabled, and a manually set context size.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

CMakeLists.txt	fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)	2026-04-18 20:30:28 +02:00
grpc-server.cpp	feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)	2026-04-25 14:02:57 +02:00
Makefile	chore: ⬆️ Update ggml-org/llama.cpp to `361fe72acb7b9bd79059cc177cbeda99b35b5db9` (#9548)	2026-04-25 08:58:27 +02:00
package.sh	fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)	2026-03-22 00:58:14 +01:00
prepare.sh	chore: ⬆️ Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` (#7402)	2025-12-01 07:50:40 +01:00
run.sh	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
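
The commit for #9560 describes an allowlisted option, split_mode (alias sm), that accepts none, layer, row, or tensor. As a rough illustration of that kind of validation, here is a minimal C++17 sketch. It is not the code in grpc-server.cpp: the SplitMode enum, the parse_split_mode function, and the "name:value" option string format are all assumptions made for this example.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical enum for illustration only; this is not the llama.cpp type.
enum class SplitMode { None, Layer, Row, Tensor };

// Parses a single option string, assumed here to look like "name:value".
// Returns the split mode when the name is split_mode (or its alias sm)
// and the value is one of the four accepted strings; otherwise nullopt.
std::optional<SplitMode> parse_split_mode(std::string_view option) {
    const auto colon = option.find(':');
    if (colon == std::string_view::npos) {
        return std::nullopt;  // no value part
    }
    const std::string_view name = option.substr(0, colon);
    const std::string_view value = option.substr(colon + 1);

    if (name != "split_mode" && name != "sm") {
        return std::nullopt;  // some other option
    }
    if (value == "none")   return SplitMode::None;
    if (value == "layer")  return SplitMode::Layer;
    if (value == "row")    return SplitMode::Row;
    if (value == "tensor") return SplitMode::Tensor;
    return std::nullopt;  // unknown value is rejected rather than guessed
}

// Small usage check: the alias and the long name behave the same way.
inline void split_mode_self_check() {
    assert(parse_split_mode("split_mode:layer") == SplitMode::Layer);
    assert(parse_split_mode("sm:layer") == SplitMode::Layer);
    assert(!parse_split_mode("sm:bogus").has_value());
}
```

Rejecting unknown values instead of falling back to a default keeps a typo from silently changing GPU placement. Per the commit message, the tensor value additionally depends on a llama.cpp build that includes ggml-org/llama.cpp#19378, with FlashAttention enabled, KV-cache quantization disabled, and a manually set context size; this sketch does not check any of those conditions.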