LocalAI/core
Ettore Di Giacinto b4e30692a2
feat(backends): add sglang (#9359)
* feat(backends): add sglang

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job

sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally, so
building with -march=native fails on CI runners whose /proc/cpuinfo
reports no AVX-512 support. Force -march=sapphirerapids so the build
always succeeds, matching sglang upstream's docker/xeon.Dockerfile recipe.

The resulting binary still requires an AVX-512 capable CPU at runtime,
so disable tests-sglang-grpc in test-extra.yml for the same reason
tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang
still work on hosts with the right SIMD baseline.
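
To decide whether a host meets that SIMD baseline before running the
local target, the /proc/cpuinfo flags can be inspected. The following is
a minimal sketch, not part of this commit: the helper names are
hypothetical, and it assumes that the avx512f foundation flag is a
reasonable proxy (sgl-kernel may require further AVX-512 subsets).

```python
def has_avx512(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists the avx512f foundation flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Linux x86 exposes CPU features as a space-separated 'flags' line.
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False


def host_has_avx512(path: str = "/proc/cpuinfo") -> bool:
    """Check the running host; report False when the file is unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return has_avx512(f.read())
    except OSError:
        return False


if __name__ == "__main__":
    print("AVX-512 available" if host_has_avx512() else "AVX-512 not detected")
```

A wrapper script could call host_has_avx512() and skip
make test-extra-backend-sglang when it returns False.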

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512

CXXFLAGS with -march=sapphirerapids was being overridden by
add_compile_options(-march=native) in sglang's CPU CMakeLists.txt,
since CMake appends those flags after CXXFLAGS. Sed-patch the
CMakeLists.txt directly after cloning to replace -march=native.
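
The commit applies this substitution with sed. An equivalent rewrite is
sketched below in Python purely for illustration; the function names and
the file path passed to patch_file are hypothetical, and the exact
contents of sglang's CMakeLists.txt are assumed rather than quoted.

```python
import re
from pathlib import Path


def patch_march(cmake_text: str, target: str = "sapphirerapids") -> str:
    """Replace every -march=native occurrence with -march=<target>."""
    return re.sub(r"-march=native\b", f"-march={target}", cmake_text)


def patch_file(path: str, target: str = "sapphirerapids") -> bool:
    """Rewrite the file in place; return True if anything changed."""
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    patched = patch_march(original, target)
    if patched != original:
        p.write_text(patched, encoding="utf-8")
        return True
    return False
```

Patching the source of the flag avoids the ordering problem entirely:
once add_compile_options no longer emits -march=native, nothing later on
the compiler command line overrides the intended target.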

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 22:40:56 +02:00
application feat(ux): backend management enhancement (#9325) 2026-04-12 00:35:22 +02:00
backend feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
cli feat(ux): backend management enhancement (#9325) 2026-04-12 00:35:22 +02:00
clients feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
config feat(vllm): parity with llama.cpp backend (#9328) 2026-04-13 11:00:29 +02:00
dependencies_manager feat(ui): move to React for frontend (#8772) 2026-03-05 21:47:12 +01:00
explorer feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
gallery feat: refactor shared helpers and enhance MLX backend functionality (#9335) 2026-04-13 18:44:03 +02:00
http feat(backends): add sglang (#9359) 2026-04-16 22:40:56 +02:00
p2p feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
schema feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
services feat: wire transcription for llama.cpp, add streaming support (#9353) 2026-04-14 16:13:40 +02:00
startup feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
templates feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
trace feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00