LocalAI/core/http
Ettore Di Giacinto b4e30692a2
feat(backends): add sglang (#9359)
* feat(backends): add sglang

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): force AVX-512 CXXFLAGS and disable CI e2e job

sgl-kernel's shm.cpp uses __m512 AVX-512 intrinsics unconditionally;
-march=native fails on CI runners without AVX-512 in /proc/cpuinfo.
Force -march=sapphirerapids so the build always succeeds, matching
sglang upstream's docker/xeon.Dockerfile recipe.

The resulting binary still requires an AVX-512 capable CPU at runtime,
so disable tests-sglang-grpc in test-extra.yml for the same reason
tests-vllm-grpc is disabled. Local runs with make test-extra-backend-sglang
still work on hosts with the right SIMD baseline.
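
A minimal sketch of how a host could be checked for the required SIMD baseline before running the backend tests locally. It assumes a Linux host where `/proc/cpuinfo` lists the `avx512f` flag for AVX-512 Foundation support; the helper name `has_avx512` is hypothetical and not part of the LocalAI Makefile.

```shell
#!/bin/sh
# has_avx512 [FILE]: succeed if the cpuinfo-style FILE lists the avx512f flag.
# FILE defaults to /proc/cpuinfo (Linux only). Hypothetical helper name.
has_avx512() {
    # -w matches avx512f as a whole word, so avx512fp16 alone does not count.
    grep -qw 'avx512f' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx512; then
    echo "AVX-512 detected: the sglang CPU build should be runnable here"
else
    echo "AVX-512 not detected: skip the sglang CPU backend tests"
fi
```

A check like this only guards local runs; it does not change the fact that the binary built with -march=sapphirerapids needs an AVX-512 capable CPU.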

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(sglang): patch CMakeLists.txt instead of CXXFLAGS for AVX-512

CXXFLAGS with -march=sapphirerapids was being overridden by
add_compile_options(-march=native) in sglang's CPU CMakeLists.txt,
since CMake appends those flags after CXXFLAGS. Sed-patch the
CMakeLists.txt directly after cloning to replace -march=native.
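
The patch step described above can be sketched as follows. This is an illustration under stated assumptions, not the exact command from the commit: the helper name `patch_march` is hypothetical, and the demo runs against a throwaway file rather than the real sglang CMakeLists.txt, whose path is not given here. The underlying reason is that, with GCC-style compilers, a later -march on the command line generally takes precedence over an earlier one, so the value has to be changed at its source.

```shell
#!/bin/sh
# patch_march FILE [TARGET]: rewrite every -march=native in FILE to
# -march=TARGET (default: sapphirerapids). Hypothetical helper name.
patch_march() {
    file=$1
    target=${2:-sapphirerapids}
    # Go through a temporary file instead of `sed -i`, whose syntax
    # differs between GNU and BSD sed.
    tmp=$(mktemp) || return 1
    sed "s/-march=native/-march=${target}/g" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a throwaway file standing in for the cloned CMakeLists.txt.
demo=$(mktemp)
printf 'add_compile_options(-march=native -O3)\n' > "$demo"
patch_march "$demo"
cat "$demo"   # prints: add_compile_options(-march=sapphirerapids -O3)
rm -f "$demo"
```

Running the patch right after cloning keeps the change independent of whatever CXXFLAGS the build environment sets.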

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-16 22:40:56 +02:00
auth fix(oauth/invite): do not register user (pending approval) without correct invite (#9189) 2026-03-31 08:29:07 +02:00
endpoints fix(agents): handle embedding model dim changes on collection upload (#9365) 2026-04-15 20:05:28 +02:00
middleware feat: Add toggle mechanism to enable/disable models from loading on demand (#9304) 2026-04-10 18:17:41 +02:00
react-ui feat(backends): add sglang (#9359) 2026-04-16 22:40:56 +02:00
routes fix(ui): show also concrete backends in the backend list 2026-04-16 17:44:25 +00:00
static feat(realtime): WebRTC support (#8790) 2026-03-13 21:37:15 +01:00
views feat(realtime): WebRTC support (#8790) 2026-03-13 21:37:15 +01:00
app.go feat(api): add ollama compatibility (#9284) 2026-04-09 14:15:14 +02:00
app_test.go fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299) 2026-04-10 08:45:47 +02:00
explorer.go chore(refactor): move logging to common package based on slog (#7668) 2025-12-21 19:33:13 +01:00
http_suite_test.go feat(api): add support for open responses specification (#8063) 2026-01-17 22:11:47 +01:00
openresponses_test.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
render.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00