LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 24505e57f5 feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)		2026-04-25 12:26:29 +02:00

* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang

Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and JetPack 7 boards:

- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130 prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm 0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master; cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only; uses the prebuilt sglang wheel from the jetson-ai-lab SBSA cu130 index, so no source build is needed. Cublas13 sglang on x86_64 is intentionally deferred.

CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni}, -nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13) and latest/development merge entries.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

* fix(backends): use unsafe-best-match index strategy on l4t13 builds

The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.) at limited versions / older Python ABIs. uv defaults to the first index that contains a package and refuses to fall through to PyPI, so sglang l4t13 build fails resolving decord.

Mirror the existing cpu sglang profile by setting --index-strategy=unsafe-best-match on l4t13 across the three backends, and apply it to the explicit vllm install line in vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]

* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0

The [all] extra brings in outlines→decord, and decord has no aarch64 cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37 tags). With unsafe-best-match enabled, uv backtracked through sglang versions trying to satisfy decord and silently landed on sglang==0.1.16, an ancient version with an entirely different dep tree (cloudpickle/outlines 0.0.44, etc.).

Drop [all] so decord is no longer required, and floor sglang at 0.5.0 to prevent any future resolver misfire from degrading the version again.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
..
disabled	chore(ci): disable CI actions	2026-03-02 14:48:00 +01:00
backend.yml	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
backend_build.yml	feat: voice recognition (#9500)	2026-04-23 12:07:14 +02:00
backend_build_darwin.yml	chore(deps): bump docker/metadata-action from 5 to 6 (#8917)	2026-03-09 22:27:02 +01:00
backend_pr.yml	Change runner from macOS-14 to macos-latest	2025-12-13 10:11:27 +01:00
build-test.yaml	chore(deps): bump actions/upload-artifact from 6 to 7 (#8730)	2026-03-02 21:43:39 +01:00
bump-inference-defaults.yml	chore(deps): bump peter-evans/create-pull-request from 7 to 8 (#9114)	2026-03-24 08:50:50 +01:00
bump_deps.yaml	feat(backend): add turboquant llama.cpp-fork backend (#9355)	2026-04-15 01:25:04 +02:00
bump_docs.yaml	fix(api)!: Stop model prior to deletion (#8422)	2026-02-06 09:22:10 +01:00
checksum_checker.yaml	fix(api)!: Stop model prior to deletion (#8422)	2026-02-06 09:22:10 +01:00
deploy-explorer.yaml	fix(api)!: Stop model prior to deletion (#8422)	2026-02-06 09:22:10 +01:00
gallery-agent.yaml	fix(gallery-agent): process blacklist command on recently-closed PRs (#9473)	2026-04-21 16:29:13 +02:00
generate_grpc_cache.yaml	chore(deps): bump docker/build-push-action from 6 to 7 (#8919)	2026-03-09 22:29:51 +01:00
generate_intel_image.yaml	[intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (#9543)	2026-04-24 18:29:10 +02:00
gh-pages.yml	chore(deps): bump actions/upload-pages-artifact from 4 to 5 (#9337)	2026-04-13 21:53:47 +02:00
image-pr.yml	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
image.yml	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
image_build.yml	chore: drop AIO images (#9004)	2026-03-14 17:49:36 +01:00
notify-releases.yaml	fix(api)!: Stop model prior to deletion (#8422)	2026-02-06 09:22:10 +01:00
release.yaml	chore(deps): bump softprops/action-gh-release from 2 to 3 (#9336)	2026-04-13 21:53:28 +02:00
secscan.yaml	Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" (#7789)	2025-12-30 09:58:13 +01:00
stalebot.yml	chore(deps): bump actions/stale from 10.1.1 to 10.2.0 (#8633)	2026-02-23 23:27:20 +01:00
test-extra.yml	feat: Add Sherpa ONNX backend for ASR and TTS (#8523)	2026-04-24 14:40:06 +02:00
test.yml	feat: Add Sherpa ONNX backend for ASR and TTS (#8523)	2026-04-24 14:40:06 +02:00
tests-e2e.yml	feat(realtime): WebRTC support (#8790)	2026-03-13 21:37:15 +01:00
tests-ui-e2e.yml	chore(deps): bump actions/upload-artifact from 4 to 7 (#9030)	2026-03-17 11:42:49 +01:00
update_swagger.yaml	fix(api)!: Stop model prior to deletion (#8422)	2026-02-06 09:22:10 +01:00
yaml-check.yml	chore(backend gallery): add description for remaining backends (#5679)	2025-06-17 22:21:44 +02:00