LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 24505e57f5 feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)

* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang

  Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and JetPack 7 boards:

  - vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130 prebuilt vllm + flash-attn).
  - vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm 0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master; cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
  - sglang: l4t13 arm64 only; uses the prebuilt sglang wheel from the jetson-ai-lab SBSA cu130 index, so no source build is needed. Cublas13 sglang on x86_64 is intentionally deferred.

  CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni}, -nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13) and latest/development merge entries.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]

* fix(backends): use unsafe-best-match index strategy on l4t13 builds

  The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.) at limited versions / older Python ABIs. uv defaults to the first index that contains a package and refuses to fall through to PyPI, so sglang l4t13 build fails resolving decord.

  Mirror the existing cpu sglang profile by setting --index-strategy=unsafe-best-match on l4t13 across the three backends, and apply it to the explicit vllm install line in vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]

* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0

  The [all] extra brings in outlines→decord, and decord has no aarch64 cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37 tags). With unsafe-best-match enabled, uv backtracked through sglang versions trying to satisfy decord and silently landed on sglang==0.1.16, an ancient version with an entirely different dep tree (cloudpickle/outlines 0.0.44, etc.).

  Drop [all] so decord is no longer required, and floor sglang at 0.5.0 to prevent any future resolver misfire from degrading the version again.

  Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-25 12:26:29 +02:00
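
The resolver behaviour described in the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the index names, the package versions, and the helper functions (`resolve_first_index`, `resolve_best_match`, `newest`) are hypothetical and are not uv's implementation or the real contents of any index. It shows why a "first index wins" strategy can fail on a transitive dependency, why searching all indexes fixes that, and why a version floor makes a bad resolution fail instead of silently picking an old release.

```python
from typing import Dict, List, Optional, Tuple

Catalog = Dict[str, List[str]]

# Hypothetical index contents, invented purely for illustration.
INDEXES: List[Tuple[str, Catalog]] = [
    ("extra-index", {"decord": ["0.4.0"]}),
    ("pypi", {"decord": ["0.6.0"], "sglang": ["0.1.16", "0.5.3"]}),
]


def parse(version: str) -> Tuple[int, ...]:
    """Turn '0.5.3' into (0, 5, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest(versions: List[str], floor: Optional[str] = None) -> Optional[str]:
    """Return the newest version at or above the floor, or None if none qualifies."""
    allowed = [v for v in versions if floor is None or parse(v) >= parse(floor)]
    return max(allowed, key=parse) if allowed else None


def resolve_first_index(package: str, floor: Optional[str] = None) -> Optional[str]:
    """Consider only the first index that lists the package; never fall through."""
    for _name, catalog in INDEXES:
        if package in catalog:
            return newest(catalog[package], floor)
    return None


def resolve_best_match(package: str, floor: Optional[str] = None) -> Optional[str]:
    """Pool candidates from every index and pick the best one."""
    candidates = [v for _name, catalog in INDEXES for v in catalog.get(package, [])]
    return newest(candidates, floor)


if __name__ == "__main__":
    # The first index lists decord but only at an old version: resolution fails.
    print(resolve_first_index("decord", floor="0.6.0"))
    # Searching all indexes finds the acceptable version.
    print(resolve_best_match("decord", floor="0.6.0"))
    # A floor rejects an ancient fallback instead of accepting it silently.
    print(newest(["0.1.16"], floor="0.5.0"))
```

In this model the floor plays the same role as the sglang 0.5.0 floor in the commit: if the only candidate left is an old release, the result is an explicit failure (`None`) instead of a quiet downgrade.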
ci	fix: roll out bluemonday Sanitize more widely (#3794)	2024-10-12 09:45:47 +02:00
gallery-agent	fix(ci): switch gallery-agent to sigs.k8s.io/yaml (#9397)	2026-04-17 10:10:42 +02:00
ISSUE_TEMPLATE	docs/examples: enhancements (#1572)	2024-01-18 19:41:08 +01:00
workflows	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
bump_deps.sh	feat: do not bundle llama-cpp anymore (#5790)	2025-07-18 13:24:12 +02:00
bump_docs.sh	fix: github bump_docs.sh regex to drop emoji and other text (#2180)	2024-04-29 03:55:29 +00:00
check_and_update.py	fix(ci): fixup checksum scanning pipeline (#3631)	2024-09-23 10:56:10 +02:00
checksum_checker.sh	fix(ci): fixup correct path for check_and_update.py (#2777)	2024-07-11 23:05:43 +02:00
dependabot.yml	feat: Add backend gallery (#5607)	2025-06-15 14:56:52 +02:00
FUNDING.yml	Create FUNDING.yml (#725)	2023-07-09 13:39:00 +02:00
labeler.yml	chore(ci): update labels	2025-02-13 09:58:19 +01:00
PULL_REQUEST_TEMPLATE.md	feat(vllm): Allow to set quantization (#1094)	2023-09-22 15:52:38 +02:00
release.yml	feat(p2p): Federation and AI swarms (#2723)	2024-07-08 22:04:06 +02:00
stale.yml	feat: add PR template and stale configuration (#316)	2023-05-20 09:10:20 +02:00