LocalAI/backend/python/sglang/pyproject.toml

# L4T arm64 (JetPack 7 / sbsa cu130) install spec for the sglang backend.
#
# Why this file exists, and why only the l4t13 BUILD_PROFILE consumes it:
#
# pypi.jetson-ai-lab.io hosts the L4T-specific torch / sglang / sgl-kernel
# wheels we need on aarch64 + cuda13, but it ALSO transparently proxies the
# rest of PyPI through `/+f/<sha>/<filename>` URLs that 503 frequently.
# With `--extra-index-url` + `--index-strategy=unsafe-best-match` (the
# historical fix in install.sh) uv would pick those proxy URLs for ordinary
# PyPI packages — markdown-it-py, anthropic, propcache, etc. — and trip on
# the 503s. See e.g. CI run 25439791228 (markdown-it-py-4.0.0).
#
# `explicit = true` on the index makes uv consult the L4T mirror ONLY for
# packages mapped under [tool.uv.sources]. Everything else goes to PyPI.
# This breaks the historical 503 path without losing access to the L4T
# wheels we actually need from there. Mirrors the equivalent fix already
# in backend/python/vllm/pyproject.toml.
#
# `uv pip install -r requirements.txt` does NOT honor [tool.uv.sources]
# (uv reads sources only from a pyproject.toml, never from a requirements
# file), so install.sh's l4t13 branch invokes
# `uv pip install --requirement pyproject.toml` directly. Other
# BUILD_PROFILEs continue to use the requirements-*.txt pipeline through
# libbackend.sh's installRequirements and never read this file.
[project]
name = "localai-sglang-l4t13"
version = "0.0.0"
requires-python = ">=3.12,<3.13"
dependencies = [
    # Mirror of requirements.txt — kept in sync manually for now since the
    # l4t13 path bypasses installRequirements (see install.sh).
    "grpcio==1.80.0",
    "protobuf",
    "certifi",
    "setuptools",
    "pillow",
    # L4T-specific accelerator stack (sourced from jetson-ai-lab below).
    "torch",
    "torchvision",
    "torchaudio",
    # sglang on jetson: the [all] extra is deliberately omitted because it
    # pulls outlines/decord, and decord has no aarch64 cp312 wheel anywhere
    # (neither PyPI nor the jetson-ai-lab index ships one; only legacy
    # cp35-cp37 wheels exist). With [all], uv backtracks through sglang
    # versions trying to satisfy decord and lands on sglang==0.1.16. The
    # 0.5.0 floor matches the only release series the jetson-ai-lab
    # sbsa/cu130 mirror currently publishes (sglang==0.5.1.post2 as of
    # 2026-05-06). Bumping to >=0.5.11 here would make the build
    # unsatisfiable until the mirror catches up. Gemma 4 / MTP recipes are
    # therefore not supported on l4t13; those features land on
    # cublas12/cublas13 hosts that pull the newer wheel from PyPI.
    # backend.py keeps backward compat with the 0.5.x SamplingParams
    # field rename via runtime detection.
    "sglang>=0.5.0",
    # PyPI-resolvable packages that complete the runtime.
    "accelerate",
    "transformers",
]

[[tool.uv.index]]
name = "jetson-ai-lab"
url = "https://pypi.jetson-ai-lab.io/sbsa/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "jetson-ai-lab" }
torchvision = { index = "jetson-ai-lab" }
torchaudio = { index = "jetson-ai-lab" }
sglang = { index = "jetson-ai-lab" }
feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

Bring the sglang Python backend up to feature parity with vllm by adding
the same engine_args:-map plumbing the vLLM backend already has. Any
ServerArgs field (~380 in sglang 0.5.11) becomes settable from a model
YAML, including the speculative-decoding flags needed for Multi-Token
Prediction.

Validation matches the vllm backend's: keys are checked against
dataclasses.fields(ServerArgs), unknown keys raise ValueError with a
difflib close-match suggestion at LoadModel time, and the typed
ModelOptions fields keep their existing meaning with engine_args
overriding them.

Backend code:

* backend/python/sglang/backend.py: add _apply_engine_args, import
  dataclasses/difflib/ServerArgs, call from LoadModel; rename
  Seed -> sampling_seed (sglang 0.5.11 renamed the SamplingParams field).
* backend/python/sglang/test.py + test.sh + Makefile: six unit tests
  exercising the helper directly (no engine load required).

Build / CI / backend gallery (cuda13 + l4t13 paths are now first-class):

* backend/python/sglang/install.sh: add --prerelease=allow because
  sglang 0.5.11 hard-pins flash-attn-4 which only ships beta wheels; add
  --index-strategy=unsafe-best-match for cublas12 so the cu128 torch
  index wins over default-PyPI's cu130; new pyproject.toml-driven l4t13
  install path so [tool.uv.sources] can pin torch/torchvision/
  torchaudio/sglang to the jetson-ai-lab index without forcing every
  transitive PyPI dep through the L4T mirror's flaky proxy (mirrors the
  equivalent fix in backend/python/vllm/install.sh).
* backend/python/sglang/pyproject.toml (new): L4T project spec with
  explicit-source jetson-ai-lab index. Replaces requirements-l4t13.txt
  for the l4t13 BUILD_PROFILE; other profiles still go through the
  requirements-*.txt pipeline via libbackend.sh's installRequirements.
* backend/python/sglang/requirements-l4t13.txt: removed; superseded by
  pyproject.toml.
* backend/python/sglang/requirements-cublas{12,13}{,-after}.txt: pin
  sglang>=0.5.11 (Gemma 4 floor); add cu130 torch index for cublas13
  (new files) and cu128 torch index for cublas12 (default PyPI now ships
  cu130 torch wheels by default and breaks cu12 hosts).
* backend/index.yaml: add cuda13-sglang and cuda13-sglang-development
  capability mappings + image entries pointing at
  quay.io/.../-gpu-nvidia-cuda-13-sglang.
* .github/workflows/backend.yml: new cublas13 sglang matrix entry,
  mirroring vllm's cuda13 build.

Model gallery + docs:

* gallery/sglang.yaml: base sglang config template, mirrors vllm.yaml.
* gallery/sglang-gemma-4-{e2b,e4b}-mtp.yaml: Gemma 4 MTP demos
  transcribed verbatim from the SGLang Gemma 4 cookbook MTP commands.
* gallery/sglang-mimo-7b-mtp.yaml: MiMo-7B-RL with built-in MTP heads +
  online fp8 weight quantization, verified end-to-end on a 16 GB
  RTX 5070 Ti at ~88 tok/s. Uses mem_fraction_static: 0.7 because the
  MTP draft worker's vocab embedding is loaded unquantised and OOMs the
  static reservation at sglang's 0.85 default.
* gallery/index.yaml: three new entries (gemma-4-e2b-it:sglang-mtp,
  gemma-4-e4b-it:sglang-mtp, mimo-7b-mtp:sglang).
* docs/content/features/text-generation.md: new SGLang section with
  setup, engine_args reference, MTP demos, version requirements.
* .agents/sglang-backend.md (new): agent one-pager covering the flat
  ServerArgs structure, the typed-vs-engine_args precedence, the
  speculative-decoding cheatsheet, and the mem_fraction_static gotcha
  documented above.
* AGENTS.md: index entry for the new agent doc.

Known limitation: the two Gemma 4 MTP gallery entries ship a recipe that
doesn't yet run on stock libraries. The drafter checkpoints
(google/gemma-4-{E2B,E4B}-it-assistant) declare
model_type: gemma4_assistant / Gemma4AssistantForCausalLM, which neither
transformers (<=5.6.0, including the SGLang cookbook's pinned commit
91b1ab1f... and main HEAD) nor sglang's own model registry (<=0.5.11)
registers as of 2026-05-06. They will start working when HF or sglang
upstream registers the architecture -- no LocalAI changes needed. The
MiMo MTP demo and the non-MTP Gemma 4 paths work today on this build
(verified on RTX 5070 Ti, 16 GB).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] [WebFetch] [WebSearch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-07 15:27:29 +00:00
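
The engine_args validation described in the commit message (keys checked against `dataclasses.fields(ServerArgs)`, unknown keys rejected with a `difflib` close-match suggestion) could look roughly like the sketch below. This is not the backend's actual `_apply_engine_args`: the `ServerArgs` class here is a hypothetical three-field stand-in for sglang's real dataclass, and the function name, signature, and error wording are assumptions made for illustration.

```python
import dataclasses
import difflib
from typing import Any, Dict, Optional


@dataclasses.dataclass
class ServerArgs:
    """Hypothetical stand-in; sglang's real ServerArgs has far more fields."""
    model_path: str = ""
    mem_fraction_static: float = 0.85
    speculative_algorithm: Optional[str] = None


def apply_engine_args(server_args: ServerArgs, engine_args: Dict[str, Any]) -> ServerArgs:
    """Return a copy of server_args with engine_args applied on top.

    Every key is validated before anything is applied, so a typo in the
    model YAML fails early instead of being silently ignored.
    """
    valid = sorted(f.name for f in dataclasses.fields(server_args))
    for key in engine_args:
        if key not in valid:
            # Suggest the closest known field name, if there is one.
            hint = difflib.get_close_matches(key, valid, n=1)
            suffix = f" (did you mean {hint[0]!r}?)" if hint else ""
            raise ValueError(f"unknown engine_args key {key!r}{suffix}")
    # engine_args wins over whatever the typed options already set.
    return dataclasses.replace(server_args, **engine_args)


if __name__ == "__main__":
    base = ServerArgs(model_path="some/model")
    tuned = apply_engine_args(base, {"mem_fraction_static": 0.7})
    print(tuned.mem_fraction_static)
    try:
        apply_engine_args(base, {"mem_fraction_statc": 0.7})
    except ValueError as err:
        print(err)
```

Validating all keys first and only then building the new object keeps the call all-or-nothing, which fits the stated intent of failing at LoadModel time.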