mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Richard Palethorpe 16b2d4c807 fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)

Two related runtime fixes for Python backends that JIT-compile CUDA kernels at first model load (FlashInfer, PyTorch inductor, triton):

1. libbackend.sh: replace `source ${EDIR}/venv/bin/activate` with a minimal manual setup (_activateVenv: export VIRTUAL_ENV, prepend PATH, unset PYTHONHOME) computed from $EDIR at runtime. `uv venv` and `python -m venv` both bake the create-time absolute path into bin/activate (e.g. VIRTUAL_ENV='/vllm/venv' from the Docker build stage), so sourcing activate on a relocated venv (copied out of the build container and unpacked at an arbitrary backend dir) prepends a stale, non-existent path to $PATH. Pip-installed CLI tools (e.g. ninja, used by FlashInfer's NVFP4 GEMM JIT) are then never found and the load aborts with FileNotFoundError. Doing the env setup ourselves matches what `uv run` does internally and sidesteps the relocation problem entirely. Generic: every Python backend benefits.

2. vllm/run.sh: replace ninja's default -j$(nproc)+2 with an adaptive MAX_JOBS = min(nproc, (MemAvailable-4)/4). Each concurrent nvcc/cudafe++ peaks at multiple GiB; the default OOM-kills on memory-tight hosts (e.g. a 16 GiB desktop loading a 27B NVFP4 model) but underutilises 100-core / 1 TB boxes. User-set MAX_JOBS still wins. Also pin NVCC_THREADS=2 unless overridden.

Refs: https://github.com/vllm-project/vllm/issues/20079
Assisted-by: Claude:claude-opus-4-7 [Edit] [Bash]

2026-05-06 00:28:01 +02:00
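
The two fixes described in the commit message can be illustrated with a rough Python sketch. This is not the code from libbackend.sh or vllm/run.sh (both are shell scripts); the function names `activate_venv_env` and `adaptive_max_jobs` are hypothetical, the memory figure is assumed to be in GiB, the `venv/bin` layout is assumed from the `${EDIR}/venv/bin/activate` path, and the floor of one job is an assumption the commit message does not state.

```python
import os


def activate_venv_env(edir, env):
    """Return a copy of `env` pointing at the venv under `edir`.

    Hypothetical helper mirroring the three steps named in the commit
    message: export VIRTUAL_ENV, prepend PATH, unset PYTHONHOME.
    The path is computed from `edir` at call time, so a relocated
    venv still resolves to its current location.
    """
    new_env = dict(env)
    venv = os.path.join(edir, "venv")
    new_env["VIRTUAL_ENV"] = venv
    bin_dir = os.path.join(venv, "bin")
    old_path = new_env.get("PATH", "")
    # Prepend the venv's bin directory so pip-installed CLI tools are found first.
    new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    # PYTHONHOME would redirect the interpreter away from the venv.
    new_env.pop("PYTHONHOME", None)
    return new_env


def adaptive_max_jobs(nproc, mem_available_gib, user_max_jobs=None):
    """Pick a parallel job count as min(nproc, (MemAvailable - 4) / 4).

    Assumptions: memory is given in GiB, the division rounds down,
    and the result never drops below 1.
    """
    if user_max_jobs is not None:
        # A user-set MAX_JOBS still wins.
        return user_max_jobs
    by_memory = int((mem_available_gib - 4) // 4)
    return max(1, min(nproc, by_memory))
```

Under these assumptions, a 16-core host with 12 GiB available would get 2 jobs, while a 100-core host with 1000 GiB available would be capped by its core count at 100.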
backend.py	feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp (#9629)	2026-05-01 10:56:24 +02:00
install.sh	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
Makefile	feat(mlx): add mlx backend (#6049)	2025-08-22 08:42:29 +02:00
package.sh	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
pyproject.toml	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
README.md	refactor: move backends into the backends directory (#1279)	2023-11-13 22:40:16 +01:00
requirements-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cpu-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cpu.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cublas12-after.txt	fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)	2026-04-25 15:38:13 +00:00
requirements-cublas12.txt	fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)	2026-04-25 15:38:13 +00:00
requirements-cublas13-after.txt	chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.20.1` (#9649)	2026-05-05 08:41:55 +02:00
requirements-cublas13.txt	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
requirements-hipblas-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-hipblas.txt	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
requirements-install.txt	fix(vllm): seed pybind11 for fastsafetensors build under --no-build-isolation	2026-04-28 20:08:26 +00:00
requirements-intel-after.txt	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
requirements-intel.txt	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
requirements.txt	feat(vllm, distributed): tensor parallel distributed workers (#9612)	2026-05-06 00:22:50 +02:00
run.sh	fix(python-backend): make JIT subprocesses work on hosts of any size (#9679)	2026-05-06 00:28:01 +02:00
test.py	feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)	2026-04-29 00:49:28 +02:00
test.sh	feat: Add backend gallery (#5607)	2025-06-15 14:56:52 +02:00

README.md

Creating a separate environment for the vllm project

make vllm