mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Richard Palethorpe 73aacad2f9 fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)	2026-04-25 15:38:13 +00:00

The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

    ImportError: .../flash_attn_2_cuda...so: undefined symbol: _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet published flash-attn wheels for torch 2.10 -- the latest release (2.8.3) tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken the moment vllm completes its install.

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already covers the attention path. The only other use of flash-attn in vllm is the rotary apply_rotary import in vllm/model_executor/layers/rotary_embedding/common.py, which is guarded by find_spec("flash_attn") and falls back cleanly when absent.

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only existed to give the flash-attn wheel a matching torch to link against. With flash-attn gone, vllm's own torch==2.10.0 dep is the binding constraint regardless of what we put here.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
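
The commit message describes an optional import guarded by find_spec("flash_attn"). The general pattern can be sketched as below. This is an illustration only, not vllm's actual code: the helper name load_optional and the placeholder fallback are hypothetical, and catching ImportError after the find_spec check is an extra assumption added here to cover a wheel that is installed but fails to load (the ABI-mismatch case from the commit).

```python
import importlib
from importlib.util import find_spec


def load_optional(module_name, attr, fallback):
    """Return module_name.attr if it can be loaded, otherwise fallback.

    module_name is expected to be a top-level package name; find_spec on a
    dotted name would first import the parent package.
    """
    # Step 1: cheap presence check that does not import the package.
    if find_spec(module_name) is None:
        return fallback
    # Step 2: the package is installed, but the import can still fail,
    # for example when a compiled extension has unresolved symbols.
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return fallback
    return getattr(module, attr, fallback)


def _rotary_fallback(x):
    # Hypothetical placeholder for a slower pure-Python code path.
    return x


# Hypothetical usage: prefer the accelerated kernel, else use the fallback.
apply_rotary = load_optional("flash_attn", "apply_rotary", _rotary_fallback)
```

With a presence-only guard, removing the wheel is what makes the fallback path run; an installed but ABI-broken wheel would still pass find_spec and fail on import.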
backend.py	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
install.sh	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
Makefile	feat(mlx): add mlx backend (#6049)	2025-08-22 08:42:29 +02:00
package.sh	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
README.md	refactor: move backends into the backends directory (#1279)	2023-11-13 22:40:16 +01:00
requirements-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cpu-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cpu.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-cublas12-after.txt	fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)	2026-04-25 15:38:13 +00:00
requirements-cublas12.txt	fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch (#9557)	2026-04-25 15:38:13 +00:00
requirements-cublas13-after.txt	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
requirements-cublas13.txt	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
requirements-hipblas-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-hipblas.txt	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
requirements-install.txt	feat: migrate python backends from conda to uv (#2215)	2024-05-10 15:08:08 +02:00
requirements-intel-after.txt	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
requirements-intel.txt	feat(qwen-tts): add Qwen-tts backend (#8163)	2026-01-23 15:18:41 +01:00
requirements-l4t13-after.txt	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
requirements-l4t13.txt	feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang (#9553)	2026-04-25 12:26:29 +02:00
requirements.txt	chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/vllm (#9177)	2026-03-31 10:10:17 +02:00
run.sh	feat: Add backend gallery (#5607)	2025-06-15 14:56:52 +02:00
test.py	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
test.sh	feat: Add backend gallery (#5607)	2025-06-15 14:56:52 +02:00

README.md

Creating a separate environment for the vllm project

make vllm