mirror of https://github.com/mudler/LocalAI
synced 2026-04-21 13:27:21 +00:00
Both ubuntu-latest and bigger-runner have inconsistent CPU baselines: some instances support the AVX-512 VNNI/BF16 instructions the prebuilt vllm 0.14.1+cpu wheel was compiled with, while others SIGILL on import of vllm.model_executor.models.registry. The libnuma packaging fix doesn't help when the wheel itself can't be loaded. FROM_SOURCE=true compiles vllm against the actual host CPU and works everywhere, but takes 30-50 minutes per run, which is too slow for a smoke test on every PR. Comment out the job for now.

The test itself is intact and passes locally; run it via `make test-extra-backend-vllm` on a host with the required SIMD baseline.

Re-enable when:
- we have a self-hosted runner label with guaranteed AVX-512 VNNI/BF16, or
- vllm publishes a CPU wheel with a wider baseline, or
- we set up a Docker layer cache that makes FROM_SOURCE acceptable.

The detect-changes vllm output, the test harness changes (tests/ e2e-backends + tools cap), the make target (test-extra-backend-vllm), package.sh, and the Dockerfile/install.sh plumbing all stay in place.
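A minimal shell sketch of the local-run precondition described above: check whether a host's `/proc/cpuinfo` flags include the AVX-512 features the prebuilt wheel is assumed to need before invoking the make target. The helper name `missing_simd_flags` is hypothetical and not part of the repo; the flag names are as Linux reports them.

```shell
# Hypothetical helper (illustration only): given the contents of a
# cpuinfo "flags" line, print the required AVX-512 features that are
# absent. Assumes the prebuilt vllm CPU wheel needs avx512_vnni and
# avx512_bf16, per the commit message above.
missing_simd_flags() {
    flags_line="$1"
    missing=""
    for f in avx512_vnni avx512_bf16; do
        case " $flags_line " in
            *" $f "*) ;;                    # flag present, nothing to record
            *) missing="$missing $f" ;;     # flag absent
        esac
    done
    echo "$missing"
}

# Example gate before running the smoke test (Linux-only path assumed):
# if [ -z "$(missing_simd_flags "$(grep -m1 '^flags' /proc/cpuinfo)")" ]; then
#     make test-extra-backend-vllm
# fi
```

A runner could use the same check to decide between the prebuilt wheel and a FROM_SOURCE=true build instead of failing with SIGILL at import time.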
- ci
- gallery-agent
- ISSUE_TEMPLATE
- workflows
- bump_deps.sh
- bump_docs.sh
- check_and_update.py
- checksum_checker.sh
- dependabot.yml
- FUNDING.yml
- labeler.yml
- PULL_REQUEST_TEMPLATE.md
- release.yml
- stale.yml