New .agents/vllm-backend.md covering everything that's easy to get
wrong in the vllm/vllm-omni backends:
- Use vLLM's native ToolParserManager / ReasoningParserManager; do
not write regex-based parsers. Selection is explicit via Options[],
defaults live in core/config/parser_defaults.json (first sketch
after this list).
- Concrete parsers don't always accept the tools= kwarg the abstract
base declares; a try/except TypeError fallback is mandatory (sketch
below).
- ChatDelta.tool_calls is the contract: Reply.message text alone
won't surface tool calls in /v1/chat/completions (sketch below).
- vLLM version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
Newer wheels declare torch==2.10.0+cpu, which only exists on the
PyTorch test channel and pulls in an incompatible torchvision
(guard sketch below).
- SIMD baseline: the prebuilt wheel needs AVX-512 VNNI/BF16. The
SIGILL symptom and the FROM_SOURCE=true escape hatch are documented
(host-check sketch below).
- libnuma.so.1 and libgomp.so.1 must be bundled, because vllm._C
silently fails to register its torch ops when they're missing (the
smoke test at the end exercises this).
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
'*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning; both are easy
to miss when adding fields to schema.Message.
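
For the parser-selection item, a minimal sketch of registry-based
selection, assuming recent vLLM import paths (they have moved between
releases) and a hypothetical opts dict already merged from Options[]
and parser_defaults.json:

```python
# Sketch only: the opts dict and the merge step are assumptions, not
# LocalAI's actual wiring. The point is that parser classes come from
# vLLM's registries instead of hand-rolled regexes.
from vllm.entrypoints.openai.tool_parsers import ToolParserManager
from vllm.reasoning import ReasoningParserManager

def resolve_parsers(opts, tokenizer):
    # opts holds the merged Options[] + parser_defaults.json values
    tool_cls = ToolParserManager.get_tool_parser(opts["tool_parser"])
    reasoning_cls = ReasoningParserManager.get_reasoning_parser(opts["reasoning_parser"])
    return tool_cls(tokenizer), reasoning_cls(tokenizer)
```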
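
The tools= kwarg pitfall as code. The method and argument names are
assumptions about the call site; the try/except TypeError shape is
the documented requirement:

```python
def extract_calls(tool_parser, text, request, tools):
    # Some concrete parsers override the base method without accepting
    # the tools= kwarg the abstract base declares, so feature-detect at
    # call time instead of trusting the abstract signature.
    try:
        return tool_parser.extract_tool_calls(text, request=request, tools=tools)
    except TypeError:
        # this concrete parser omits the kwarg: retry without it
        return tool_parser.extract_tool_calls(text, request=request)
```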
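
For the ChatDelta contract, a sketch of emitting a tool call from the
Python backend. Beyond ChatDelta.tool_calls and Reply.message, every
generated-stub name here (backend_pb2, ToolCall, the chat_delta
field) is an assumption about the proto, not a checked signature:

```python
import backend_pb2  # LocalAI's generated gRPC stubs (assumed module name)

def tool_call_reply(call_id, name, arguments_json):
    # Populate ChatDelta.tool_calls; putting the JSON into Reply.message
    # would stream as plain text and never reach the tool_calls field of
    # /v1/chat/completions.
    call = backend_pb2.ToolCall(id=call_id, name=name, arguments=arguments_json)
    return backend_pb2.Reply(chat_delta=backend_pb2.ChatDelta(tool_calls=[call]))
```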
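
The version pin can be made to fail fast at backend startup. The
known-good pair below is from the note above; the guard itself is
just a sketch:

```python
# Abort early if dependency resolution drifted off the known-good CPU
# pair: torch 2.10.0+cpu only exists on the PyTorch test channel and
# drags in an incompatible torchvision.
import torch
import vllm

assert vllm.__version__.startswith("0.14.1"), vllm.__version__
assert torch.__version__.startswith("2.9.1"), torch.__version__
```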
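
And the SIMD baseline can be checked before the SIGILL happens.
Linux-only, with the flag names as the kernel spells them in
/proc/cpuinfo:

```python
# If either flag is missing, the prebuilt wheel dies with SIGILL;
# build with FROM_SOURCE=true instead.
needed = {"avx512_vnni", "avx512_bf16"}
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break
missing = needed - flags
if missing:
    print(f"missing {sorted(missing)}: use FROM_SOURCE=true")
else:
    print("prebuilt wheel OK")
```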
Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the bundling mechanism, libbackend.sh adds ${EDIR}/lib
to LD_LIBRARY_PATH, and packaging is verified without trusting the
host (extract the image and boot it in a fresh ubuntu container; a
smoke-test sketch follows).
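
A minimal smoke test to run inside that fresh container, once
libbackend.sh has put ${EDIR}/lib on LD_LIBRARY_PATH. The probed op
name is illustrative; the two library names are from the notes above:

```python
import ctypes

# OSError here means the bundle is incomplete and earlier testing was
# accidentally leaning on host copies of these libraries.
for lib in ("libnuma.so.1", "libgomp.so.1"):
    ctypes.CDLL(lib)
    print(f"{lib}: ok")

# vllm._C can import cleanly yet silently skip op registration, so
# probe a registered op rather than trusting the import alone.
import torch
import vllm._C  # noqa: F401
assert hasattr(torch.ops._C, "rms_norm"), "vllm._C torch ops did not register"
print("vllm._C ops: ok")
```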
Index in AGENTS.md updated.
| .agents/ |
|---|
| adding-backends.md |
| adding-gallery-models.md |
| api-endpoints-and-auth.md |
| building-and-testing.md |
| coding-style.md |
| debugging-backends.md |
| llama-cpp-backend.md |
| testing-mcp-apps.md |
| vllm-backend.md |