Ettore Di Giacinto daa0272f2e
docs(agents): capture vllm backend lessons + runtime lib packaging (#9333)
New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:

- Use vLLM's native ToolParserManager / ReasoningParserManager — do
  not write regex-based parsers. Selection is explicit via Options[],
  defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
  base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
  won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
  Newer wheels declare torch==2.10.0+cpu which only exists on the
  PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
  symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
  silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
  '*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
  when adding fields to schema.Message.
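A requirements pin matching the version trap above might look like this. The versions are the ones named in this commit; the index URL is the standard PyTorch CPU wheel channel, and where the vllm +cpu wheel itself is hosted is an assumption, not something this commit states:

```
--extra-index-url https://download.pytorch.org/whl/cpu
vllm==0.14.1+cpu
torch==2.9.1+cpu
```

Pinning both sides explicitly is what keeps pip from resolving a newer vllm wheel that declares torch==2.10.0+cpu.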
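The tools= fallback described above can be sketched as follows. The parser class names are stand-ins to illustrate both constructor shapes, not vLLM's real parsers:

```python
def instantiate_tool_parser(parser_cls, tokenizer, tools):
    """Try the full signature first; fall back when the concrete parser
    does not accept the tools= kwarg the abstract base declares."""
    try:
        return parser_cls(tokenizer, tools=tools)
    except TypeError:
        return parser_cls(tokenizer)


# Stand-ins for the two constructor shapes found in the wild:
class ModernParser:
    def __init__(self, tokenizer, tools=None):
        self.tokenizer, self.tools = tokenizer, tools


class LegacyParser:
    def __init__(self, tokenizer):  # no tools= kwarg
        self.tokenizer = tokenizer
```

Note the caveat of this pattern: a TypeError raised *inside* a modern parser's constructor will also trigger the fallback, so keep the constructor call itself minimal.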
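The SIMD baseline check is easy to automate before attempting to import the prebuilt wheel; a minimal sketch, assuming Linux and the standard /proc/cpuinfo flag names:

```python
def has_cpu_flags(*flags):
    """Return True if every requested SIMD flag appears in /proc/cpuinfo.
    Returns False on non-Linux hosts, where the check does not apply."""
    try:
        with open("/proc/cpuinfo") as f:
            info = f.read()
    except OSError:
        return False
    return all(flag in info for flag in flags)


# If this is False, the prebuilt wheel will die with SIGILL;
# rebuild with FROM_SOURCE=true instead.
wheel_safe = has_cpu_flags("avx512_vnni", "avx512_bf16")
```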
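Because vllm._C fails *silently* when libnuma/libgomp are missing, a loud preflight check is worth doing. A sketch using ctypes (the soname list mirrors the bullet above):

```python
import ctypes


def loadable(soname):
    """Return True if the dynamic linker can resolve soname."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


def missing_runtime_libs(required=("libnuma.so.1", "libgomp.so.1")):
    """List required shared libraries the linker cannot find, so the
    backend can fail fast instead of running with unregistered torch ops."""
    return [lib for lib in required if not loadable(lib)]
```

A caller would raise (or log prominently) when `missing_runtime_libs()` is non-empty, before importing vllm at all.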
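The backend_hooks key resolution can be sketched like this. The actual hooks live in Go; the precedence order here (exact backend name, then '', then the '*' wildcard) is an assumed reading of the bullet above, not a quote of the implementation:

```python
def resolve_hooks(hooks, backend):
    """Pick the hook set for a backend: an exact named-backend key wins,
    then the '' key, then the '*' wildcard (precedence assumed)."""
    for key in (backend, "", "*"):
        if key in hooks:
            return hooks[key]
    return None
```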

Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract image, boot in fresh ubuntu container).
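The "verify without trusting the host" step could be scripted along these lines once the image is extracted. Paths and the required soname list are illustrative; the real mechanism is package.sh plus libbackend.sh:

```python
import os


def bundled_sonames(edir):
    """List shared objects under ${EDIR}/lib, i.e. what libbackend.sh
    puts on LD_LIBRARY_PATH inside the FROM scratch image."""
    libdir = os.path.join(edir, "lib")
    if not os.path.isdir(libdir):
        return []
    return sorted(f for f in os.listdir(libdir) if ".so" in f)


def check_bundle(edir, required=("libnuma.so.1", "libgomp.so.1")):
    """Return the required sonames that are NOT bundled in the image."""
    present = set(bundled_sonames(edir))
    return [lib for lib in required if lib not in present]
```

Running this against the extracted image tree (rather than the build host) is what makes the check trustworthy: a library satisfied from the host's /usr/lib would pass locally and still be missing in the FROM scratch container.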

Index in AGENTS.md updated.
2026-04-13 11:09:57 +02:00
| File | Last commit | Date |
|------|-------------|------|
| adding-backends.md | docs(agents): capture vllm backend lessons + runtime lib packaging (#9333) | 2026-04-13 11:09:57 +02:00 |
| adding-gallery-models.md | chore: add embeddingemma | 2026-04-08 17:40:55 +00:00 |
| api-endpoints-and-auth.md | chore(agents.md): update with auth/feature gating instructions | 2026-03-19 22:52:28 +00:00 |
| building-and-testing.md | feat(rocm): bump to 7.x (#9323) | 2026-04-12 08:51:30 +02:00 |
| coding-style.md | fix(docs): Use notice instead of alert (#9134) | 2026-03-25 13:55:48 +01:00 |
| debugging-backends.md | feat: add (experimental) fine-tuning support with TRL (#9088) | 2026-03-21 02:08:02 +01:00 |
| llama-cpp-backend.md | feat(ui): MCP Apps, mcp streaming and client-side support (#8947) | 2026-03-11 07:30:49 +01:00 |
| testing-mcp-apps.md | feat(ui): MCP Apps, mcp streaming and client-side support (#8947) | 2026-03-11 07:30:49 +01:00 |
| vllm-backend.md | docs(agents): capture vllm backend lessons + runtime lib packaging (#9333) | 2026-04-13 11:09:57 +02:00 |