Mirror of https://github.com/mudler/LocalAI (synced 2026-04-21 13:27:21 +00:00)
New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:
- Use vLLM's native ToolParserManager / ReasoningParserManager — do
not write regex-based parsers. Selection is explicit via Options[],
defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
Newer wheels declare torch==2.10.0+cpu which only exists on the
PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
'*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
when adding fields to schema.Message.
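
The version pin trap above, as a requirements fragment — the index URL and exact layout are an assumption for illustration; verify against the backend's actual requirements file:

```
# Pin both together: letting pip resolve vllm alone can pull a newer wheel
# that declares torch==2.10.0+cpu (PyTorch test channel only) and drags in
# an incompatible torchvision.
--extra-index-url https://download.pytorch.org/whl/cpu
vllm==0.14.1+cpu
torch==2.9.1+cpu
```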
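
The `tools=` kwarg gotcha above boils down to a defensive call pattern. A minimal sketch — the parser classes and the exact `extract_tool_calls` signature here are illustrative stand-ins, not the real vLLM API; the point is the `try/except TypeError` fallback:

```python
def call_parser(parser, model_output, request, tools):
    """Try the full signature first; retry without tools= on TypeError."""
    try:
        # The abstract base declares tools=, but concrete parsers may not accept it.
        return parser.extract_tool_calls(model_output, request=request, tools=tools)
    except TypeError:
        # Concrete parser without the kwarg: retry with the narrower signature.
        return parser.extract_tool_calls(model_output, request=request)
```

Note that a `TypeError` raised *inside* a parser that does accept `tools=` would also trigger the retry, so keep the guarded call as narrow as possible.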
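
The ChatDelta contract can be sketched with plain dicts (illustrative shapes, not the real proto types): a streamed chunk must carry the call in `tool_calls`; embedding the JSON in the content text will not surface it to `/v1/chat/completions` clients.

```python
def tool_call_delta(call_id, name, arguments_json):
    """Build a streamed chunk that carries a tool call in tool_calls."""
    return {
        "content": "",  # text alone would NOT surface the tool call
        "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {"name": name, "arguments": arguments_json},
        }],
    }
```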
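
The SIMD baseline can be preflighted before importing the prebuilt wheel. A sketch (Linux only; flag names match the kernel's `/proc/cpuinfo` reporting) — if the flags are absent, expect SIGILL and rebuild with `FROM_SOURCE=true` instead:

```python
def has_simd_baseline(cpuinfo_text, required=("avx512_vnni", "avx512_bf16")):
    """Return True if every required CPU flag appears in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return all(f in flags for f in required)
```

Call it with `open("/proc/cpuinfo").read()` on the target host.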
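
Because `vllm._C` fails silently when `libnuma.so.1` or `libgomp.so.1` is missing, an explicit preflight check gives a much clearer error. A minimal sketch using `ctypes`:

```python
import ctypes

def missing_libs(names=("libnuma.so.1", "libgomp.so.1")):
    """Return the subset of shared libraries the dynamic loader cannot find."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # same lookup path the loader uses at import time
        except OSError:
            missing.append(name)
    return missing
```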
Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract the image and boot it in a fresh Ubuntu container).
Index in AGENTS.md updated.
# LocalAI Agent Instructions
This file is an index to detailed topic guides in the `.agents/` directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
## Topics
| File | When to read |
|------|-------------|
| [.agents/building-and-testing.md](.agents/building-and-testing.md) | Building the project, running tests, Docker builds for specific platforms |
| [.agents/adding-backends.md](.agents/adding-backends.md) | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| [.agents/coding-style.md](.agents/coding-style.md) | Code style, editorconfig, logging, documentation conventions |
| [.agents/llama-cpp-backend.md](.agents/llama-cpp-backend.md) | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| [.agents/vllm-backend.md](.agents/vllm-backend.md) | Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks |
| [.agents/testing-mcp-apps.md](.agents/testing-mcp-apps.md) | Testing MCP Apps (interactive tool UIs) in the React UI |
| [.agents/api-endpoints-and-auth.md](.agents/api-endpoints-and-auth.md) | Adding API endpoints, auth middleware, feature permissions, user access control |
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
## Quick Reference
- **Logging**: Use `github.com/mudler/xlog` (same API as slog)
- **Go style**: Prefer `any` over `interface{}`
- **Comments**: Explain *why*, not *what*
- **Docs**: Update `docs/content/` when adding features or changing config
- **Build**: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- **UI**: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI