New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:
- Use vLLM's native ToolParserManager / ReasoningParserManager — do
not write regex-based parsers. Selection is explicit via Options[],
defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
Newer wheels declare torch==2.10.0+cpu which only exists on the
PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
'*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
when adding fields to schema.Message.
Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract image, boot in fresh ubuntu container).
Index in AGENTS.md updated.
Adding a New Backend
When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like moonshine:
1. Create Backend Directory Structure
Create the backend directory under the appropriate location:
- Python backends: backend/python/<backend-name>/
- Go backends: backend/go/<backend-name>/
- C++ backends: backend/cpp/<backend-name>/
For Python backends, you'll typically need:
- backend.py - Main gRPC server implementation
- Makefile - Build configuration
- install.sh - Installation script for dependencies
- protogen.sh - Protocol buffer generation script
- requirements.txt - Python dependencies
- run.sh - Runtime script
- test.py / test.sh - Test files
2. Add Build Configurations to .github/workflows/backend.yml
Add build matrix entries for each platform/GPU type you want to support. Look at similar backends (e.g., chatterbox, faster-whisper) for reference.
Placement in file:
- CPU builds: Add after other CPU builds (e.g., after cpu-chatterbox)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after gpu-nvidia-cuda-12-chatterbox)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after gpu-nvidia-cuda-13-chatterbox)
Additional build types you may need:
- ROCm/HIP: Use build-type: 'hipblas' with base-image: "rocm/dev-ubuntu-24.04:7.2.1"
- Intel/SYCL: Use build-type: 'intel' or build-type: 'sycl_f16'/sycl_f32 with base-image: "intel/oneapi-basekit:2025.3.0-0-devel-ubuntu24.04"
- L4T (ARM): Use build-type: 'l4t' with platforms: 'linux/arm64' and runs-on: 'ubuntu-24.04-arm'
3. Add Backend Metadata to backend/index.yaml
Step 3a: Add Meta Definition
Add a YAML anchor definition in the ## metas section (around lines 2-300). Use a similar backend, such as diffusers or chatterbox, as a template.
Step 3b: Add Image Entries
Add image entries at the end of the file, following the pattern of similar backends such as diffusers or chatterbox. Include both latest (production) and master (development) tags.
4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:
Step 4a: Add to .NOTPARALLEL
Add backends/<backend-name> to the .NOTPARALLEL line (around line 2) to prevent parallel execution conflicts:
.NOTPARALLEL: ... backends/<backend-name>
Step 4b: Add to prepare-test-extra
Add the backend to the prepare-test-extra target (around line 312) to prepare it for testing:
prepare-test-extra: protogen-python
	...
	$(MAKE) -C backend/python/<backend-name>
Step 4c: Add to test-extra
Add the backend to the test-extra target (around line 319) to run its tests:
test-extra: prepare-test-extra
	...
	$(MAKE) -C backend/python/<backend-name> test
Step 4d: Add Backend Definition
Add a backend definition variable in the backend definitions section (around line 428-457). The format depends on the backend type:
For Python backends with root context (like faster-whisper, coqui):
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
For Python backends with ./backend context (like chatterbox, moonshine):
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
For Go backends:
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
Step 4e: Generate Docker Build Target
Add an eval call to generate the docker-build target (around line 480-501):
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
Step 4f: Add to docker-build-backends
Add docker-build-<backend-name> to the docker-build-backends target (around line 507):
docker-build-backends: ... docker-build-<backend-name>
Determining the Context:
- If the backend is in backend/python/<backend-name>/ and uses ./backend as context in the workflow file, use ./backend context
- If the backend is in backend/python/<backend-name>/ but uses . as context in the workflow file, use . context
- Check similar backends to determine the correct context
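The backend definition values from Step 4d are pipe-separated, with the build context in the third field. As a rough sketch in Python (not part of the LocalAI build), the helper below splits one definition so the context can be compared against the workflow file. The names BackendDef and parse_backend_def are hypothetical, the field names are inferred from the examples in this guide, and the trailing fields are kept as unnamed flags because their meaning is not described here.

```python
from typing import NamedTuple, Tuple


class BackendDef(NamedTuple):
    name: str
    language: str            # "python" or "golang" in the examples above
    context: str             # Docker build context: "." or "./backend"
    flags: Tuple[str, ...]   # trailing fields; meaning not documented in this guide


def parse_backend_def(value: str) -> BackendDef:
    """Split a 'name|language|context|...' definition into its parts."""
    parts = [p.strip() for p in value.split("|")]
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 '|'-separated fields, got {value!r}")
    return BackendDef(parts[0], parts[1], parts[2], tuple(parts[3:]))


moonshine = parse_backend_def("moonshine|python|./backend|false|true")
print(moonshine.context)  # ./backend
```

If the printed context differs from the context used for the same backend in .github/workflows/backend.yml, one of the two entries needs to change.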
5. Verification Checklist
After adding a new backend, verify:
- Backend directory structure is complete with all necessary files
- Build configurations added to .github/workflows/backend.yml for all desired platforms
- Meta definition added to backend/index.yaml in the ## metas section
- Image entries added to backend/index.yaml for all build variants (latest + development)
- Tag suffixes match between workflow file and index.yaml
- Makefile updated with all 6 required changes (.NOTPARALLEL, prepare-test-extra, test-extra, backend definition, docker-build target eval, docker-build-backends)
- No YAML syntax errors (check with linter)
- No Makefile syntax errors (check with linter)
- Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow faster-whisper pattern)
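The six Makefile changes in the checklist can be spot-checked mechanically. The following is a minimal sketch, not an existing LocalAI tool: missing_makefile_changes is a hypothetical helper that does a text-level search only. It assumes a Python backend, assumes the .NOTPARALLEL and docker-build-backends entries each sit on a single line, and assumes the variable name is the backend name upper-cased with hyphens turned into underscores.

```python
import re


def missing_makefile_changes(makefile_text, backend):
    """Return the checklist labels whose expected line was not found."""
    name = re.escape(backend)
    # Assumption: BACKEND_<BACKEND_NAME> is the upper-cased name with '-' -> '_'.
    var = re.escape("BACKEND_" + backend.upper().replace("-", "_"))
    checks = {
        ".NOTPARALLEL": rf"^\.NOTPARALLEL:.*(?<!\S)backends/{name}(?!\S)",
        "prepare-test-extra": rf"^\s*\$\(MAKE\) -C backend/python/{name}\s*$",
        "test-extra": rf"^\s*\$\(MAKE\) -C backend/python/{name} test\s*$",
        "backend definition": rf"^{var}\s*=\s*{name}\|",
        "docker-build target eval":
            rf"\$\(eval \$\(call generate-docker-build-target,\$\({var}\)\)\)",
        "docker-build-backends":
            rf"^docker-build-backends:.*(?<!\S)docker-build-{name}(?!\S)",
    }
    return [label for label, pattern in checks.items()
            if not re.search(pattern, makefile_text, re.MULTILINE)]


# A Makefile that only defines the backend is missing the other five changes.
partial = "BACKEND_MOONSHINE = moonshine|python|./backend|false|true\n"
print(missing_makefile_changes(partial, "moonshine"))
```

The check does not confirm that a recipe line sits under the right target, so an empty result is a hint rather than proof that the Makefile is correct.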
Bundling runtime shared libraries (package.sh)
The final Dockerfile.python stage is FROM scratch — there is no system libc, no apt, no fallback library path. Only files explicitly copied from the builder stage end up in the backend image. That means any runtime dlopen your backend (or its Python deps) needs must be packaged into ${BACKEND}/lib/.
Pattern:
- Make sure the library is installed in the builder stage of backend/Dockerfile.python (add it to the top-level apt-get install).
- Drop a package.sh in your backend directory that copies the library, and its soname symlinks, into $(dirname $0)/lib. See backend/python/vllm/package.sh for a reference implementation that walks /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu, etc.
- Dockerfile.python already runs package.sh automatically if it exists, after package-gpu-libs.sh.
- libbackend.sh automatically prepends ${EDIR}/lib to LD_LIBRARY_PATH at run time, so anything packaged this way is found by dlopen.
How to find missing libs: when a Python module silently fails to register torch ops or you see AttributeError: '_OpNamespace' '...' object has no attribute '...', run the backend image's Python with LD_DEBUG=libs to see which dlopen failed. The filename in the error message (e.g. libnuma.so.1) is what you need to package.
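Once a candidate library name is known, a quick probe from Python can confirm whether the loader can open it in the backend's environment. This is a sketch under assumptions: unloadable_libs is a hypothetical helper, and the two sonames in the usage lines are the ones named in the commit message for vllm, not a general list. It relies only on ctypes.CDLL, which raises OSError when the library cannot be loaded.

```python
import ctypes


def unloadable_libs(sonames):
    """Return (soname, reason) pairs for libraries the loader cannot open."""
    failed = []
    for soname in sonames:
        try:
            ctypes.CDLL(soname)  # goes through the platform loader (dlopen on Linux)
        except OSError as exc:
            failed.append((soname, str(exc)))
    return failed


# Output depends on the environment the probe runs in.
for soname, reason in unloadable_libs(["libnuma.so.1", "libgomp.so.1"]):
    print(f"cannot load {soname}: {reason}")
```

Run it with the same LD_LIBRARY_PATH the backend gets at run time. A probe run on a developer machine that has the library installed system-wide says nothing about the scratch image.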
To verify packaging works without trusting the host:
make docker-build-<backend>
CID=$(docker create --entrypoint=/run.sh local-ai-backend:<backend>)
docker cp $CID:/lib /tmp/check && docker rm $CID
ls /tmp/check # expect the bundled .so files + symlinks
Then boot it inside a fresh ubuntu:24.04 (which intentionally does not have the lib installed) to confirm it actually loads from the backend dir.
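The ls /tmp/check step can also be made mechanical. The sketch below is a hypothetical helper, not something shipped with LocalAI: it reports expected sonames that are absent from the extracted directory, plus symlinks that dangle or resolve outside it (either of which would only work by accident on a host that has the library installed). The demo builds a throwaway directory so the block runs anywhere.

```python
import os
import tempfile


def check_bundled_libs(lib_dir, expected_sonames):
    """List missing sonames and symlinks that do not resolve inside lib_dir."""
    problems = []
    root = os.path.realpath(lib_dir)
    for soname in expected_sonames:
        if not os.path.lexists(os.path.join(lib_dir, soname)):
            problems.append(f"missing: {soname}")
    for entry in sorted(os.listdir(lib_dir)):
        path = os.path.join(lib_dir, entry)
        if not os.path.islink(path):
            continue
        target = os.path.realpath(path)
        if not os.path.exists(target):
            problems.append(f"dangling symlink: {entry}")
        elif os.path.commonpath([root, target]) != root:
            problems.append(f"symlink leaves the bundle: {entry}")
    return problems


# Demo on a fake bundle: one real file, one soname symlink, one library absent.
with tempfile.TemporaryDirectory() as demo:
    open(os.path.join(demo, "libnuma.so.1.0.0"), "wb").close()
    os.symlink("libnuma.so.1.0.0", os.path.join(demo, "libnuma.so.1"))
    print(check_bundled_libs(demo, ["libnuma.so.1", "libgomp.so.1"]))
    # ['missing: libgomp.so.1']
```

For a real image, point it at the extracted directory, for example check_bundled_libs("/tmp/check", [...]) with the sonames your backend needs. Booting in a fresh ubuntu:24.04 container is still the final test.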
6. Example: Adding a Python Backend
For reference, when moonshine was added:
- Files created: backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}
- Workflow entries: 3 build configurations (CPU, CUDA 12, CUDA 13)
- Index entries: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 x latest/development)
- Makefile updates:
  - Added to .NOTPARALLEL line
  - Added to prepare-test-extra and test-extra targets
  - Added BACKEND_MOONSHINE = moonshine|python|./backend|false|true
  - Added eval for docker-build target generation
  - Added docker-build-moonshine to docker-build-backends
- Added to