# Adding a New Backend

When adding a new backend to LocalAI, you need to update several files to ensure the backend is properly built, tested, and registered. Here's a step-by-step guide based on the pattern used for adding backends like `moonshine`:

## 1. Create Backend Directory Structure

Create the backend directory under the appropriate location:
- **Python backends**: `backend/python/<backend-name>/`
- **Go backends**: `backend/go/<backend-name>/`
- **C++ backends**: `backend/cpp/<backend-name>/`
- **Rust backends**: `backend/rust/<backend-name>/`

For Python backends, you'll typically need:
- `backend.py` - Main gRPC server implementation
- `Makefile` - Build configuration
- `install.sh` - Installation script for dependencies
- `protogen.sh` - Protocol buffer generation script
- `requirements.txt` - Python dependencies
- `run.sh` - Runtime script
- `test.py` / `test.sh` - Test files

For Rust backends, you'll typically need (see `backend/rust/kokoros/` as a reference):
- `Cargo.toml` - Crate manifest; depend on the upstream project as a submodule under `sources/`
- `build.rs` - Invokes `tonic_build` to generate gRPC stubs from `backend/backend.proto` (use the `BACKEND_PROTO_PATH` env var so the Makefile can inject the canonical copy)
- `src/` - The gRPC server implementation (implement `Backend` via `tonic`)
- `Makefile` - Copies `backend.proto` into the crate, runs `cargo build --release`, then `package.sh`
- `package.sh` - Uses `ldd` to bundle the binary's dynamic deps and `ld.so` into `package/lib/`
- `run.sh` - Sets `LD_LIBRARY_PATH`/`SSL_CERT_DIR` and execs the binary via the bundled `lib/ld.so`
- `sources/<UpstreamProject>/` - Git submodule with the upstream Rust crate

## 2. Add Build Configurations to `.github/backend-matrix.yml`

The build matrix is data-only YAML at `.github/backend-matrix.yml` (not inside `backend.yml` itself). `backend.yml` (master push) and `backend_pr.yml` (PR) load it via `scripts/changed-backends.js`, which also handles per-file path filtering so only touched backends rebuild on PRs and master pushes alike. Add build matrix entries to `.github/backend-matrix.yml` for each platform/GPU type you want to support. Look at similar backends for reference — `chatterbox`/`faster-whisper` for Python, `piper`/`silero-vad` for Go, `kokoros` for Rust.

**Without an entry here no image is ever built or pushed, and the gallery entry in `backend/index.yaml` will point at a tag that does not exist.** The `dockerfile:` field must point at `./backend/Dockerfile.<lang>` matching the language bucket from step 1 (e.g. `Dockerfile.python`, `Dockerfile.golang`, `Dockerfile.rust`). The `tag-suffix` must match the `uri:` in the corresponding `backend/index.yaml` image entry exactly.
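
As a rough illustration of the `tag-suffix`/`uri:` rule, the check below compares matrix suffixes against gallery URIs. It is a sketch under assumptions: the entries are inlined instead of parsed from YAML, the backend name `mybackend` and the registry `example.invalid/backends` are made up, and the idea that an image tag is `latest` or `master` followed by the suffix is an assumption about the URI layout, not something this guide spells out.

```javascript
// Hypothetical, inlined stand-ins for .github/backend-matrix.yml entries.
const matrixEntries = [
  { backend: "mybackend", "tag-suffix": "-cpu-mybackend" },
  { backend: "mybackend", "tag-suffix": "-gpu-nvidia-cuda-12-mybackend" },
];

// Hypothetical gallery URIs; registry name and tag layout are assumptions.
const galleryUris = [
  "example.invalid/backends:latest-cpu-mybackend",
  "example.invalid/backends:master-cpu-mybackend",
  "example.invalid/backends:latest-gpu-nvidia-cuda12-mybackend", // typo: cuda12
];

// Return the URIs whose tag is not `latest<suffix>` or `master<suffix>`
// for any tag-suffix present in the matrix.
function unmatchedUris(entries, uris) {
  const suffixes = entries.map((e) => e["tag-suffix"]);
  return uris.filter((uri) => {
    const tag = uri.slice(uri.lastIndexOf(":") + 1);
    return !suffixes.some((s) => tag === `latest${s}` || tag === `master${s}`);
  });
}

console.log(unmatchedUris(matrixEntries, galleryUris));
```

Only the misspelled `cuda12` URI is reported, which is the kind of near-miss that otherwise surfaces as a gallery entry pointing at a tag that was never pushed.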

**`scripts/changed-backends.js` registration — REQUIRED for any new dockerfile suffix.** This is the single most common omission, because it has no effect on the PR that adds the backend (when no prior path filter could catch it anyway) — it only breaks the *next* PR that touches your backend's directory, which then gets zero CI jobs and looks broken for unrelated reasons. Edit `scripts/changed-backends.js:inferBackendPath` and add a branch BEFORE the more-generic suffixes:

```js
if (item.dockerfile.endsWith("<your-dockerfile-suffix>")) {
  return `backend/cpp/<your-backend>/`; // or backend/python|go|rust/...
}
```

The `endsWith()` test is against the matrix entry's `dockerfile:` value (e.g. `./backend/Dockerfile.ds4` → `endsWith("ds4")`). Specificity order matters here just like it does for importers: more-specific suffixes go BEFORE more-generic ones (e.g. `ds4` before `llama-cpp`; the two do not overlap today, but some upstream might one day call itself `super-ds4-llama-cpp`). Verify locally before pushing:

```bash
# Confirm your dockerfile suffix is unique enough
node -e "
const yaml = require('js-yaml'); const fs = require('fs');
const m = yaml.load(fs.readFileSync('.github/backend-matrix.yml','utf8'));
for (const e of m.include.filter(e => e.backend === '<your-backend>')) {
  console.log(e.dockerfile, '->', e.dockerfile.endsWith('<suffix>'));
}"
```

A quick way to find the right insertion point: `grep -n 'item.dockerfile.endsWith' scripts/changed-backends.js`.
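
To make the ordering rule concrete, here is a minimal sketch of first-match suffix lookup. It is not the real `inferBackendPath`: the helper `inferPath`, the branch table, and the `ik-llama-cpp` path are simplified assumptions for illustration, and `Dockerfile.unregistered` is a made-up name.

```javascript
// Simplified stand-in for inferBackendPath: walk an ordered list of
// [suffix, path] branches and return the first path whose suffix matches.
function inferPath(dockerfile, branches) {
  for (const [suffix, path] of branches) {
    if (dockerfile.endsWith(suffix)) return path;
  }
  return null; // no branch matched, so the backend never enters the path map
}

// Illustrative branch table; the ik-llama-cpp path is assumed, not quoted from the script.
const specificFirst = [
  ["ik-llama-cpp", "backend/cpp/ik-llama-cpp/"],
  ["ds4", "backend/cpp/ds4/"],
  ["llama-cpp", "backend/cpp/llama-cpp/"],
];
// Same branches with the generic suffix moved to the front.
const genericFirst = [specificFirst[2], specificFirst[0], specificFirst[1]];

const ik = "./backend/Dockerfile.ik-llama-cpp";
console.log(inferPath(ik, specificFirst)); // backend/cpp/ik-llama-cpp/
console.log(inferPath(ik, genericFirst)); // backend/cpp/llama-cpp/ (wrong backend)
console.log(inferPath("./backend/Dockerfile.unregistered", specificFirst)); // null
```

With the generic branch first, `Dockerfile.ik-llama-cpp` is attributed to `llama-cpp` because its name also ends with `llama-cpp`; a suffix with no branch at all yields `null`, which is the silent "zero CI jobs" case described above.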

**`bump_deps.yaml` registration — REQUIRED for any backend pinning an upstream commit.** If your backend's Makefile has a `*_VERSION?=<sha>` pin to a third-party repo, the daily auto-bump bot at `.github/workflows/bump_deps.yaml` won't notice it unless you register the backend in its matrix. The bot runs `.github/bump_deps.sh` which `grep`s for `^$VAR?=` in the Makefile you list — so the pin MUST live in the Makefile (not in a separate shell script). The ds4 follow-up (#9761) had to walk this back because the original PR (#9758) landed the pin in `prepare.sh`, which the bot can't see. Pattern (for `antirez/ds4`):

```yaml
# .github/workflows/bump_deps.yaml
matrix:
  include:
    - repository: "antirez/ds4"
      variable: "DS4_VERSION"
      branch: "main"
      file: "backend/cpp/ds4/Makefile"
```

And the corresponding Makefile shape (mirror `backend/cpp/llama-cpp/Makefile`):

```makefile
DS4_VERSION?=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0
DS4_REPO?=https://github.com/antirez/ds4
...
ds4:
	mkdir -p ds4
	cd ds4 && git init -q && \
	git remote add origin $(DS4_REPO) && \
	git fetch --depth 1 origin $(DS4_VERSION) && \
	git checkout FETCH_HEAD
```

If you have a `prepare.sh` doing the clone, delete it — the recipe belongs in the Makefile target so `make purge && make` works as a clean-and-rebuild and so the bump bot finds the pin.
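
The Makefile-pin contract can be sketched as a line-prefix lookup. This is an approximation of the bot's `^$VAR?=` grep and not the contents of `.github/bump_deps.sh`; the helper name `findPin` is hypothetical, and the `prepare.sh` line is an assumed shape, since the deleted script is not reproduced in this guide.

```javascript
// Rough equivalent of `grep '^VAR?='`: find a line that starts with `VAR?=`
// and return whatever follows it, or null when no such line exists.
function findPin(fileText, variable) {
  const prefix = `${variable}?=`;
  const line = fileText.split("\n").find((l) => l.startsWith(prefix));
  return line === undefined ? null : line.slice(prefix.length).trim();
}

// Pin declared the way the bot expects, at the start of a Makefile line.
const makefileText = [
  "DS4_VERSION?=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0",
  "DS4_REPO?=https://github.com/antirez/ds4",
].join("\n");

// Assumed shape of a shell-variable pin; single quotes keep `${` literal.
const prepareShText = 'DS4_VERSION="${DS4_VERSION:-ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0}"';

console.log(findPin(makefileText, "DS4_VERSION")); // the pinned sha
console.log(findPin(prepareShText, "DS4_VERSION")); // null
```

The shell-style assignment carries the same sha but never matches the `VAR?=` prefix, so a bot that only looks for that prefix has nothing to bump.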

**Placement in file:**
- CPU builds: Add after other CPU builds (e.g., after `cpu-chatterbox`)
- CUDA 12 builds: Add after other CUDA 12 builds (e.g., after `gpu-nvidia-cuda-12-chatterbox`)
- CUDA 13 builds: Add after other CUDA 13 builds (e.g., after `gpu-nvidia-cuda-13-chatterbox`)

**Additional build types you may need:**
- ROCm/HIP: Use `build-type: 'hipblas'` with `base-image: "rocm/dev-ubuntu-24.04:7.2.1"`
- Intel/SYCL: Use `build-type: 'intel'` or `build-type: 'sycl_f16'`/`sycl_f32` with `base-image: "intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04"`
- L4T (ARM): Use `build-type: 'l4t'` with `platforms: 'linux/arm64'` and `runs-on: 'ubuntu-24.04-arm'`

**Per-arch native builds (`linux/amd64` + `linux/arm64`):**

Multi-arch backends are NOT a single matrix entry with `platforms: 'linux/amd64,linux/arm64'`. Instead, add **two** entries — one with `platforms: 'linux/amd64'` + `platform-tag: 'amd64'` + `runs-on: 'ubuntu-latest'`, one with `platforms: 'linux/arm64'` + `platform-tag: 'arm64'` + `runs-on: 'ubuntu-24.04-arm'` — both sharing the same `tag-suffix`. The script detects the shared `tag-suffix` and emits a `merge-matrix` entry, so `backend-merge-jobs` (in `backend.yml`/`backend_pr.yml`) automatically assembles the manifest list from per-arch digest artifacts. See `-cpu-faster-whisper` in `.github/backend-matrix.yml` for a reference shape.
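
The shared-`tag-suffix` detection can be pictured as a group-by. The sketch below is not the logic in `scripts/changed-backends.js`: the function `mergeMatrix`, the output field `platform-tags`, and the `-cpu-single-arch-example` entry are hypothetical, chosen only to show why two per-arch entries produce one merge entry while a single-arch entry produces none.

```javascript
// Group matrix entries by tag-suffix and keep only suffixes that appear
// more than once, i.e. the per-arch pairs that need a manifest-list merge.
function mergeMatrix(entries) {
  const bySuffix = new Map();
  for (const e of entries) {
    const tags = bySuffix.get(e["tag-suffix"]) ?? [];
    tags.push(e["platform-tag"]);
    bySuffix.set(e["tag-suffix"], tags);
  }
  return [...bySuffix]
    .filter(([, tags]) => tags.length > 1)
    .map(([suffix, tags]) => ({ "tag-suffix": suffix, "platform-tags": tags }));
}

// Two per-arch entries sharing a suffix, plus a made-up single-arch entry.
const perArchEntries = [
  { "tag-suffix": "-cpu-faster-whisper", platforms: "linux/amd64", "platform-tag": "amd64" },
  { "tag-suffix": "-cpu-faster-whisper", platforms: "linux/arm64", "platform-tag": "arm64" },
  { "tag-suffix": "-cpu-single-arch-example", platforms: "linux/amd64" },
];

console.log(JSON.stringify(mergeMatrix(perArchEntries)));
```

If the two per-arch entries used different suffixes, neither group would reach two members and no merge entry would be emitted, which is why both must share the same `tag-suffix`.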

**llama-cpp / ik-llama-cpp / turboquant variants only — `builder-base-image`:**

Entries whose `dockerfile` is `./backend/Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` must also set a `builder-base-image` field pointing at a prebuilt base from `quay.io/go-skynet/ci-cache:base-grpc-*` (CI builds these via `.github/workflows/base-images.yml`). The mapping is by `(build-type, platforms)` — see existing entries for the pattern. CI uses these prebuilt bases to skip the gRPC compile (~25–35 min cold). Local `make backends/<name>` ignores `builder-base-image` and uses the from-source path inside the Dockerfile, so you don't need quay access for local builds.

## 3. Add Backend Metadata to `backend/index.yaml`

**Step 3a: Add Meta Definition**

Add a YAML anchor definition in the `## metas` section (around lines 2-300). Use a similar backend, such as `diffusers` or `chatterbox`, as a template.

**Step 3b: Add Image Entries**

Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.

## 4. Update the Makefile

The Makefile needs to be updated in several places to support building and testing the new backend:

**Step 4a: Add to `.NOTPARALLEL`**

Add `backends/<backend-name>` to the `.NOTPARALLEL` line (around line 2) to prevent parallel execution conflicts:

```makefile
.NOTPARALLEL: ... backends/<backend-name>
```

**Step 4b: Add to `prepare-test-extra`**

Add the backend to the `prepare-test-extra` target to prepare it for testing. Use the path matching your language bucket (`backend/python/`, `backend/go/`, `backend/rust/`, …):

```makefile
prepare-test-extra: protogen-python
	...
	$(MAKE) -C backend/<lang>/<backend-name>
```

For Rust backends the target is usually the crate build target itself (e.g. `$(MAKE) -C backend/rust/<backend-name> <backend-name>-grpc`) so the binary is in place before `test` runs.

**Step 4c: Add to `test-extra`**

Add the backend to the `test-extra` target to run its tests — applies to Go and Rust backends too, not only Python:

```makefile
test-extra: prepare-test-extra
	...
	$(MAKE) -C backend/<lang>/<backend-name> test
```

Each backend's own `Makefile` should define a `test` target so this line works regardless of language. Integration tests that need large model downloads should be gated behind an env var (see `backend/rust/kokoros/`'s `KOKOROS_MODEL_PATH` pattern) so CI only runs unit tests.

**Step 4d: Add Backend Definition**

Add a backend definition variable in the backend definitions section (around lines 428-457). The format depends on the backend type:

**For Python backends with root context** (like `faster-whisper`, `coqui`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|.|false|true
```

**For Python backends with `./backend` context** (like `chatterbox`, `moonshine`):
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|python|./backend|false|true
```

**For Go backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|golang|.|false|true
```

**For Rust backends**:
```makefile
BACKEND_<BACKEND_NAME> = <backend-name>|rust|.|false|true
```

The language field (`python`/`golang`/`rust`/…) must match a `backend/Dockerfile.<lang>` file.
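
A small validator makes the pipe-separated shape easier to check before running `make`. This is a sketch and not part of the repository: `parseBackendDefinition` is a hypothetical helper, the list of languages is passed in by the caller, and the last two fields are left unnamed because this guide does not say what they mean.

```javascript
// Split a BACKEND_<NAME> value into its five pipe-separated fields and
// check that the language has a matching backend/Dockerfile.<lang>.
function parseBackendDefinition(value, knownLanguages) {
  const fields = value.split("|");
  if (fields.length !== 5) {
    throw new Error(`expected 5 fields, got ${fields.length}: ${value}`);
  }
  // field4 and field5 are placeholders; their meaning is not documented here.
  const [name, language, context, field4, field5] = fields;
  if (!knownLanguages.includes(language)) {
    throw new Error(`no backend/Dockerfile.${language} for ${name}`);
  }
  return { name, language, context, field4, field5 };
}

// Languages named in this guide; extend the list to match the Dockerfiles you have.
const languages = ["python", "golang", "rust"];

console.log(parseBackendDefinition("moonshine|python|./backend|false|true", languages));

try {
  // `go` is not a Dockerfile suffix named in this guide; `golang` is.
  parseBackendDefinition("mybackend|go|.|false|true", languages);
} catch (err) {
  console.log(err.message);
}
```

The second call fails with `no backend/Dockerfile.go for mybackend`, the same mismatch the rule about the language field is meant to prevent.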

**Step 4e: Generate Docker Build Target**

Add an eval call to generate the docker-build target (around lines 480-501):

```makefile
$(eval $(call generate-docker-build-target,$(BACKEND_<BACKEND_NAME>)))
```

**Step 4f: Add to `docker-build-backends`**

Add `docker-build-<backend-name>` to the `docker-build-backends` target (around line 507):

```makefile
docker-build-backends: ... docker-build-<backend-name>
```

**Determining the Context:**

- If the backend is in `backend/python/<backend-name>/` and its `.github/backend-matrix.yml` entries use `./backend` as context, use `./backend` context
- If the backend is in `backend/python/<backend-name>/` but its `.github/backend-matrix.yml` entries use `.` as context, use `.` context
- Check similar backends to determine the correct context

## 5. Verification Checklist

After adding a new backend, verify:

- [ ] Backend directory structure is complete with all necessary files
- [ ] Build configurations added to `.github/backend-matrix.yml` for all desired platforms (per-arch entries with `platform-tag` for multi-arch; `builder-base-image` for llama-cpp / ik-llama-cpp / turboquant)
- [ ] Meta definition added to `backend/index.yaml` in the `## metas` section
- [ ] Image entries added to `backend/index.yaml` for all build variants (latest + development)
- [ ] Tag suffixes match between `.github/backend-matrix.yml` and `backend/index.yaml`
- [ ] Makefile updated with all 6 required changes (`.NOTPARALLEL`, `prepare-test-extra`, `test-extra`, backend definition, docker-build target eval, `docker-build-backends`)
- [ ] No YAML syntax errors (check with linter)
- [ ] No Makefile syntax errors (check with linter)
- [ ] Follows the same pattern as similar backends (e.g., if it's a transcription backend, follow `faster-whisper` pattern)

## Bundling runtime shared libraries (`package.sh`)

The final `Dockerfile.python` stage is `FROM scratch` — there is no system `libc`, no `apt`, no fallback library path. Only files explicitly copied from the builder stage end up in the backend image. That means any shared library your backend (or its Python deps) loads via `dlopen` at run time **must** be packaged into `${BACKEND}/lib/`.

Pattern:

1. Make sure the library is installed in the builder stage of `backend/Dockerfile.python` (add it to the top-level `apt-get install`).
2. Drop a `package.sh` in your backend directory that copies the library — and its soname symlinks — into `$(dirname $0)/lib`. See `backend/python/vllm/package.sh` for a reference implementation that walks `/usr/lib/x86_64-linux-gnu`, `/usr/lib/aarch64-linux-gnu`, etc.
3. `Dockerfile.python` already runs `package.sh` automatically if it exists, after `package-gpu-libs.sh`.
4. `libbackend.sh` automatically prepends `${EDIR}/lib` to `LD_LIBRARY_PATH` at run time, so anything packaged this way is found by `dlopen`.

How to find missing libs: when a Python module silently fails to register torch ops or you see `AttributeError: '_OpNamespace' '...' object has no attribute '...'`, run the backend image's Python with `LD_DEBUG=libs` to see which `dlopen` failed. The filename in the error message (e.g. `libnuma.so.1`) is what you need to package.

To verify packaging works without trusting the host:

```bash
make docker-build-<backend>
CID=$(docker create --entrypoint=/run.sh local-ai-backend:<backend>)
docker cp "$CID":/lib /tmp/check && docker rm "$CID"
ls /tmp/check # expect the bundled .so files + symlinks
```

Then boot it inside a fresh `ubuntu:24.04` (which intentionally does *not* have the lib installed) to confirm it actually loads from the backend dir.

## Importer integration

When you add a new backend, you MUST also make it importable via the model import form (`/import-model`). The import form dropdown is sourced dynamically from `GET /backends/known` — it reads the importer registry at `core/gallery/importers/importers.go`, so the steps below are the ONLY way to make your backend show up.

Required steps:

1. **If your backend has unambiguous detection signals** (unique file extension, HF `pipeline_tag`, unique repo name pattern, unique artefact like `modules.json`):
   - Create an importer file at `core/gallery/importers/<backend>.go` following the Match/Import pattern in `llama-cpp.go`.
   - Register it in `importers.go:defaultImporters` in **specificity order** — more specific detectors must appear BEFORE more generic ones (e.g. `sentencetransformers` before `transformers`, `stablediffusion-ggml` before `llama-cpp`, `vllm-omni` before `vllm`). First match wins.
2. **If your backend is a drop-in replacement** (same artefacts as another backend, e.g. `ik-llama-cpp` and `turboquant` both consume GGUF the same way `llama-cpp` does):
   - Do NOT create a new importer. Extend the existing importer's `Import()` to swap the emitted `backend:` field when `preferences.backend` matches. See `llama-cpp.go` for the pattern.
3. **If your backend has no reliable auto-detect signal** (preference-only — e.g. `sglang`, `tinygrad`, `whisperx`):
   - Do NOT create an importer. Instead add the backend name to the curated pref-only slice in `core/http/endpoints/localai/backend.go` that feeds `/backends/known`. A single line addition.
4. **Always** add a table-driven test in `core/gallery/importers/importers_test.go` (Ginkgo/Gomega):
   - Use a real public HuggingFace repo URI as the test fixture (existing tests already hit the live HF API — follow that pattern).
   - Cover detection (auto-match without preferences), preference-override (explicit `backend:` in preferences wins), and — if the backend's modality has a common `pipeline_tag` but ambiguous artefacts — an ambiguity test asserting `errors.Is(err, importers.ErrAmbiguousImport)`.
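
The "first match wins" rule from step 1 can be sketched in a few lines. The real importers are Go files under `core/gallery/importers/`; the JavaScript below is only a language-neutral illustration, and the detector objects, the `detectBackend` helper, and the choice of `config.json` as the generic signal are all hypothetical.

```javascript
// Hypothetical detectors; each reports whether a repo's file list matches.
const sentenceTransformers = {
  backend: "sentencetransformers",
  match: (files) => files.includes("modules.json"), // specific artefact
};
const transformers = {
  backend: "transformers",
  match: (files) => files.includes("config.json"), // assumed generic signal
};

// First match wins, so the registry order decides the result.
function detectBackend(importers, files) {
  const hit = importers.find((imp) => imp.match(files));
  return hit ? hit.backend : null;
}

// A repo that carries both the specific and the generic signal.
const repoFiles = ["config.json", "modules.json"];

console.log(detectBackend([sentenceTransformers, transformers], repoFiles)); // sentencetransformers
console.log(detectBackend([transformers, sentenceTransformers], repoFiles)); // transformers
```

In the second call the generic detector shadows the specific one even though both match, which is the registration-order bug the specificity rule is meant to prevent.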

Rules of thumb:

- When in doubt, lean pref-only. A wrong auto-detect is worse than a forced preference.
- Never silently emit a modality mismatch (e.g. emit `llama-cpp` for a TTS repo because `.gguf` is present). Return `ErrAmbiguousImport` instead.
- Registration order is the single most common source of bugs. Check by running `go test ./core/gallery/importers/...` — the existing suite will fail if you've shadowed a pre-existing detector.

## 6. Example: Adding a Python Backend

For reference, when `moonshine` was added:
- **Files created**: `backend/python/moonshine/{backend.py, Makefile, install.sh, protogen.sh, requirements.txt, run.sh, test.py, test.sh}`
- **Workflow entries**: 3 build configurations (CPU, CUDA 12, CUDA 13)
- **Index entries**: 1 meta definition + 6 image entries (cpu, cuda12, cuda13 x latest/development)
- **Makefile updates**:
  - Added to `.NOTPARALLEL` line
  - Added to `prepare-test-extra` and `test-extra` targets
  - Added `BACKEND_MOONSHINE = moonshine|python|./backend|false|true`
  - Added eval for docker-build target generation
  - Added `docker-build-moonshine` to `docker-build-backends`