mirror of
https://github.com/mudler/LocalAI
synced 2026-05-24 09:28:23 +00:00
Some checks failed
build backend container images / generate-matrix (push) Waiting to run
build backend container images / backend-jobs (push) Blocked by required conditions
build backend container images / backend-merge-jobs (push) Blocked by required conditions
build backend container images / backend-jobs-darwin (push) Blocked by required conditions
Build test / build-test (push) Waiting to run
Build test / launcher-build-darwin (push) Waiting to run
Build test / launcher-build-linux (push) Waiting to run
Explorer deployment / build-linux (push) Waiting to run
GPU tests / ubuntu-latest (1.21.x) (push) Waiting to run
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, linux/amd64, arc-runner-set) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-24.04:7.2.1, hipblas, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-hipblas, noble, 2404) (push) Waiting to run
build container images / core-image-build (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, intel, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-intel, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-13, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, cublas, 12, 8, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-12, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / core-image-merge (push) Blocked by required conditions
build container images / gpu-vulkan-image-merge (push) Blocked by required conditions
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, auto, -nvidia-l4t-arm64, jammy, 2204) (push) Waiting to run
build container images / gh-runner (ubuntu:24.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, false, auto, -nvidia-l4t-arm64-cuda-13, noble, 2404) (push) Waiting to run
lint / golangci-lint (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / detect-changes (push) Waiting to run
Tests extras backends / tests-transformers (push) Blocked by required conditions
Tests extras backends / tests-rerankers (push) Blocked by required conditions
Tests extras backends / tests-diffusers (push) Blocked by required conditions
Tests extras backends / tests-coqui (push) Blocked by required conditions
Tests extras backends / tests-moonshine (push) Blocked by required conditions
Tests extras backends / tests-pocket-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-asr (push) Blocked by required conditions
Tests extras backends / tests-nemo (push) Blocked by required conditions
Tests extras backends / tests-voxcpm (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-quantization (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-smoke (push) Waiting to run
Tests extras backends / tests-sherpa-onnx-realtime (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-ik-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-turboquant-grpc (push) Blocked by required conditions
Tests extras backends / tests-acestep-cpp (push) Blocked by required conditions
Tests extras backends / tests-qwen3-tts-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-localvqe-grpc-transform (push) Blocked by required conditions
Tests extras backends / tests-voxtral (push) Blocked by required conditions
Tests extras backends / tests-kokoros (push) Blocked by required conditions
Tests extras backends / tests-insightface-grpc (push) Blocked by required conditions
Tests extras backends / tests-speaker-recognition-grpc (push) Blocked by required conditions
tests / tests-linux (1.26.x) (push) Waiting to run
tests / tests-apple (1.26.x) (push) Waiting to run
tests-aio / tests-aio (push) Waiting to run
E2E Backend Tests / tests-e2e-backend (1.25.x) (push) Waiting to run
UI E2E Tests / tests-ui-e2e (1.26.x) (push) Waiting to run
build base-grpc images / build (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, sycl, , , ubuntu-latest, base-grpc-intel-amd64, 2404) (push) Has been cancelled
build base-grpc images / build (nvcr.io/nvidia/l4t-jetpack:r36.4.0, l4t, 12, 0, ubuntu-24.04-arm, true, base-grpc-l4t-cuda-12-arm64, 2204) (push) Has been cancelled
build base-grpc images / build (rocm/dev-ubuntu-24.04:7.2.1, hipblas, , , ubuntu-latest, base-grpc-rocm-amd64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:22.04, cublas, 13, 0, ubuntu-latest, base-grpc-cuda-13-amd64, 2204) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, , , , ubuntu-24.04-arm, base-grpc-arm64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, , , , ubuntu-latest, base-grpc-amd64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, cublas, 12, 8, ubuntu-latest, base-grpc-cuda-12-amd64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, cublas, 13, 0, ubuntu-24.04-arm, base-grpc-cuda-13-arm64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, vulkan, , , ubuntu-24.04-arm, base-grpc-vulkan-arm64, 2404) (push) Has been cancelled
build base-grpc images / build (ubuntu:24.04, vulkan, , , ubuntu-latest, base-grpc-vulkan-amd64, 2404) (push) Has been cancelled
Now that base-images.yml's master-push trigger includes the install
script and apt-mirror script (commit 7fff8584), enumerate them in
the workflow-surfaces table instead of the vaguer "base Dockerfile/
workflow" phrasing.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
250 lines
21 KiB
Markdown
250 lines
21 KiB
Markdown
# CI Build Caching
|
||
|
||
Container builds — both the root LocalAI image (`Dockerfile`) and the per-backend images (`backend/Dockerfile.*`) — share a registry-backed BuildKit cache plus a layered set of prebuilt base images. This file explains how the cache is laid out, what invalidates it, and how to bypass it.
|
||
|
||
## Workflow surfaces
|
||
|
||
| Workflow | Purpose | Triggers |
|
||
|---|---|---|
|
||
| `.github/workflows/backend.yml` | Backend container images on master | `push` to master + tags, weekly Sunday cron, `workflow_dispatch` |
|
||
| `.github/workflows/backend_pr.yml` | Backend container images on PRs | `pull_request` |
|
||
| `.github/workflows/backend_build.yml` | Reusable: builds one backend (one arch) by digest | `workflow_call` from above |
|
||
| `.github/workflows/backend_merge.yml` | Reusable: assembles per-arch digests into a multi-arch manifest list | `workflow_call` |
|
||
| `.github/workflows/backend_build_darwin.yml` | Reusable: macOS-native backend builds | `workflow_call` |
|
||
| `.github/workflows/image.yml` / `image-pr.yml` | Root LocalAI image (push / PR) | push / PR |
|
||
| `.github/workflows/image_build.yml` / `image_merge.yml` | Reusable: per-arch root-image build + merge | `workflow_call` |
|
||
| `.github/workflows/base-images.yml` | Builds the prebuilt `base-grpc-*` builder bases | Saturdays 05:00 UTC cron, `workflow_dispatch`, master push touching `Dockerfile.base-grpc-builder`, `.docker/install-base-deps.sh`, `.docker/apt-mirror.sh`, or this workflow |
|
||
|
||
The matrix that drives `backend.yml` / `backend_pr.yml` lives in **`.github/backend-matrix.yml`** (data-only YAML, not embedded in the workflow). `scripts/changed-backends.js` parses it, applies path-filter logic against the PR diff (PR events) or the GitHub Compare API (push events), and emits the filtered matrix plus a `merge-matrix` for backends with multiple per-arch entries.
|
||
|
||
## Cache layout
|
||
|
||
- **Cache registry**: `quay.io/go-skynet/ci-cache`
|
||
- **One tag per matrix entry per arch**, derived from `tag-suffix` and `platform-tag`:
|
||
- Backend builds (`backend_build.yml`): `cache<tag-suffix>-<platform-tag>`
|
||
- e.g. `cache-cpu-faster-whisper-amd64`, `cache-cpu-faster-whisper-arm64`, `cache-gpu-nvidia-cuda-13-llama-cpp-amd64`
|
||
- Root image builds (`image_build.yml`): `cache-localai<tag-suffix>-<platform-tag>` (with a `-core` placeholder when `tag-suffix` is empty, so `cache-localai-core-amd64` for the core image)
|
||
- Pre-built base images (`base-images.yml`): `cache-base-grpc-<variant>` (one per `(BUILD_TYPE, arch)` permutation)
|
||
- Each tag stores a multi-arch BuildKit cache manifest (`mode=max`), so every intermediate stage is re-usable, not just the final image.
|
||
|
||
The per-arch suffix exists because amd64 and arm64 builds produce different intermediate content; sharing one cache key would thrash on every cross-arch rebuild.
|
||
|
||
## Read/write semantics
|
||
|
||
| Trigger | `cache-from` | `cache-to` |
|
||
|---|---|---|
|
||
| `push` to `master` / tag / cron / dispatch | yes | yes (`mode=max,ignore-error=true`) |
|
||
| `pull_request` | yes | **no** |
|
||
|
||
PR builds read master's warm cache but never write — this prevents PRs from polluting the shared cache with their experimental state. After merge, the master build for that matrix entry refreshes the cache.
|
||
|
||
`ignore-error=true` on the write side means a transient quay push failure does not fail the build; the next master push retries.
|
||
|
||
## Pre-built base images (`base-grpc-*`)
|
||
|
||
The C++ backend Dockerfiles (`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}`) compile gRPC from source. On a cold build that's ~25–35 min before any LocalAI source compiles. To skip that on CI, `.github/workflows/base-images.yml` builds and pushes a set of pre-prepped builder bases:
|
||
|
||
| Tag | Contents |
|
||
|---|---|
|
||
| `base-grpc-amd64` / `base-grpc-arm64` | Ubuntu 24.04 + apt build deps + protoc + cmake + gRPC at `/opt/grpc` |
|
||
| `base-grpc-cuda-12-amd64` | the above + CUDA 12.8 toolkit |
|
||
| `base-grpc-cuda-13-amd64` | the above + CUDA 13.0 toolkit (Ubuntu 22.04 base) |
|
||
| `base-grpc-cuda-13-arm64` | the above + CUDA 13.0 sbsa toolkit (Ubuntu 24.04 base) |
|
||
| `base-grpc-l4t-cuda-12-arm64` | JetPack r36.4.0 base (CUDA preinstalled, `SKIP_DRIVERS=true`) + gRPC |
|
||
| `base-grpc-rocm-amd64` | rocm/dev-ubuntu-24.04:7.2.1 base + hipblas/hipblaslt/rocblas + gRPC |
|
||
| `base-grpc-vulkan-amd64` / `base-grpc-vulkan-arm64` | Ubuntu 24.04 + Vulkan SDK 1.4.335 + gRPC |
|
||
| `base-grpc-intel-amd64` | intel/oneapi-basekit:2025.3.2 base + gRPC |
|
||
|
||
**Single source of truth**: the install logic for all 10 variants lives in `.docker/install-base-deps.sh`. Both `Dockerfile.base-grpc-builder` AND each variant Dockerfile's `builder-fromsource` stage bind-mount and execute the same script — so the prebuilt CI base and the local from-source path are bit-equivalent by construction.
|
||
|
||
### How variant Dockerfiles consume the base
|
||
|
||
`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` are multi-target. Three stages plus a final aliasing stage:
|
||
|
||
- `builder-fromsource` — `FROM ${BASE_IMAGE}` then runs `install-base-deps.sh` and the per-backend compile script. Used when `BUILDER_TARGET=builder-fromsource` (the default; local `make backends/<name>`).
|
||
- `builder-prebuilt` — `FROM ${BUILDER_BASE_IMAGE}` (one of the prebuilt `base-grpc-*` tags) and runs only the per-backend compile script. Used when `BUILDER_TARGET=builder-prebuilt` (CI when the matrix entry sets `builder-base-image`).
|
||
- `FROM ${BUILDER_TARGET} AS builder` — alias resolves the ARG-selected stage to a fixed name (BuildKit doesn't allow ARG expansion in `COPY --from=`).
|
||
- `FROM scratch` + `COPY --from=builder ...package/. ./` — emits the final scratch image with just the package contents.
|
||
|
||
BuildKit prunes the unreferenced builder stage, so each build only runs the path it needs. `backend_build.yml` derives `BUILDER_TARGET=builder-prebuilt` automatically when the matrix entry has a non-empty `builder-base-image`; otherwise it defaults to `builder-fromsource`.
|
||
|
||
The matrix `(build-type, platforms)` → `builder-base-image` mapping for llama-cpp / ik-llama-cpp / turboquant entries:
|
||
|
||
| `build-type` | `platforms` | tag |
|
||
|---|---|---|
|
||
| `''` | `linux/amd64` | `base-grpc-amd64` |
|
||
| `''` | `linux/arm64` | `base-grpc-arm64` |
|
||
| `cublas` cuda 12 | `linux/amd64` | `base-grpc-cuda-12-amd64` |
|
||
| `cublas` cuda 13 | `linux/amd64` | `base-grpc-cuda-13-amd64` |
|
||
| `cublas` cuda 13 | `linux/arm64` | `base-grpc-cuda-13-arm64` |
|
||
| `cublas` cuda 12 + JetPack base | `linux/arm64` | `base-grpc-l4t-cuda-12-arm64` |
|
||
| `hipblas` | `linux/amd64` | `base-grpc-rocm-amd64` |
|
||
| `vulkan` | `linux/amd64` | `base-grpc-vulkan-amd64` |
|
||
| `vulkan` | `linux/arm64` | `base-grpc-vulkan-arm64` |
|
||
| `sycl_*` | `linux/amd64` | `base-grpc-intel-amd64` |
|
||
|
||
### Bootstrap order when adding a new variant
|
||
|
||
If you add a new entry to `base-images.yml`'s matrix, the new tag does not exist on quay until the workflow runs. To consume it from a variant entry safely, dispatch the base-images workflow on the branch first:
|
||
|
||
```bash
|
||
gh workflow run base-images.yml --ref <feature-branch>
|
||
```
|
||
|
||
Wait for the new variant to push, then merge the consumer change. Otherwise the consumer's CI fails with "image not found."
|
||
|
||
## Per-arch native builds + manifest merge
|
||
|
||
Multi-arch backends (and the core LocalAI image) build natively per arch instead of running both arches under QEMU emulation on a single x86 runner. The pattern:
|
||
|
||
- The matrix has TWO entries per multi-arch backend, sharing the same `tag-suffix` but distinct `platforms` + `platform-tag` + `runs-on`. Example: `-cpu-faster-whisper` has one amd64 entry on `ubuntu-latest` and one arm64 entry on `ubuntu-24.04-arm`.
|
||
- Each per-arch build pushes by **canonical digest only** (no tags) via `outputs: type=image,push-by-digest=true,name-canonical=true,push=true`. The digest is uploaded as an artifact named `digests<tag-suffix>-<platform-tag>` (or `digests-localai<...>` for root-image builds).
|
||
- `scripts/changed-backends.js` detects shared `tag-suffix` and emits a `merge-matrix` output. `backend.yml` / `backend_pr.yml` have a `backend-merge-jobs` job that consumes it and calls `backend_merge.yml`.
|
||
- `backend_merge.yml` downloads all matching digest artifacts and runs `docker buildx imagetools create` to publish the final tagged manifest list pointing at both per-arch digests. Same `docker/metadata-action` config as the original monolithic build, so consumers see no tag-shape change.
|
||
- `image_merge.yml` is the equivalent for the root LocalAI image (`-core` placeholder when `tag-suffix` is empty so the artifact-name glob doesn't over-match across `core` and `gpu-vulkan`).
|
||
|
||
**`provenance: false` is required on multi-registry digest pushes**: with the default `mode=max` provenance attestation, BuildKit bundles a per-registry attestation manifest into each registry's manifest list, making the resulting list digest diverge across registries. `steps.build.outputs.digest` only matches one of them and the merge step's `imagetools create <reg>@sha256:<digest>` lookup fails on the other. Setting `provenance: false` keeps the digest content-only and identical across registries.
|
||
|
||
## Path filter on master push
|
||
|
||
Both `backend.yml` (push) and `backend_pr.yml` (PR) generate their matrix dynamically through `scripts/changed-backends.js`:
|
||
|
||
- **PR events**: paginated `pulls/{n}/files` API → filter the matrix to entries whose `dockerfile` path prefix matches the PR diff.
|
||
- **Push events**: GitHub Compare API (`/repos/{owner}/{repo}/compare/{before}...{after}`) → same path-filter logic. Falls back to "run everything" on first-branch push (`event.before` zero), API truncation (≥300 changed files), missing API token, or any thrown error.
|
||
- **Tag pushes**: `FORCE_ALL=true` is set from the workflow side (`startsWith(github.ref, 'refs/tags/')`) — releases rebuild every backend regardless of diff.
|
||
- **Schedule / `workflow_dispatch`**: no `event.before`, falls through to "run everything" automatically.
|
||
|
||
The Sunday 06:00 UTC cron on `backend.yml` exists specifically because path filtering can leave Python backends frozen on stale wheels. `DEPS_REFRESH` (below) only fires when the build actually runs, so an untouched Python backend would never re-resolve its unpinned deps. The weekly cron is the safety net.
|
||
|
||
## The `DEPS_REFRESH` cache-buster (Python backends)
|
||
|
||
Every Python backend goes through the shared `backend/Dockerfile.python`, which ends with:
|
||
|
||
```dockerfile
|
||
ARG DEPS_REFRESH=initial
|
||
RUN cd /${BACKEND} && PORTABLE_PYTHON=true make
|
||
```
|
||
|
||
Most Python backends ship `requirements*.txt` files that **do not pin every transitive dep** (`torch`, `transformers`, `vllm`, `diffusers`, etc. are listed without a `==` pin, or with `>=` lower bounds only). With a warm BuildKit cache, the `make` layer hashes only on Dockerfile instructions + COPYed source — not on what `pip install` resolves at runtime. So a warm cache would ship the *first* version of `vllm` ever cached and never pick up upstream releases.
|
||
|
||
`DEPS_REFRESH` defends against that:
|
||
|
||
- `backend_build.yml` computes `date -u +%Y-W%V` (ISO week, e.g. `2026-W19`) before each build and passes it as a build-arg.
|
||
- The `RUN ... make` layer's BuildKit hash now includes that string, so the layer invalidates **at most once per week**, automatically picking up newer wheels.
|
||
- Within a week, builds stay warm.
|
||
|
||
This applies only to `Dockerfile.python` because:
|
||
- Go (`Dockerfile.golang`) pins versions in `go.mod` / `go.sum`.
|
||
- Rust (`Dockerfile.rust`) pins via `Cargo.lock`.
|
||
- C++ backends pin gRPC (`v1.65.0`) and llama.cpp at a specific commit; their inputs don't drift between rebuilds.
|
||
|
||
### Adjusting the cadence
|
||
|
||
Bump the format to daily (`+%Y-%m-%d`) or hourly (`+%Y-%m-%d-%H`) for faster refreshes. For one-shot rebuilds without changing the schedule, append a marker to the tag-suffix in the matrix or temporarily delete that backend's cache tag in quay.
|
||
|
||
## ccache for C++ backend builds
|
||
|
||
`Dockerfile.{llama-cpp,ik-llama-cpp,turboquant}` declare a BuildKit cache mount on `/root/.ccache`:
|
||
|
||
```dockerfile
|
||
RUN --mount=type=cache,target=/root/.ccache,id=<backend>-ccache-${TARGETARCH}-${BUILD_TYPE},sharing=locked \
|
||
bash /usr/local/sbin/compile.sh
|
||
```
|
||
|
||
The compile script exports `CMAKE_C/CXX/CUDA_COMPILER_LAUNCHER=ccache` so CMake threads ccache through gcc/g++/nvcc. `cache-to: type=registry,mode=max` exports the cache mount data into the registry cache, so subsequent builds restore it.
|
||
|
||
On a `LLAMA_VERSION` bump, most translation units are byte-identical to the previous version's preprocessed source — ccache returns the previous `.o` and skips the real compile. Same for LocalAI source changes that don't actually touch llama.cpp's CMake inputs. Cache scope is per `(TARGETARCH, BUILD_TYPE)` so e.g. cublas-12 doesn't share with cublas-13 (their CUDA headers differ; cross-pollination would just be cache misses anyway).
|
||
|
||
## Composite actions
|
||
|
||
Two composite actions handle runner-side prep:
|
||
|
||
- **`.github/actions/free-disk-space/action.yml`** — wraps `jlumbroso/free-disk-space@main` plus an explicit apt purge of dotnet/android/ghc/mono/etc. Reclaims ~6–10 GB on `ubuntu-latest`. No-op on self-hosted runners. Used by `backend_build.yml`, `image_build.yml`, `test.yml`, `tests-aio.yml`, etc.
|
||
- **`.github/actions/setup-build-disk/action.yml`** — relocates Docker's data-root to `/mnt` on hosted X64 runners. GHA hosted `ubuntu-latest` ships ~75 GB of unused space at `/mnt`; combined with the free-disk-space cleanup this gives ~100 GB working space — enough for ROCm dev image + vLLM torch install + flash-attn intermediate layers. No-op on self-hosted and on non-X64 hosted runners. Used by `backend_build.yml`, `image_build.yml`, `base-images.yml`.
|
||
|
||
Both actions run before any docker buildx step.
|
||
|
||
## Concurrency
|
||
|
||
All `backend.yml` / `image.yml` / `test.yml` / etc. workflows use:
|
||
|
||
```yaml
|
||
concurrency:
|
||
group: ci-<workflow>-${{ github.event.pull_request.number || github.sha }}-${{ github.repository }}
|
||
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
|
||
```
|
||
|
||
- **PR events** group by PR number → newer pushes to the same PR cancel old runs (intended).
|
||
- **Push events** group by `github.sha` → each master commit gets its own run; rapid-fire merges don't cancel each other (this was a real issue prior — two master pushes 11 seconds apart would cancel the first's CI).
|
||
|
||
## Self-warming, no separate populator
|
||
|
||
There is no cron job that pre-warms the BuildKit cache for individual backends. The production builds *are* the populators. The first master build of a given matrix entry pays the cold cost; subsequent same-entry master builds reuse everything that hasn't changed (apt installs, gRPC compile in the variant `builder-fromsource` stage or skipped entirely when consuming `base-grpc-*`, Python wheel installs, etc.). The base-images workflow's weekly cron is the closest thing to a populator and only refreshes the prebuilt builder bases.
|
||
|
||
## Manually evicting cache
|
||
|
||
To force a fully cold build for one backend or the whole image:
|
||
|
||
```bash
|
||
# Delete a single tag (requires quay credentials with admin on the repo)
|
||
curl -X DELETE \
|
||
-H "Authorization: Bearer ${QUAY_TOKEN}" \
|
||
https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/cache-gpu-nvidia-cuda-12-vllm-amd64
|
||
|
||
# List all tags
|
||
curl -s -H "Authorization: Bearer ${QUAY_TOKEN}" \
|
||
"https://quay.io/api/v1/repository/go-skynet/ci-cache/tag/?limit=100" | jq '.tags[].name'
|
||
```
|
||
|
||
Eviction is rarely needed in normal operation — `DEPS_REFRESH` handles weekly drift, source changes invalidate naturally, and `mode=max` keeps the cache scoped per matrix entry per arch so a stale tag never bleeds into a different build.
|
||
|
||
## What the cache does **not** cover
|
||
|
||
- The `free-disk-space` and `setup-build-disk` composite actions run on every job — these reclaim runner-state, not Docker layers, so BuildKit caches don't apply.
|
||
- Intermediate artifacts of `Build (PR)` are not pushed anywhere — PRs only build for verification.
|
||
- Darwin builds (see below) — macOS runners have no Docker daemon, so the registry-backed BuildKit cache cannot apply.
|
||
|
||
## Darwin native caches
|
||
|
||
`backend_build_darwin.yml` runs natively on `macOS-14` GitHub-hosted runners — there is no Docker, no BuildKit, no cross-job registry cache. Instead, the reusable workflow uses `actions/cache@v4` for four native caches that mirror the spirit of the Linux cache (warm by default, weekly refresh for unpinned Python deps, PRs read-only).
|
||
|
||
| Cache | Path(s) | Key | Scope |
|
||
|---|---|---|---|
|
||
| Go modules + build | `~/go/pkg/mod`, `~/Library/Caches/go-build` | `go.sum` (managed by `actions/setup-go@v5` `cache: true`) | All darwin jobs |
|
||
| Homebrew | `~/Library/Caches/Homebrew/downloads`, selected `/opt/homebrew/Cellar/*` | hash of `backend_build_darwin.yml` | All darwin jobs |
|
||
| ccache (llama.cpp CMake) | `~/Library/Caches/ccache` | pinned `LLAMA_VERSION` from `backend/cpp/llama-cpp/Makefile` | `inputs.backend == 'llama-cpp'` only |
|
||
| Python wheels (uv + pip) | `~/Library/Caches/pip`, `~/Library/Caches/uv` | `inputs.backend` + ISO week (`+%Y-W%V`) + hash of that backend's `requirements*.txt` | `inputs.lang == 'python'` only |
|
||
|
||
Read/write semantics match the BuildKit cache: `actions/cache/restore` runs every time, `actions/cache/save` is gated on `github.event_name != 'pull_request'`. PRs read master's warm cache but never write back.
|
||
|
||
The Python wheel cache uses the same ISO-week cache-buster as the Linux `DEPS_REFRESH` build-arg — same problem (unpinned `torch`/`mlx`/`diffusers`/`transformers` resolve to fresh wheels weekly), same ~one-cold-rebuild-per-week solution.
|
||
|
||
The brew Cellar cache requires `HOMEBREW_NO_AUTO_UPDATE=1` and `HOMEBREW_NO_INSTALL_CLEANUP=1` (set as job-level env). Without those, `brew install` would mutate the very directories that were just restored, defeating the cache.
|
||
|
||
**Force-link after cache restore**: `actions/cache` restores `/opt/homebrew/Cellar/*` but NOT the `/opt/homebrew/bin/*` symlinks. After a cache hit, `brew install` sees the Cellar entries and decides "already installed" without re-running its link step, leaving the formulas off PATH. The Dependencies step explicitly runs `brew link --overwrite` for every cached formula afterwards to ensure the symlinks exist.
|
||
|
||
For ccache, the workflow exports `CMAKE_ARGS=… -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` via `$GITHUB_ENV` before running `make build-darwin-go-backend`. The Makefile in `backend/cpp/llama-cpp/` already forwards `CMAKE_ARGS` through to each variant build (`fallback`, `grpc`, `rpc-server`), so no script changes are needed. The three variants share most TUs, so ccache dedupes object files across them.
|
||
|
||
`backend_build_darwin.yml` also has a llama-cpp-specific build-step branch that runs `make backends/llama-cpp-darwin` (the bespoke script that compiles three CMake variants and bundles dylibs via `otool`), distinct from the generic `make build-darwin-${lang}-backend` path. This was consolidated from a previously-bespoke top-level `llama-cpp-darwin` job in `backend.yml` so llama-cpp on Darwin honors the same path filter as the other 34 Darwin backends.
|
||
|
||
### Cache budget on Darwin
|
||
|
||
GitHub Actions caches are limited to 10 GB per repo. Steady-state worst case: ~800 MB Go cache + ~2 GB brew Cellar + up to 2 GB ccache + ~1.5 GB × 5 python backends. If the cap is hit, prefer collapsing the per-backend Python keys into a shared `pyenv-darwin-shared-<week>` key (accepts more cross-backend churn for a smaller footprint) before reducing other caches.
|
||
|
||
## Self-hosted runners
|
||
|
||
`.github/backend-matrix.yml` has zero references to `arc-runner-set` or `bigger-runner` — all backends run on GHA free-tier hosted runners (`ubuntu-latest` for amd64, `ubuntu-24.04-arm` for arm64 native, `macos-14` for Darwin). The migration off self-hosted relied on the per-arch native split (no QEMU emulation) plus `setup-build-disk`'s `/mnt` relocation (~100 GB working space, enough for ROCm dev image + vLLM/torch installs).
|
||
|
||
One residual self-hosted reference remains in `test-extra.yml` (`tests-vibevoice-cpp-grpc-transcription` uses `bigger-runner` for the 30s JFK-decode timeout headroom). That's a separate concern.
|
||
|
||
## Touching the cache pipeline
|
||
|
||
When changing `image_build.yml`, `backend_build.yml`, any of the `backend/Dockerfile.*` files, `Dockerfile.base-grpc-builder`, `.docker/install-base-deps.sh`, `.docker/<backend>-compile.sh`, or `scripts/changed-backends.js`:
|
||
|
||
1. **Don't drop `DEPS_REFRESH=...` from the build-args** without a replacement strategy (lockfiles, pinned requirements). Otherwise master will silently freeze on whichever versions were cached at the time.
|
||
2. **Keep `(tag-suffix, platform-tag)` unique per matrix entry** — together they're the cache namespace. Two matrix entries sharing a key would clobber each other's cache.
|
||
3. **Keep `cache-to` gated on `github.event_name != 'pull_request'`** — PRs must not write.
|
||
4. **Keep `ignore-error=true` on `cache-to`** — quay registry hiccups must not fail builds.
|
||
5. **Keep `provenance: false` on push-by-digest steps** — multi-registry digest divergence is the Bug We Already Fixed; reintroducing provenance attestation re-breaks the merge.
|
||
6. **`install-base-deps.sh` is the single source of truth for base contents.** Both `Dockerfile.base-grpc-builder` (CI) and the variant Dockerfiles' `builder-fromsource` (local) bind-mount and execute it. If you add a package to one path, add it to the script — don't fork the logic into a Dockerfile RUN.
|
||
7. **After adding a `base-images.yml` matrix variant, run the workflow on your branch before merging consumer changes** that depend on the new tag — otherwise the consumer's CI fails "image not found."
|