2025-08-18 15:56:34 +00:00
import fs from "fs";
import yaml from "js-yaml";
import { Octokit } from "@octokit/core";
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
* ci: extract free-disk-space composite action
Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from
backend_build.yml, image_build.yml, and test.yml into a single composite
action. The three callers had slightly different inline blocks; the
composite uses the more aggressive backend_build/image_build variant for
all three callers — test.yml jobs now also purge snapd, edge/firefox/
powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost +
$AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners.
In test.yml, actions/checkout now runs before the composite action call
because the composite lives at ./.github/actions/free-disk-space and
requires a checked-out repo. The original ordering relied on
jlumbroso/free-disk-space@main being a remote action; this is the
least invasive change that supports a local composite.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: path-filter backend.yml master push
Run scripts/changed-backends.js on master pushes too (not just PRs) so
unrelated commits don't rebuild all ~210 backend container images. Tag
pushes still build the full matrix via FORCE_ALL.
Push events use the GitHub Compare API to diff event.before..event.after.
Edge cases (first push with zero base, API truncation beyond 300 files,
missing fields, network failure) fall back to "run everything" — better
safe than silently miss a backend.
The matrix literal moves from .github/workflows/backend.yml into a new
data-only file at .github/backend-matrix.yml (outside workflows/ so
actionlint doesn't try to parse it as a workflow). Both backend.yml and
backend_pr.yml now consume the dynamic matrix output uniformly via
fromJson(needs.generate-matrix.outputs.matrix); the script reads the
matrix from the new location.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: bound max-parallel on backend-jobs matrices
Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free
pool while migration is in flight. Lift after Phases 4-5 retire the
self-hosted runners. Also drops a leftover commented-out max-parallel
line that lived in backend.yml since the previous matrix shape.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: scope backend cache per arch, push by digest
Prepare backend_build.yml for the multi-arch split. The reusable
workflow now accepts a `platform-tag` input ("amd64" / "arm64") that
scopes the registry cache to cache<suffix>-<platform-tag> and (on push
events) pushes the resulting image by canonical digest only. Digests
are uploaded as artifacts named digests<suffix>-<platform-tag> for the
merge job (Task 2.2) to consume.
`platform-tag` is optional with empty default during the migration —
existing callers continue to work unchanged (their cache key just
becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will
update callers to pass an explicit "amd64" / "arm64" value. Phase 6
flips the input to required: true once every caller is wired.
PR builds keep their existing tag-based push to ci-tests but pick up
the per-arch cache key. Multi-arch PR builds remain emulated in this
commit; they migrate when the matrix entries split (Tasks 2.3+).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: add backend_merge.yml reusable workflow
Joins per-arch digest artifacts (uploaded by backend_build.yml when
called with platform-tag) into a single tagged multi-arch manifest list
via `docker buildx imagetools create`. Called once per backend by
backend.yml after both per-arch build jobs succeed.
The workflow generates final tags identically to the previous monolithic
build job (same docker/metadata-action invocation), so consumers of
quay.io/go-skynet/local-ai-backends and localai/localai-backends see no
tag-shape change. Two imagetools calls (one per registry) reference the
same per-arch digests under different image names.
Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix
entries to expand into per-arch + merge jobs that call this workflow.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: relocate Docker data-root to /mnt on hosted runners
GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused
by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and
restarting with data-root pointing there yields ~100 GB of working
space (combined with the apt-clean from Task 1.1) — enough for ROCm
dev image + vLLM torch install + flash-attn intermediate layers.
This is the structural change that lets Phases 4 and 5 of the migration
plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest.
The composite action is a no-op on self-hosted runners (where /mnt isn't
expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted
pool's /mnt shape separately before enabling). Wired into both
backend_build.yml and image_build.yml between free-disk-space and the
first Docker operation.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci(setup-build-disk): chmod 1777 /mnt/docker-tmp
buildx CLI runs as the unprivileged 'runner' user and creates config
dirs under TMPDIR before binding them into the buildkit container.
/mnt is root-owned by default, so the original mkdir produced a
permission-denied when buildx tried to write there:
ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied
Mirror /tmp's permission mode (1777 — world-writable with sticky bit)
on /mnt/docker-tmp so non-root processes can stage their config.
Caught by the first PR run (image-build hipblas job) on PR #9726.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: weekly full-matrix rebuild via cron
Path-filtering backend.yml master push (the previous commit's main
optimization) skips backends whose source didn't change. That broke
the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on
%Y-W%V busts the install layer's cache on a new ISO week, but only
when the build actually runs. Untouched Python backends (torch,
transformers, vllm with no version pin) would otherwise ship stale
wheels indefinitely.
Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule
events have no event.ref / event.before, so the script's changedFiles
== null fallback (scripts/changed-backends.js) emits the full matrix
automatically — no script change needed.
C++/Go backends with pinned deps cache-hit and complete fast, so the
weekly cost is dominated by Python re-resolves, which is exactly what
we want.
workflow_dispatch added so a maintainer can trigger an ad-hoc
full-matrix rebuild without faking a tag push.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
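The push-event fallback described above ("run everything" on a zero base, truncation, missing fields or a failed request) can be sketched as a pure decision function. This is a minimal illustration, not the script's actual code: the name `changedFilesFromCompare` is hypothetical, and it assumes the caller has already requested the Compare API for `event.before..event.after` and passes `null` if that request failed. A `null` return means "emit the full matrix".

```javascript
// Hypothetical helper, not part of scripts/changed-backends.js.
// Returns an array of changed paths, or null to signal "run everything".
const ZERO_SHA = "0".repeat(40);
const COMPARE_FILE_LIMIT = 300; // truncation limit cited in the commit message

function changedFilesFromCompare(event, compareData) {
  // Schedule events carry no before/after SHAs.
  if (!event || !event.before || !event.after) return null;
  // The first push of a new ref reports an all-zero "before" SHA.
  if (event.before === ZERO_SHA) return null;
  // A failed request or a response without a files list.
  if (!compareData || !Array.isArray(compareData.files)) return null;
  // At the limit we cannot tell whether the list was cut off, so be safe.
  if (compareData.files.length >= COMPARE_FILE_LIMIT) return null;
  return compareData.files.map((f) => f.filename);
}
```

Because a weekly cron event has no `before` SHA, it takes the first `null` branch, which is why the schedule trigger needs no script change.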
2026-05-08 21:43:41 +00:00
// Matrix data lives in a small data-only YAML so both backend.yml (master push)
// and backend_pr.yml (pull_request) can use a dynamic `matrix: ${{ fromJson(...) }}`
// for the live job, while this script remains the single source of truth for
// "what backends does the project know about".
const matrixYml = yaml.load(fs.readFileSync(".github/backend-matrix.yml", "utf8"));
const includes = matrixYml.include;
const includesDarwin = matrixYml.includeDarwin;
2025-08-18 15:56:34 +00:00
const eventPath = process.env.GITHUB_EVENT_PATH;
const event = JSON.parse(fs.readFileSync(eventPath, "utf8"));

// Infer a non-Darwin matrix entry's source directory from its Dockerfile suffix.
// Returns null when no branch matches.
function inferBackendPath(item) {
  if (item.dockerfile.endsWith("python")) {
2025-08-28 15:25:18 +00:00
    return `backend/python/${item.backend}/`;
2025-08-18 15:56:34 +00:00
  }
  if (item.dockerfile.endsWith("golang")) {
2025-08-28 15:25:18 +00:00
    return `backend/go/${item.backend}/`;
2025-08-18 15:56:34 +00:00
  }
2026-04-08 17:23:16 +00:00
  if (item.dockerfile.endsWith("rust")) {
    return `backend/rust/${item.backend}/`;
  }
2026-04-12 11:51:28 +00:00
  if (item.dockerfile.endsWith("ik-llama-cpp")) {
    return `backend/cpp/ik-llama-cpp/`;
  }
feat(backend): add turboquant llama.cpp-fork backend (#9355)
* feat(backend): add turboquant llama.cpp-fork backend
turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch
feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme.
It ships as a first-class backend reusing backend/cpp/llama-cpp sources
via a thin wrapper Makefile: each variant target copies ../llama-cpp
into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server
with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No
duplication of grpc-server.cpp — upstream fixes flow through automatically.
Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL
f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml,
adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0
to exercise the KV-cache config path (backend_test.go gains dedicated env
vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement
usable by any llama.cpp-family backend), and registers a nightly auto-bump
PR in bump_deps.yaml tracking feature/turboquant-kv-cache.
scripts/changed-backends.js gets a special-case so edits to
backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since
the wrapper reuses those sources.
* feat(turboquant): carry upstream patches against fork API drift
turboquant branched from llama.cpp before upstream commit 66060008
("server: respect the ignore eos flag", #21203) which added the
`logit_bias_eog` field to `server_context_meta` and a matching
parameter to `server_task::params_from_json_cmpl`. The shared
backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so
building it against the fork unmodified fails.
Cherry-pick that commit as a patch file under
backend/cpp/turboquant/patches/ and apply it to the cloned fork
sources via a new apply-patches.sh hook called from the wrapper
Makefile. Simplifies the build flow too: instead of hopping through
llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now
drives the copied Makefile directly (clone -> patch -> build).
Drop the corresponding patch whenever the fork catches up with
upstream — the build fails fast if a patch stops applying, which
is the signal to retire it.
* docs: add turboquant backend section + clarify cache_type_k/v
Document the new turboquant (llama.cpp fork with TurboQuant KV-cache)
backend alongside the existing llama-cpp / ik-llama-cpp sections in
features/text-generation.md: when to pick it, how to install it from
the gallery, and a YAML example showing backend: turboquant together
with cache_type_k / cache_type_v.
Also expand the cache_type_k / cache_type_v table rows in
advanced/model-configuration.md to spell out the accepted llama.cpp
quantization values and note that these fields apply to all
llama.cpp-family backends, not just vLLM.
* feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion
The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but
ggml/include/ggml-rpc.h static-asserts it equals 96, breaking
the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server).
Carry a one-line patch that updates the expected count so the
assertion holds. Drop this patch whenever the fork fixes it upstream.
* feat(turboquant): allow turbo* KV-cache types and exercise them in e2e
The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own
allow-list of accepted KV-cache types (kv_cache_types[]) and rejects
anything outside it before the value reaches llama.cpp's parser. That
list only contains the standard llama.cpp types — turbo2/turbo3/turbo4
would throw "Unsupported cache type" at LoadModel time, meaning
nothing the LocalAI gRPC layer accepted was actually fork-specific.
Add a build-time augmentation step (patch-grpc-server.sh, called from
the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0
into the allow-list of the *copied* grpc-server.cpp under
turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/
is never touched, so the stock llama-cpp build keeps compiling against
vanilla upstream which has no notion of those enum values.
Switch test-extra-backend-turboquant to set
BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite
actually runs the fork's TurboQuant KV-cache code paths (turbo3 also
auto-enables flash_attention in the fork). Picking q8_0 here would
only re-test the standard llama.cpp path that the upstream llama-cpp
backend already covers.
Refresh the docs (text-generation.md + model-configuration.md) to
list turbo2/turbo3/turbo4 explicitly and call out that you only get
the TurboQuant code path with this backend + a turbo* cache type.
* fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3
The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant)
does not install python3, so the python-based augmentation step
errored with `python3: command not found` at make time. Switch to
awk, which the Ubuntu base image already provides (via mawk) and which
is available everywhere the rest of the wrapper Makefile runs.
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <[email protected]>
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
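The special case above (edits under `backend/cpp/llama-cpp/` also retrigger turboquant) can be sketched as an extra-prefix table consulted alongside the per-backend path map. This is a hypothetical illustration: `EXTRA_TRIGGERS` and `backendsToRebuild` are made-up names, and the script's real special-case code is not part of this excerpt.

```javascript
// Hypothetical: extra source prefixes that retrigger a backend beyond its own
// directory. turboquant reuses the backend/cpp/llama-cpp sources.
const EXTRA_TRIGGERS = new Map([
  ["turboquant", ["backend/cpp/llama-cpp/"]],
]);

// paths: Map of backend name -> own path prefix (as getAllBackendPaths builds).
// changedFiles: repo-relative paths, or null meaning "rebuild everything".
function backendsToRebuild(paths, changedFiles) {
  if (changedFiles === null) return new Set(paths.keys());
  const result = new Set();
  for (const [backend, prefix] of paths) {
    const prefixes = [prefix, ...(EXTRA_TRIGGERS.get(backend) ?? [])];
    if (changedFiles.some((file) => prefixes.some((p) => file.startsWith(p)))) {
      result.add(backend);
    }
  }
  return result;
}
```

Keeping the dependency in a table rather than in `inferBackendPath` lets each backend keep a single "own" directory while still expressing shared-source relationships.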
2026-04-14 23:25:04 +00:00
  if (item.dockerfile.endsWith("turboquant")) {
    // turboquant is a llama.cpp fork that reuses backend/cpp/llama-cpp sources
    // via a thin wrapper Makefile. Changes to either dir should retrigger it.
    return `backend/cpp/turboquant/`;
  }
ci(bump-deps): register ds4 + move version pin into the Makefile (#9761)
* ci(bump-deps): register ds4 + move version pin into the Makefile
The initial ds4 PR (#9758) put the upstream commit pin in
backend/cpp/ds4/prepare.sh as a shell variable. The auto-bump bot at
.github/bump_deps.sh greps for ^$VAR?= in a Makefile, so DS4_VERSION
was invisible to it - other backends (llama-cpp, ik-llama-cpp,
turboquant, voxtral, etc.) all pin in their Makefile.
This change:
- Moves DS4_VERSION?= and DS4_REPO?= to the top of
backend/cpp/ds4/Makefile.
- Inlines the git init/fetch/checkout recipe into the 'ds4:' target
(matches llama-cpp's 'llama.cpp:' target pattern). Directory acts
as the target so make only re-clones when missing.
- Deletes the now-redundant prepare.sh.
- Adds antirez/ds4 + DS4_VERSION + main + backend/cpp/ds4/Makefile to
the .github/workflows/bump_deps.yaml matrix so the daily bot opens
PRs against this pin.
- Updates .agents/ds4-backend.md to point at the Makefile.
Verified:
$ grep -m1 '^DS4_VERSION?=' backend/cpp/ds4/Makefile
DS4_VERSION?=ae302c2fa18cc6d9aefc021d0f27ae03c9ad2fc0
$ make -C backend/cpp/ds4 ds4 # clones into ds4/ at the pin
$ make -C backend/cpp/ds4 ds4 # no-op on second invocation
make: 'ds4' is up to date.
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: route backend/cpp/ds4/ changes through changed-backends.js
scripts/changed-backends.js:inferBackendPath has an explicit branch per
cpp dockerfile suffix (ik-llama-cpp, turboquant, llama-cpp). Without a
matching branch the function returns null, the backend never lands in
the path map, and PR change-detection cannot map "backend/cpp/ds4/X
changed" -> "rebuild ds4 image".
This is why PR #9761 produced zero ds4 jobs even though it directly
edits backend/cpp/ds4/Makefile.
Adds the missing branch (Dockerfile.ds4 -> backend/cpp/ds4/), placed
before the llama-cpp branch (since both share the .cpp ancestry but
ds4 is more specific - same ordering rule documented in
.agents/adding-backends.md).
Verified with a local Node simulation of the script against this PR's
diff: the path map now contains 'ds4 -> backend/cpp/ds4/' and a
'backend/cpp/ds4/Makefile' change correctly triggers the ds4 backend
in the rebuild set.
Signed-off-by: Ettore Di Giacinto <[email protected]>
* docs(adding-backends): harden the two gotchas that bit ds4
Both omissions are silent at the time you ADD a backend - the failure
mode only appears later (the bump bot stays silent forever, or the path
filter shows up on the next PR that touches your backend with zero CI
jobs and looks broken for unrelated reasons). This expands the
`scripts/changed-backends.js` paragraph from a one-liner to a fully
worked example and adds a new sibling paragraph for the
`bump_deps.yaml` + Makefile-pin contract.
Both call out the specific mistakes from the ds4 timeline (#9758
→ #9761) so future contributors can pattern-match on the cause.
Signed-off-by: Ettore Di Giacinto <[email protected]>
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
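The ordering rule matters because `endsWith` is a suffix test: `"Dockerfile.ik-llama-cpp".endsWith("llama-cpp")` is `true`, so a generic `llama-cpp` branch placed first would swallow the more specific backend. A table-driven sketch of the same first-match-wins logic makes the ordering explicit. `SUFFIX_RULES` and `inferCppPath` are hypothetical names, and Dockerfile paths are assumed to end in `Dockerfile.<suffix>` as in the examples above.

```javascript
// Hypothetical table-driven version of the C++ branches of inferBackendPath.
// Rules are checked top to bottom and the first matching suffix wins, so
// more specific suffixes must precede "llama-cpp".
const SUFFIX_RULES = [
  ["ik-llama-cpp", "backend/cpp/ik-llama-cpp/"],
  ["turboquant", "backend/cpp/turboquant/"],
  ["ds4", "backend/cpp/ds4/"],
  ["llama-cpp", "backend/cpp/llama-cpp/"],
];

function inferCppPath(dockerfile) {
  for (const [suffix, path] of SUFFIX_RULES) {
    if (dockerfile.endsWith(suffix)) return path;
  }
  // No rule matched: the backend never enters the path map, so PR
  // change-detection cannot map its files to a rebuild.
  return null;
}
```

Moving the `llama-cpp` row to the top would silently route `ik-llama-cpp` changes to `backend/cpp/llama-cpp/`; a missing row (the original ds4 case) shows up as `null`.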
2026-05-11 20:46:02 +00:00
  if (item.dockerfile.endsWith("ds4")) {
    return `backend/cpp/ds4/`;
  }
2025-08-18 15:56:34 +00:00
  if (item.dockerfile.endsWith("llama-cpp")) {
2025-08-28 15:25:18 +00:00
    return `backend/cpp/llama-cpp/`;
2025-08-18 15:56:34 +00:00
  }
  return null;
}
2025-09-01 20:18:30 +00:00
function inferBackendPathDarwin(item) {
2026-05-09 08:18:17 +00:00
  // llama-cpp on Darwin builds from the C++ sources, not a backend/go/llama-cpp
  // tree (which doesn't exist). The Darwin job is matrix-driven with lang=go
  // for runner/toolchain selection, but the source path is C++.
  if (item.backend === "llama-cpp") {
    return `backend/cpp/llama-cpp/`;
  }
2025-09-01 20:18:30 +00:00
  // Darwin entries without a lang field are treated as Python backends.
  if (!item.lang) {
    return `backend/python/${item.backend}/`;
  }
  return `backend/${item.lang}/${item.backend}/`;
}
2026-03-30 17:46:07 +00:00
// Build a deduplicated map of backend name -> path prefix from all matrix entries.
// The first path seen for a backend wins, so non-Darwin entries take precedence.
function getAllBackendPaths() {
  const paths = new Map();
  for (const item of includes) {
    const p = inferBackendPath(item);
    if (p && !paths.has(item.backend)) {
      paths.set(item.backend, p);
    }
  }
  for (const item of includesDarwin) {
    const p = inferBackendPathDarwin(item);
    if (p && !paths.has(item.backend)) {
      paths.set(item.backend, p);
    }
  }
  return paths;
}

const allBackendPaths = getAllBackendPaths();

const token = process.env.GITHUB_TOKEN;
const octokit = new Octokit({ auth: token });
2026-05-08 21:43:41 +00:00
// PR file list (paginated).
async function getChangedFilesForPR(event) {
  const prNumber = event.pull_request.number;
  const repo = event.repository.name;
  const owner = event.repository.owner.login;
2026-03-30 17:46:07 +00:00
  let files = [];
  let page = 1;
  while (true) {
    const res = await octokit.request('GET /repos/{owner}/{repo}/pulls/{pull_number}/files', {
      owner,
      repo,
      pull_number: prNumber,
      per_page: 100,
      page
    });
    files = files.concat(res.data.map(f => f.filename));
    if (res.data.length < 100) break;
    page++;
  }
  return files;
}
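The loop in `getChangedFilesForPR` stops at the first page shorter than `per_page`. A network-free sketch with an injected page fetcher isolates that termination rule so it can be tested without a token. `collectPaginated` and `makeStubFetcher` are hypothetical stand-ins for the `octokit.request` call, not part of the script. One consequence worth knowing: when the file count is an exact multiple of 100, the loop makes one extra request that returns an empty page. GitHub's REST documentation also states that this endpoint returns at most 3000 files, so very large PRs can still come back incomplete.

```javascript
// Hypothetical, network-free version of the pagination loop.
// fetchPage(page, perPage) must resolve to the array of items on that page.
async function collectPaginated(fetchPage, perPage = 100) {
  let items = [];
  let page = 1;
  let requests = 0;
  while (true) {
    const batch = await fetchPage(page, perPage);
    requests++;
    items = items.concat(batch);
    // A short (or empty) page means nothing is left to fetch.
    if (batch.length < perPage) break;
    page++;
  }
  return { items, requests };
}

// Stub that serves `total` fake filenames, perPage at a time.
function makeStubFetcher(total) {
  return async (page, perPage) => {
    const start = (page - 1) * perPage;
    const end = Math.min(start + perPage, total);
    const out = [];
    for (let i = start; i < end; i++) out.push(`file-${i}`);
    return out;
  };
}
```

With 250 files the loop issues three requests (100, 100, 50); with exactly 200 it also issues three, the last one empty.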
|
|
|
|
|
|
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
* ci: extract free-disk-space composite action
Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from
backend_build.yml, image_build.yml, and test.yml into a single composite
action. The three callers had slightly different inline blocks; the
composite uses the more aggressive backend_build/image_build variant for
all three callers — test.yml jobs now also purge snapd, edge/firefox/
powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost +
$AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners.
In test.yml, actions/checkout now runs before the composite action call
because the composite lives at ./.github/actions/free-disk-space and
requires a checked-out repo. The original ordering relied on
jlumbroso/free-disk-space@main being a remote action; this is the
minimum-invasive change to support a local composite.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: path-filter backend.yml master push
Run scripts/changed-backends.js on master pushes too (not just PRs) so
unrelated commits don't rebuild all ~210 backend container images. Tag
pushes still build the full matrix via FORCE_ALL.
Push events use the GitHub Compare API to diff event.before..event.after.
Edge cases (first push with zero base, API truncation beyond 300 files,
missing fields, network failure) fall back to "run everything" — better
safe than silently miss a backend.
The matrix literal moves from .github/workflows/backend.yml into a new
data-only file at .github/backend-matrix.yml (outside workflows/ so
actionlint doesn't try to parse it as a workflow). Both backend.yml and
backend_pr.yml now consume the dynamic matrix output uniformly via
fromJson(needs.generate-matrix.outputs.matrix); the script reads the
matrix from the new location.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: bound max-parallel on backend-jobs matrices
Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free
pool while migration is in flight. Lift after Phases 4-5 retire the
self-hosted runners. Also drops a leftover commented-out max-parallel
line that lived in backend.yml since the previous matrix shape.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: scope backend cache per arch, push by digest
Prepare backend_build.yml for the multi-arch split. The reusable
workflow now accepts a `platform-tag` input ("amd64" / "arm64") that
scopes the registry cache to cache<suffix>-<platform-tag> and (on push
events) pushes the resulting image by canonical digest only. Digests
are uploaded as artifacts named digests<suffix>-<platform-tag> for the
merge job (Task 2.2) to consume.
`platform-tag` is optional with empty default during the migration —
existing callers continue to work unchanged (their cache key just
becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will
update callers to pass an explicit "amd64" / "arm64" value. Phase 6
flips the input to required: true once every caller is wired.
PR builds keep their existing tag-based push to ci-tests but pick up
the per-arch cache key. Multi-arch PR builds remain emulated in this
commit; they migrate when the matrix entries split (Tasks 2.3+).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: add backend_merge.yml reusable workflow
Joins per-arch digest artifacts (uploaded by backend_build.yml when
called with platform-tag) into a single tagged multi-arch manifest list
via `docker buildx imagetools create`. Called once per backend by
backend.yml after both per-arch build jobs succeed.
The workflow generates final tags identically to the previous monolithic
build job (same docker/metadata-action invocation), so consumers of
quay.io/go-skynet/local-ai-backends and localai/localai-backends see no
tag-shape change. Two imagetools calls (one per registry) reference the
same per-arch digests under different image names.
Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix
entries to expand into per-arch + merge jobs that call this workflow.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: relocate Docker data-root to /mnt on hosted runners
GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused
by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and
restarting with data-root pointing there yields ~100 GB of working
space (combined with the apt-clean from Task 1.1) — enough for ROCm
dev image + vLLM torch install + flash-attn intermediate layers.
This is the structural change that lets Phases 4 and 5 of the migration
plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest.
The composite action is no-op on self-hosted runners (where /mnt isn't
expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted
pool's /mnt shape separately before enabling). Wired into both
backend_build.yml and image_build.yml between free-disk-space and the
first Docker operation.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci(setup-build-disk): chmod 1777 /mnt/docker-tmp
buildx CLI runs as the unprivileged 'runner' user and creates config
dirs under TMPDIR before binding them into the buildkit container.
/mnt is root-owned by default, so the original mkdir produced a
permission-denied error when buildx tried to write there:
ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied
Mirror /tmp's permission mode (1777 — world-writable with sticky bit)
on /mnt/docker-tmp so non-root processes can stage their config.
Caught by the first PR run (image-build hipblas job) on PR #9726.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: weekly full-matrix rebuild via cron
Path-filtering backend.yml master push (the previous commit's main
optimization) skips backends whose source didn't change. That broke
the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on
%Y-W%V busts the install layer's cache on a new ISO week, but only
when the build actually runs. Untouched Python backends (torch,
transformers, vllm with no version pin) would otherwise ship stale
wheels indefinitely.
Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule
events have no event.ref / event.before, so the script's changedFiles
== null fallback (scripts/changed-backends.js) emits the full matrix
automatically — no script change needed.
C++/Go backends with pinned deps cache-hit and complete fast, so the
weekly cost is dominated by Python re-resolves, which is exactly what
we want.
workflow_dispatch added so a maintainer can trigger an ad-hoc
full-matrix rebuild without faking a tag push.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
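The DEPS_REFRESH build-arg described above is keyed on the strftime pattern `%Y-W%V`. The following minimal sketch reproduces that key in JavaScript so the weekly rollover can be checked offline. The function name `depsRefreshKey` is hypothetical and not part of the workflow. Note that `%Y` is the calendar year while `%V` is the ISO 8601 week number, so the two can disagree around New Year (`%G` would give the ISO year instead).

```javascript
// Hypothetical helper: reproduces `date -u +%Y-W%V` for a given instant.
// %Y = calendar year; %V = ISO 8601 week (weeks start on Monday, and week 1
// is the week that contains the year's first Thursday).
function depsRefreshKey(date) {
  // Normalize to midnight UTC so the day arithmetic below is exact.
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const calendarYear = d.getUTCFullYear();
  // Move to the Thursday of the same ISO week (Sunday counts as day 7).
  const dayOfWeek = d.getUTCDay() || 7;
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek);
  // That Thursday's year owns the ISO week; count whole weeks from its Jan 1.
  const isoYearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const dayOfYear = Math.round((d.getTime() - isoYearStart) / 86400000);
  const isoWeek = Math.floor(dayOfYear / 7) + 1;
  return `${calendarYear}-W${String(isoWeek).padStart(2, '0')}`;
}

console.log(depsRefreshKey(new Date(Date.UTC(2026, 4, 11))));  // 2026-W20
console.log(depsRefreshKey(new Date(Date.UTC(2024, 11, 30)))); // 2024-W01 (ISO year is 2025)
```

Because the key only changes when the ISO week changes, every build within one Monday-to-Sunday window shares the same cache-busting value.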
2026-05-08 21:43:41 +00:00
// Branch-push file list, computed via the Compare API so it works in shallow clones.
// Returns null to signal "we cannot compute a reliable diff; run everything".
async function getChangedFilesForPush(event) {
  const before = event.before;
  const after = event.after;
  // First push to a branch carries an all-zero `before` SHA and there's no
  // base to diff against. Run everything in that case.
  if (!before || !after || /^0+$/.test(before)) return null;
  const owner = event.repository.owner.login;
  const repo = event.repository.name;
  let res;
  try {
    res = await octokit.request('GET /repos/{owner}/{repo}/compare/{basehead}', {
      owner,
      repo,
      basehead: `${before}...${after}`,
    });
  } catch (err) {
    console.log("compare API failed, falling back to run-all:", err.message);
    return null;
  }
  if (!res.data || !Array.isArray(res.data.files)) return null;
  // The compare endpoint caps the file list at 300. If we hit the cap we may
  // be missing changes, so be conservative and run everything.
  if (res.data.files.length >= 300) {
    console.log("compare API returned 300+ files (truncated), falling back to run-all");
    return null;
  }
  return res.data.files.map(f => f.filename);
}
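The fallback rules in `getChangedFilesForPush` are easy to exercise offline if the HTTP call is injected. The following is a hypothetical refactor, not part of the script: the `request` parameter, the stub, and the sample event values are assumptions. It keeps the same decision logic, so a zero or missing SHA, a failed request, a malformed body, or a possibly truncated file list all return `null` (run everything).

```javascript
// Hypothetical variant of getChangedFilesForPush with the Compare API call
// injected, so the run-everything fallbacks can be tested without a network.
const COMPARE_FILE_CAP = 300; // the compare endpoint lists at most 300 files

async function changedFilesViaCompare(event, request) {
  const { before, after } = event;
  // All-zero `before` means first push to the branch: nothing to diff against.
  if (!before || !after || /^0+$/.test(before)) return null;
  let res;
  try {
    res = await request('GET /repos/{owner}/{repo}/compare/{basehead}', {
      owner: event.repository.owner.login,
      repo: event.repository.name,
      basehead: `${before}...${after}`,
    });
  } catch {
    return null; // network or API failure: run everything
  }
  if (!res.data || !Array.isArray(res.data.files)) return null;
  if (res.data.files.length >= COMPARE_FILE_CAP) return null; // possibly truncated
  return res.data.files.map(f => f.filename);
}

// Stub standing in for octokit.request; the file path is made up.
const stubRequest = async () => ({ data: { files: [{ filename: 'backend/example/changed-file.txt' }] } });
const event = {
  before: 'abc123',
  after: 'def456',
  repository: { name: 'example-repo', owner: { login: 'example-owner' } },
};

changedFilesViaCompare(event, stubRequest).then(files => console.log(files));
```

Injecting `request` keeps the decision logic pure enough to unit-test each fallback branch with a one-line stub.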
2025-08-18 15:56:34 +00:00
ci: tag every backend digest, including singletons
backend_build.yml pushes by canonical digest only (push-by-digest=true,
no tags applied at build time). User-facing tagging happens in
backend_merge.yml's `imagetools create` step. Before this commit,
scripts/changed-backends.js emitted a merge entry only for tag-suffixes
with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python
images, vLLM, sglang, transformers, diffusers, ...) pushed its digest
untagged and stayed that way until quay's GC reaped it. Symptom: tag
releases shipped multi-arch backends tagged correctly, but no
v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared
in the registry.
Changes:
- scripts/changed-backends.js drops the `group.length < 2` skip and
emits two merge matrices, one per arch class, so each downstream
merge job can `needs:` only its corresponding build matrix.
- backend.yml splits backend-merge-jobs into multiarch and singlearch
variants. The split preserves PR #9746's fix: slow singlearch CUDA
builds (~6h) must not gate multiarch merges, or quay's GC reaps the
multiarch per-arch digests before they're tagged.
- backend_pr.yml mirrors the split.
- backend_build.yml renames the digest artifact from
`digests<suffix>-<platform-tag>` to
`digests<suffix>--<platform-tag-or-"single">`. The `--` separator
prevents the merge-side glob from over-matching sibling backends
whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs
-cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder
keeps the name well-formed when platform-tag is empty.
- backend_merge.yml updates the download pattern to match.
Verified locally: a tag-push event now expands to 36 multiarch merge
entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one
per singleton, including -gpu-nvidia-cuda-12-vllm at index 24).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
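The over-matching that the `--` separator prevents can be demonstrated with a tiny glob matcher. This sketch assumes the merge side downloads with a pattern of the form `<artifact prefix>*`; the exact pattern in backend_merge.yml is not shown here. `globToRegExp` is a hypothetical helper that only understands `*`.

```javascript
// Old and new artifact naming schemes from the commit message above.
const oldName = (suffix, tag) => `digests${suffix}-${tag}`;
const newName = (suffix, tag) => `digests${suffix}--${tag || 'single'}`;

// Hypothetical minimal glob: `*` matches any run of characters, all else is literal.
function globToRegExp(glob) {
  const parts = glob.split('*').map(s => s.replace(/[.+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`^${parts.join('.*')}$`);
}

// Two legs of -cpu-vllm plus one leg of its sibling -cpu-vllm-omni.
const legs = [['-cpu-vllm', 'amd64'], ['-cpu-vllm', 'arm64'], ['-cpu-vllm-omni', 'amd64']];
const oldNames = legs.map(([s, t]) => oldName(s, t));
const newNames = legs.map(([s, t]) => newName(s, t));

// Old pattern over-matches: it also picks up the -omni sibling's digest.
const oldPattern = globToRegExp('digests-cpu-vllm-*');
// New pattern stops at the `--` separator.
const newPattern = globToRegExp('digests-cpu-vllm--*');

console.log(oldNames.filter(n => oldPattern.test(n))); // all three names
console.log(newNames.filter(n => newPattern.test(n))); // only the two -cpu-vllm legs
```

The `single` placeholder matters for the same reason: without it, a singleton's artifact name would end in a bare `--`.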
2026-05-11 13:22:00 +00:00
// Group matrix entries by tag-suffix and emit a merge-matrix entry per group.
// Both multi-leg groups (per-arch fan-out) and singletons get one entry each:
// the build job pushes by digest only with no tags applied, so every backend
// needs a downstream merge step to apply its tags via `imagetools create`,
// regardless of how many per-arch legs feed it. Callers split entries by
// arch class first (see splitByArch) and call this once per class so the
// resulting matrices can be wired to merge jobs that `needs:` only their
// corresponding build matrix. This prevents slow single-arch builds from
// gating multi-arch merges (the bug fixed in PR #9746).
2026-05-08 22:04:42 +00:00
function computeMergeMatrix(entries) {
  const groups = new Map();
  for (const item of entries) {
    if (!item['tag-suffix']) continue;
    const key = item['tag-suffix'];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  const include = [];
  for (const [tagSuffix, group] of groups) {
    // tag-latest must agree across legs: they're going to publish under
    // the same final tag, so disagreeing on whether it's also the :latest
    // tag is an authoring bug. Warn loudly so a Task 2.5 fan-out typo is
    // visible in CI logs instead of silently shipping the leg-0 value.
    const first = group[0]['tag-latest'] || '';
    for (const m of group) {
      if ((m['tag-latest'] || '') !== first) {
        console.warn(`tag-latest mismatch in group ${tagSuffix}: legs disagree (using ${first})`);
        break;
      }
    }
    include.push({
      'tag-suffix': tagSuffix,
      'tag-latest': first,
    });
  }
  return { include };
}
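Each merge-matrix entry ends up as an `imagetools create` call that points the final tags at the pushed digests. Here is a sketch of assembling that command line, assuming the digests arrive as bare `sha256:...` strings. The helper name, the tag value, and the digests below are illustrative and not taken from backend_merge.yml.

```javascript
// Hypothetical helper: builds the argv for
//   docker buildx imagetools create -t <image>:<tag> ... <image>@<digest> ...
// One call per registry; every call reuses the same per-arch digests.
function imagetoolsCreateArgs(image, tags, digests) {
  if (tags.length === 0) throw new Error('at least one tag is required');
  if (digests.length === 0) throw new Error('at least one digest is required');
  const args = ['buildx', 'imagetools', 'create'];
  for (const tag of tags) args.push('-t', `${image}:${tag}`);
  // A singleton backend passes one digest; a multi-arch one passes one per leg.
  for (const digest of digests) args.push(`${image}@${digest}`);
  return args;
}

const args = imagetoolsCreateArgs(
  'quay.io/go-skynet/local-ai-backends',
  ['master-cpu-example'],          // illustrative tag
  ['sha256:aaaa', 'sha256:bbbb'],  // placeholder digests, one per leg
);
console.log(['docker', ...args].join(' '));
```

Keeping the argv as an array (rather than a single shell string) avoids quoting problems if a tag ever contains characters the shell treats specially.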
2026-05-10 16:15:53 +00:00
// Split a list of linux matrix entries into single-arch (no platform-tag) and
// multi-arch (platform-tag set, paired with a sibling entry sharing the same
// tag-suffix). The two run as separate matrix jobs so the multi-arch merge job
// can `needs:` only the multi-arch build job. Otherwise slow single-arch builds
// (CUDA, ROCm, vLLM, etc.) would hold up manifest assembly, leaving the
// multi-arch backends' untagged per-arch digests on quay long enough to be GC'd.
function splitByArch(entries) {
  const multiarch = entries.filter(e => e['platform-tag']);
  const singlearch = entries.filter(e => !e['platform-tag']);
  return { multiarch, singlearch };
}
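Taken together, `splitByArch` and `computeMergeMatrix` mean a multi-arch backend yields one merge entry per group of legs and a singleton yields one entry of its own. The following compact, self-contained restatement of that counting reproduces the arithmetic from the commit message, where 72 multi-arch builds collapse to 36 merge entries. The function name and the sample entries are invented for illustration.

```javascript
// Compact restatement of splitByArch + computeMergeMatrix, counting only.
function mergeEntryCounts(entries) {
  // One merge entry per distinct tag-suffix; entries without one are skipped.
  const count = list => new Set(list.filter(e => e['tag-suffix']).map(e => e['tag-suffix'])).size;
  const multiarch = entries.filter(e => e['platform-tag']);
  const singlearch = entries.filter(e => !e['platform-tag']);
  return { multiarch: count(multiarch), singlearch: count(singlearch) };
}

// Invented sample: 36 backends built for amd64 + arm64 (72 builds) and 3 singletons.
const entries = [];
for (let i = 0; i < 36; i++) {
  for (const arch of ['amd64', 'arm64']) {
    entries.push({ 'tag-suffix': `-cpu-example-${i}`, 'platform-tag': arch });
  }
}
entries.push({ 'tag-suffix': '-gpu-nvidia-cuda-12-example' });
entries.push({ 'tag-suffix': '-gpu-rocm-hipblas-example' });
entries.push({ 'tag-suffix': '-gpu-intel-example' });

console.log(entries.length);            // 75
console.log(mergeEntryCounts(entries)); // { multiarch: 36, singlearch: 3 }
```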
function emitFullMatrix() {
  const { multiarch, singlearch } = splitByArch(includes);
  const mergeMatrixMultiarch = computeMergeMatrix(multiarch);
  const mergeMatrixSinglearch = computeMergeMatrix(singlearch);
  const hasMergesMultiarch = mergeMatrixMultiarch.include.length > 0 ? 'true' : 'false';
  const hasMergesSinglearch = mergeMatrixSinglearch.include.length > 0 ? 'true' : 'false';
  fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=true\n`);
  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-singlearch=${singlearch.length > 0 ? 'true' : 'false'}\n`);
  fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-multiarch=${multiarch.length > 0 ? 'true' : 'false'}\n`);
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
* ci: extract free-disk-space composite action
Consolidate the apt-clean + dotnet/android/ghc/boost removal blocks from
backend_build.yml, image_build.yml, and test.yml into a single composite
action. The three callers had slightly different inline blocks; the
composite uses the more aggressive backend_build/image_build variant for
all three callers — test.yml jobs now also purge snapd, edge/firefox/
powershell/r-base-core, and sweep /opt/ghc + /usr/local/share/boost +
$AGENT_TOOLSDIRECTORY. Idempotent and skipped on self-hosted runners.
In test.yml, actions/checkout now runs before the composite action call
because the composite lives at ./.github/actions/free-disk-space and
requires a checked-out repo. The original ordering relied on
jlumbroso/free-disk-space@main being a remote action; this is the
minimum-invasive change to support a local composite.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: path-filter backend.yml master push
Run scripts/changed-backends.js on master pushes too (not just PRs) so
unrelated commits don't rebuild all ~210 backend container images. Tag
pushes still build the full matrix via FORCE_ALL.
Push events use the GitHub Compare API to diff event.before..event.after.
Edge cases (first push with zero base, API truncation beyond 300 files,
missing fields, network failure) fall back to "run everything" — better
safe than silently miss a backend.
The matrix literal moves from .github/workflows/backend.yml into a new
data-only file at .github/backend-matrix.yml (outside workflows/ so
actionlint doesn't try to parse it as a workflow). Both backend.yml and
backend_pr.yml now consume the dynamic matrix output uniformly via
fromJson(needs.generate-matrix.outputs.matrix); the script reads the
matrix from the new location.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: bound max-parallel on backend-jobs matrices
Cap to 8 concurrent jobs to avoid queue starvation on the shared GHA free
pool while migration is in flight. Lift after Phases 4-5 retire the
self-hosted runners. Also drops a leftover commented-out max-parallel
line that lived in backend.yml since the previous matrix shape.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: scope backend cache per arch, push by digest
Prepare backend_build.yml for the multi-arch split. The reusable
workflow now accepts a `platform-tag` input ("amd64" / "arm64") that
scopes the registry cache to cache<suffix>-<platform-tag> and (on push
events) pushes the resulting image by canonical digest only. Digests
are uploaded as artifacts named digests<suffix>-<platform-tag> for the
merge job (Task 2.2) to consume.
`platform-tag` is optional with empty default during the migration —
existing callers continue to work unchanged (their cache key just
becomes `cache<suffix>-`, an orphaned but valid key). Tasks 2.3+ will
update callers to pass an explicit "amd64" / "arm64" value. Phase 6
flips the input to required: true once every caller is wired.
PR builds keep their existing tag-based push to ci-tests but pick up
the per-arch cache key. Multi-arch PR builds remain emulated in this
commit; they migrate when the matrix entries split (Tasks 2.3+).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: add backend_merge.yml reusable workflow
Joins per-arch digest artifacts (uploaded by backend_build.yml when
called with platform-tag) into a single tagged multi-arch manifest list
via `docker buildx imagetools create`. Called once per backend by
backend.yml after both per-arch build jobs succeed.
The workflow generates final tags identically to the previous monolithic
build job (same docker/metadata-action invocation), so consumers of
quay.io/go-skynet/local-ai-backends and localai/localai-backends see no
tag-shape change. Two imagetools calls (one per registry) reference the
same per-arch digests under different image names.
Not yet wired into backend.yml — Tasks 2.3+ rewrite individual matrix
entries to expand into per-arch + merge jobs that call this workflow.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: relocate Docker data-root to /mnt on hosted runners
GHA hosted ubuntu-latest runners ship a ~75 GB /mnt drive that's unused
by default. Stopping Docker, rsync'ing /var/lib/docker to /mnt, and
restarting with data-root pointing there yields ~100 GB of working
space (combined with the apt-clean from Task 1.1) — enough for ROCm
dev image + vLLM torch install + flash-attn intermediate layers.
This is the structural change that lets Phases 4 and 5 of the migration
plan move the bigger-runner and arc-runner-set jobs onto ubuntu-latest.
The composite action is no-op on self-hosted runners (where /mnt isn't
expected) and on non-X64 runners (Task 3.2 verifies the arm64 hosted
pool's /mnt shape separately before enabling). Wired into both
backend_build.yml and image_build.yml between free-disk-space and the
first Docker operation.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci(setup-build-disk): chmod 1777 /mnt/docker-tmp
buildx CLI runs as the unprivileged 'runner' user and creates config
dirs under TMPDIR before binding them into the buildkit container.
/mnt is root-owned by default, so the original mkdir produced a
permission-denied when buildx tried to write there:
ERROR: mkdir /mnt/docker-tmp/buildkitd-config2740457204: permission denied
Mirror /tmp's permission mode (1777 — world-writable with sticky bit)
on /mnt/docker-tmp so non-root processes can stage their config.
Caught by the first PR run (image-build hipblas job) on PR #9726.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
* ci: weekly full-matrix rebuild via cron
Path-filtering backend.yml master push (the previous commit's main
optimization) skips backends whose source didn't change. That broke
the DEPS_REFRESH cache-buster's coverage: the build-arg keyed on
%Y-W%V busts the install layer's cache on a new ISO week, but only
when the build actually runs. Untouched Python backends (torch,
transformers, vllm with no version pin) would otherwise ship stale
wheels indefinitely.
Add a Sunday 06:00 UTC cron that fires the full matrix. Schedule
events have no event.ref / event.before, so the script's changedFiles
== null fallback (scripts/changed-backends.js) emits the full matrix
automatically — no script change needed.
C++/Go backends with pinned deps hit the cache and complete quickly, so
the weekly cost is dominated by Python re-resolves, which is exactly what
we want.
workflow_dispatch added so a maintainer can trigger an ad-hoc
full-matrix rebuild without faking a tag push.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
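The `changedFiles == null` fallback can be sketched as two small functions. The sketch assumes a push event shaped like GitHub's payload, with `before`/`after` SHAs and an all-zero `before` for a newly created ref. It also assumes a compare response with a `files[].filename` list capped at 300 entries. The names `changedFilesOrNull` and `selectMatrix` are hypothetical. The filter mirrors the `startsWith` check used in `emitFilteredMatrix`.

```javascript
// All-zero SHA that GitHub sends as `before` when a push creates a new ref.
const ZERO_SHA = "0".repeat(40);
// Assumed cap on files returned by one compare response.
const COMPARE_FILE_LIMIT = 300;

// Hypothetical helper: an array of changed paths, or null meaning
// "cannot tell, build everything".
function changedFilesOrNull(event, compare) {
  if (!event || !event.before || !event.after) return null;     // e.g. schedule events
  if (event.before === ZERO_SHA) return null;                    // first push of a ref
  if (!compare || !Array.isArray(compare.files)) return null;    // API or network failure
  if (compare.files.length >= COMPARE_FILE_LIMIT) return null;   // possibly truncated
  return compare.files.map(f => f.filename);
}

// Full matrix on null, otherwise keep entries whose backend path was touched.
function selectMatrix(allEntries, changedFiles, inferPath) {
  if (changedFiles === null) return allEntries;
  return allEntries.filter(item => {
    const backendPath = inferPath(item);
    return Boolean(backendPath) && changedFiles.some(file => file.startsWith(backendPath));
  });
}
```

Treating every uncertain case as `null` keeps the failure mode at "build too much" rather than "silently skip a backend".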
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
Co-authored-by: Ettore Di Giacinto <[email protected]>
2026-05-08 21:43:41 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=true\n`);
ci: tag every backend digest, including singletons
backend_build.yml pushes by canonical digest only (push-by-digest=true,
no tags applied at build time). User-facing tagging happens in
backend_merge.yml's `imagetools create` step. Before this commit,
scripts/changed-backends.js emitted a merge entry only for tag-suffixes
with 2+ legs, so every single-arch backend (CUDA/ROCm/Intel Python
images, vLLM, sglang, transformers, diffusers, ...) pushed its digest
untagged and stayed that way until quay's GC reaped it. Symptom: tag
releases shipped multi-arch backends tagged correctly, but no
v<X>-gpu-nvidia-cuda-12-vllm (or any singleton variant) ever appeared
in the registry.
Changes:
- scripts/changed-backends.js drops the `group.length < 2` skip and
emits two merge matrices, one per arch class, so each downstream
merge job can `needs:` only its corresponding build matrix.
- backend.yml splits backend-merge-jobs into multiarch and singlearch
variants. The split preserves PR #9746's fix: slow singlearch CUDA
builds (~6h) must not gate multiarch merges, or quay's GC reaps the
multiarch per-arch digests before they're tagged.
- backend_pr.yml mirrors the split.
- backend_build.yml renames the digest artifact from
`digests<suffix>-<platform-tag>` to
`digests<suffix>--<platform-tag-or-"single">`. The `--` separator
prevents the merge-side glob from over-matching sibling backends
whose tag-suffix is a prefix of ours (e.g. -cpu-vllm vs
-cpu-vllm-omni, -cpu-mlx vs -cpu-mlx-audio); the `single` placeholder
keeps the name well-formed when platform-tag is empty.
- backend_merge.yml updates the download pattern to match.
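The sketch below shows why the `--` separator helps. It reconstructs the artifact names from the commit text and matches them with a deliberately minimal glob that supports only `*`. `digestArtifactName`, `globMatch`, and the merge-side pattern `digests<suffix>--*` are assumptions. The real download step's glob syntax is richer.

```javascript
// Hypothetical reconstruction of the new artifact naming scheme.
function digestArtifactName(tagSuffix, platformTag) {
  return `digests${tagSuffix}--${platformTag || "single"}`;
}

// Minimal glob: "*" matches any run of characters; everything else is literal.
function globMatch(pattern, name) {
  const parts = pattern.split("*").map(s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${parts.join(".*")}$`).test(name);
}

// Old scheme: the -cpu-vllm pattern also grabs the -cpu-vllm-omni artifact.
console.log(globMatch("digests-cpu-vllm-*", "digests-cpu-vllm-omni-amd64"));                 // true
// New scheme: the "--" separator stops the over-match...
console.log(globMatch("digests-cpu-vllm--*", digestArtifactName("-cpu-vllm-omni", "amd64"))); // false
// ...while still matching the backend's own legs.
console.log(globMatch("digests-cpu-vllm--*", digestArtifactName("-cpu-vllm", "arm64")));      // true
console.log(digestArtifactName("-gpu-nvidia-cuda-12-vllm", ""));  // digests-gpu-nvidia-cuda-12-vllm--single
```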
Verified locally: a tag-push event now expands to 36 multiarch merge
entries (= 72 builds / 2 legs) and 199 singlearch merge entries (one
per singleton, including -gpu-nvidia-cuda-12-vllm at index 24).
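The grouping described above can be sketched as follows. The sketch assumes each build entry carries a `tag-suffix` and a `platform-tag` field, and that a tag-suffix with two or more legs counts as multi-arch. The `sketch*` functions and the `-cpu-example` backend are hypothetical. They are not the script's own `splitByArch` / `computeMergeMatrix`.

```javascript
// Group build-matrix entries by tag-suffix (one group per published tag).
function groupBySuffix(builds) {
  const groups = new Map();
  for (const b of builds) {
    const key = b["tag-suffix"];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(b);
  }
  return groups;
}

// Hypothetical split: 2+ legs under one tag-suffix means multi-arch.
function sketchSplitByArch(builds) {
  const multiarch = [];
  const singlearch = [];
  for (const legs of groupBySuffix(builds).values()) {
    (legs.length >= 2 ? multiarch : singlearch).push(...legs);
  }
  return { multiarch, singlearch };
}

// One merge entry per tag-suffix. There is deliberately no `legs.length < 2`
// skip, so single-arch digests get a tagging entry too.
function sketchMergeMatrix(builds) {
  const include = [];
  for (const [suffix, legs] of groupBySuffix(builds)) {
    include.push({ "tag-suffix": suffix, platforms: legs.map(l => l["platform-tag"] || "single") });
  }
  return { include };
}

const builds = [
  { "tag-suffix": "-cpu-example", "platform-tag": "amd64" },
  { "tag-suffix": "-cpu-example", "platform-tag": "arm64" },
  { "tag-suffix": "-gpu-nvidia-cuda-12-vllm", "platform-tag": "" },
];
const split = sketchSplitByArch(builds);
console.log(sketchMergeMatrix(split.multiarch).include.length);  // 1 (two legs, one tag)
console.log(sketchMergeMatrix(split.singlearch).include.length); // 1 (a singleton, no longer skipped)
```

With 72 two-leg builds, grouping by tag-suffix gives 72 / 2 = 36 multiarch merge entries. This is the same arithmetic as the count reported above.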
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <[email protected]>
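The script writes each output as a single `name=value` line appended to `$GITHUB_OUTPUT`. This works for the matrices because `JSON.stringify` without an indent argument escapes newlines inside strings, so the serialized value never spans lines. The sketch below applies the same formatting. `formatOutputs` is hypothetical, and it returns the text instead of appending it, so it runs without a GitHub runner.

```javascript
// Hypothetical helper: build the text that would be appended to $GITHUB_OUTPUT.
function formatOutputs(outputs) {
  return Object.entries(outputs)
    .map(([name, value]) => {
      // Objects/arrays become compact single-line JSON; strings are kept as-is.
      const text = typeof value === "string" ? value : JSON.stringify(value);
      if (text.includes("\n")) throw new Error(`output ${name} would span multiple lines`);
      return `${name}=${text}\n`;
    })
    .join("");
}

const outputText = formatOutputs({
  "has-merges-singlearch": "true",
  "merge-matrix-singlearch": { include: [{ "tag-suffix": "-gpu-nvidia-cuda-12-vllm", note: "a\nb" }] },
});
console.log(outputText.split("\n").length - 1); // 2 (one line per output)
```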
2026-05-11 13:22:00 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-merges-multiarch=${hasMergesMultiarch}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-merges-singlearch=${hasMergesSinglearch}\n`);
2026-05-10 16:15:53 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-singlearch=${JSON.stringify({ include: singlearch })}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-multiarch=${JSON.stringify({ include: multiarch })}\n`);
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
2026-05-08 21:43:41 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: includesDarwin })}\n`);
ci: tag every backend digest, including singletons
2026-05-11 13:22:00 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `merge-matrix-multiarch=${JSON.stringify(mergeMatrixMultiarch)}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `merge-matrix-singlearch=${JSON.stringify(mergeMatrixSinglearch)}\n`);
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
2026-05-08 21:43:41 +00:00
for (const backend of allBackendPaths.keys()) {
fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=true\n`);
}
}

function emitFilteredMatrix(changedFiles) {
2025-08-18 15:56:34 +00:00
console.log("Changed files:", changedFiles);

const filtered = includes.filter(item => {
const backendPath = inferBackendPath(item);
if (!backendPath) return false;
return changedFiles.some(file => file.startsWith(backendPath));
});
2025-09-01 20:18:30 +00:00
const filteredDarwin = includesDarwin.filter(item => {
const backendPath = inferBackendPathDarwin(item);
return changedFiles.some(file => file.startsWith(backendPath));
ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) (#9726)
2026-05-08 21:43:41 +00:00
});
2025-09-01 20:18:30 +00:00
2025-08-18 15:56:34 +00:00
console.log("Filtered files:", filtered);
2025-09-01 20:18:30 +00:00
console.log("Filtered files Darwin:", filteredDarwin);
2025-08-18 15:56:34 +00:00
2026-05-10 16:15:53 +00:00
const { multiarch, singlearch } = splitByArch(filtered);
const hasBackendsSinglearch = singlearch.length > 0 ? 'true' : 'false';
const hasBackendsMultiarch = multiarch.length > 0 ? 'true' : 'false';
2025-09-01 20:18:30 +00:00
const hasBackendsDarwin = filteredDarwin.length > 0 ? 'true' : 'false';
2026-05-10 16:15:53 +00:00
console.log("Has single-arch backends?:", hasBackendsSinglearch);
console.log("Has multi-arch backends?:", hasBackendsMultiarch);
2025-09-01 20:18:30 +00:00
console.log("Has Darwin backends?:", hasBackendsDarwin);
2025-08-18 15:56:34 +00:00
ci: tag every backend digest, including singletons
2026-05-11 13:22:00 +00:00
const mergeMatrixMultiarch = computeMergeMatrix(multiarch);
const mergeMatrixSinglearch = computeMergeMatrix(singlearch);
const hasMergesMultiarch = mergeMatrixMultiarch.include.length > 0 ? 'true' : 'false';
const hasMergesSinglearch = mergeMatrixSinglearch.include.length > 0 ? 'true' : 'false';
2026-05-08 22:04:42 +00:00
2026-03-30 17:46:07 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `run-all=false\n`);
2026-05-10 16:15:53 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-singlearch=${hasBackendsSinglearch}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-multiarch=${hasBackendsMultiarch}\n`);
2025-09-01 20:18:30 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-backends-darwin=${hasBackendsDarwin}\n`);
ci: tag every backend digest, including singletons
2026-05-11 13:22:00 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-merges-multiarch=${hasMergesMultiarch}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `has-merges-singlearch=${hasMergesSinglearch}\n`);
2026-05-10 16:15:53 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-singlearch=${JSON.stringify({ include: singlearch })}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-multiarch=${JSON.stringify({ include: multiarch })}\n`);
2025-09-01 20:18:30 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix-darwin=${JSON.stringify({ include: filteredDarwin })}\n`);
2026-05-11 13:22:00 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `merge-matrix-multiarch=${JSON.stringify(mergeMatrixMultiarch)}\n`);
fs.appendFileSync(process.env.GITHUB_OUTPUT, `merge-matrix-singlearch=${JSON.stringify(mergeMatrixSinglearch)}\n`);
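Each output above is written as a single `name=value` line appended to `$GITHUB_OUTPUT`. The hypothetical helper below does not exist in the script. It captures the two invariants those calls rely on: values are either the strings `'true'`/`'false'` or compact JSON, and no value may contain a raw newline, because the single-line form cannot hold one. `JSON.stringify` without an indent argument escapes newlines inside strings, which is why the matrix outputs fit on one line. The caller would still append the returned line with `fs.appendFileSync`, as the script does.

```javascript
// Hypothetical helper: format one `name=value` line for $GITHUB_OUTPUT.
// Booleans become 'true'/'false' strings; objects become compact JSON,
// which never contains a raw newline (string newlines are escaped as \n).
function formatOutputLine(name, value) {
  let text;
  if (typeof value === 'boolean') text = value ? 'true' : 'false';
  else if (typeof value === 'object' && value !== null) text = JSON.stringify(value);
  else text = String(value);
  if (text.includes('\n')) {
    // The single-line form cannot carry a newline; fail loudly instead.
    throw new Error(`output ${name} would span multiple lines`);
  }
  return `${name}=${text}\n`;
}

const lines = [
  formatOutputLine('has-merges-singlearch', true),
  formatOutputLine('matrix-darwin', { include: [] }),
];
process.stdout.write(lines.join(''));
// has-merges-singlearch=true
// matrix-darwin={"include":[]}
```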
2026-03-30 17:46:07 +00:00
// Per-backend boolean outputs
for (const [backend, pathPrefix] of allBackendPaths) {
feat(backend): add turboquant llama.cpp-fork backend (#9355)
* feat(backend): add turboquant llama.cpp-fork backend
turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch
feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme.
It ships as a first-class backend reusing backend/cpp/llama-cpp sources
via a thin wrapper Makefile: each variant target copies ../llama-cpp
into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server
with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No
duplication of grpc-server.cpp — upstream fixes flow through automatically.
Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL
f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml,
adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0
to exercise the KV-cache config path (backend_test.go gains dedicated env
vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement
usable by any llama.cpp-family backend), and registers a nightly auto-bump
PR in bump_deps.yaml tracking feature/turboquant-kv-cache.
scripts/changed-backends.js gets a special-case so edits to
backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since
the wrapper reuses those sources.
* feat(turboquant): carry upstream patches against fork API drift
turboquant branched from llama.cpp before upstream commit 66060008
("server: respect the ignore eos flag", #21203) which added the
`logit_bias_eog` field to `server_context_meta` and a matching
parameter to `server_task::params_from_json_cmpl`. The shared
backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so
building it against the fork unmodified fails.
Cherry-pick that commit as a patch file under
backend/cpp/turboquant/patches/ and apply it to the cloned fork
sources via a new apply-patches.sh hook called from the wrapper
Makefile. Simplifies the build flow too: instead of hopping through
llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now
drives the copied Makefile directly (clone -> patch -> build).
Drop the corresponding patch whenever the fork catches up with
upstream — the build fails fast if a patch stops applying, which
is the signal to retire it.
* docs: add turboquant backend section + clarify cache_type_k/v
Document the new turboquant (llama.cpp fork with TurboQuant KV-cache)
backend alongside the existing llama-cpp / ik-llama-cpp sections in
features/text-generation.md: when to pick it, how to install it from
the gallery, and a YAML example showing backend: turboquant together
with cache_type_k / cache_type_v.
Also expand the cache_type_k / cache_type_v table rows in
advanced/model-configuration.md to spell out the accepted llama.cpp
quantization values and note that these fields apply to all
llama.cpp-family backends, not just vLLM.
* feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion
The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but
ggml/include/ggml-rpc.h static-asserts it equals 96, breaking
the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server).
Carry a one-line patch that updates the expected count so the
assertion holds. Drop this patch whenever the fork fixes it upstream.
* feat(turboquant): allow turbo* KV-cache types and exercise them in e2e
The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own
allow-list of accepted KV-cache types (kv_cache_types[]) and rejects
anything outside it before the value reaches llama.cpp's parser. That
list only contains the standard llama.cpp types — turbo2/turbo3/turbo4
would throw "Unsupported cache type" at LoadModel time, meaning
nothing the LocalAI gRPC layer accepted was actually fork-specific.
Add a build-time augmentation step (patch-grpc-server.sh, called from
the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0
into the allow-list of the *copied* grpc-server.cpp under
turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/
is never touched, so the stock llama-cpp build keeps compiling against
vanilla upstream which has no notion of those enum values.
Switch test-extra-backend-turboquant to set
BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite
actually runs the fork's TurboQuant KV-cache code paths (turbo3 also
auto-enables flash_attention in the fork). Picking q8_0 here would
only re-test the standard llama.cpp path that the upstream llama-cpp
backend already covers.
Refresh the docs (text-generation.md + model-configuration.md) to
list turbo2/turbo3/turbo4 explicitly and call out that you only get
the TurboQuant code path with this backend + a turbo* cache type.
* fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3
The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant)
does not install python3, so the python-based augmentation step
errored with `python3: command not found` at make time. Switch to
awk, which is already present in the base image and everywhere
the rest of the wrapper Makefile runs.
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <[email protected]>
---------
Signed-off-by: Ettore Di Giacinto <[email protected]>
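The allow-list behaviour described in this commit can be modelled in a few lines. This is a JavaScript illustration of the C++ check, not the grpc-server.cpp code. The standard type list is an assumed subset of llama.cpp's KV-cache types, and the error text beyond "Unsupported cache type" is invented for the example.

```javascript
// Assumed subset of standard llama.cpp KV-cache types (illustrative only;
// the real allow-list is kv_cache_types[] in grpc-server.cpp).
const STANDARD_KV_CACHE_TYPES = ['f32', 'f16', 'bf16', 'q8_0', 'q4_0', 'q4_1', 'iq4_nl', 'q5_0', 'q5_1'];

// The turboquant build augments only its *copied* allow-list.
const TURBOQUANT_KV_CACHE_TYPES = [...STANDARD_KV_CACHE_TYPES, 'turbo2', 'turbo3', 'turbo4'];

// Reject values before they reach the parser, as the gRPC layer does.
function validateCacheType(value, allowList) {
  if (!allowList.includes(value)) {
    throw new Error(`Unsupported cache type: ${value}`);
  }
  return value;
}

console.log(validateCacheType('turbo3', TURBOQUANT_KV_CACHE_TYPES)); // turbo3
try {
  validateCacheType('turbo3', STANDARD_KV_CACHE_TYPES);
} catch (err) {
  console.log(err.message); // Unsupported cache type: turbo3
}
```

Because the stock list never gains the turbo* entries, the vanilla llama-cpp build keeps rejecting them, which is the separation the commit relies on.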
2026-04-14 23:25:04 +00:00
let changed = changedFiles.some(file => file.startsWith(pathPrefix));
// turboquant reuses backend/cpp/llama-cpp sources via a thin wrapper;
// changes to either directory should retrigger its pipeline.
if (backend === "turboquant" && !changed) {
changed = changedFiles.some(file => file.startsWith("backend/cpp/llama-cpp/"));
}
2026-03-30 17:46:07 +00:00
fs.appendFileSync(process.env.GITHUB_OUTPUT, `${backend}=${changed ? 'true' : 'false'}\n`);
}
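The turboquant special case in the loop above could also be expressed as data, so that future wrapper backends only need a table entry. The sketch below is a hypothetical generalization, not the script's code. `EXTRA_PREFIXES` and `backendChanged` are invented names, and the `backend/cpp/turboquant/` and `backend/go/whisper/` prefixes are assumed for illustration.

```javascript
// Hypothetical table: extra source prefixes that also retrigger a backend.
const EXTRA_PREFIXES = {
  turboquant: ['backend/cpp/llama-cpp/'],
};

// True if any changed file falls under the backend's own prefix or one of
// its extra prefixes.
function backendChanged(backend, pathPrefix, changedFiles) {
  const prefixes = [pathPrefix, ...(EXTRA_PREFIXES[backend] || [])];
  return changedFiles.some(file => prefixes.some(p => file.startsWith(p)));
}

const changedFiles = ['backend/cpp/llama-cpp/grpc-server.cpp'];
console.log(backendChanged('turboquant', 'backend/cpp/turboquant/', changedFiles)); // true
console.log(backendChanged('llama-cpp', 'backend/cpp/llama-cpp/', changedFiles)); // true
console.log(backendChanged('whisper', 'backend/go/whisper/', changedFiles)); // false
```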
2026-05-08 21:43:41 +00:00
}

(async () => {
// Tag pushes and an explicit FORCE_ALL escape hatch always rebuild everything.
// FORCE_ALL is set from backend.yml whenever github.ref starts with refs/tags/.
const forceAll = process.env.FORCE_ALL === 'true';
const isTagPush = typeof event.ref === 'string' && event.ref.startsWith('refs/tags/');
const isBranchPush = !!event.ref && !event.pull_request && !isTagPush;

let changedFiles = null;
if (event.pull_request) {
changedFiles = await getChangedFilesForPR(event);
} else if (isBranchPush && !forceAll) {
changedFiles = await getChangedFilesForPush(event);
// null -> fall through to the full matrix (e.g. first push, API truncated,
// network failure).
}
// All other event types (workflow_dispatch, schedule, tag pushes, FORCE_ALL)
// leave changedFiles === null and run everything.

if (changedFiles === null) {
emitFullMatrix();
return;
}
emitFilteredMatrix(changedFiles);
2025-08-18 15:56:34 +00:00
})();
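The routing in the IIFE, together with the fallbacks listed in its comments (first push, API truncation, network failure), can be separated from the network calls so it is testable offline. The following is a hypothetical refactor, not the script's code. `routeEvent`, `filesFromCompare`, and `ZERO_SHA` are illustrative names. Two assumptions apply: an all-zeros `before` SHA marks a first push, and the 300-file threshold follows the migration commit notes rather than anything visible in this excerpt. A network error would be caught by the caller and mapped to `null` as well.

```javascript
// Assumed marker for "no previous commit" on a first push.
const ZERO_SHA = '0000000000000000000000000000000000000000';

// Mirrors the IIFE's branching: 'pr', 'push' (diff via Compare API), or
// 'full' (tag push, FORCE_ALL, schedule, or anything else).
function routeEvent(event, forceAll) {
  const isTagPush = typeof event.ref === 'string' && event.ref.startsWith('refs/tags/');
  const isBranchPush = !!event.ref && !event.pull_request && !isTagPush;
  if (event.pull_request) return 'pr';
  if (isBranchPush && !forceAll) return 'push';
  return 'full';
}

// Turns a Compare API response body into a file list, or null meaning
// "run everything". Each entry in `files` carries a `filename` field.
function filesFromCompare(event, body) {
  if (!event.before || !event.after || event.before === ZERO_SHA) return null; // first push / missing fields
  if (!body || !Array.isArray(body.files)) return null; // malformed response
  if (body.files.length >= 300) return null; // possibly truncated
  return body.files.map(f => f.filename);
}

console.log(routeEvent({ ref: 'refs/tags/v1.0.0' }, true)); // full
console.log(filesFromCompare({ before: 'a1', after: 'b2' }, { files: [{ filename: 'backend/x.py' }] })); // [ 'backend/x.py' ]
```

Returning `null` on every doubtful path keeps the safe default: a spurious full rebuild costs runner time, while a missed backend ships stale images.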