---
## metas
- &llamacpp
name : "llama-cpp"
alias : "llama-cpp"
license : mit
icon : https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png
description : |
LLM inference in C/C++
urls :
- https://github.com/ggerganov/llama.cpp
tags :
- text-to-text
- LLM
- CPU
- GPU
- Metal
- CUDA
- HIP
capabilities :
default : "cpu-llama-cpp"
nvidia : "cuda12-llama-cpp"
intel : "intel-sycl-f16-llama-cpp"
amd : "rocm-llama-cpp"
metal : "metal-llama-cpp"
vulkan : "vulkan-llama-cpp"
nvidia-l4t : "nvidia-l4t-arm64-llama-cpp"
nvidia-cuda-13 : "cuda13-llama-cpp"
nvidia-cuda-12 : "cuda12-llama-cpp"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-llama-cpp"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-llama-cpp"
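# Example model configuration (a separate model YAML, not part of this
# index) selecting this backend explicitly; the model name and GGUF file
# below are hypothetical:
#
#   name: my-llama-model
#   backend: llama-cpp
#   parameters:
#     model: my-model.Q4_K_M.gguf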
- &ikllamacpp
name : "ik-llama-cpp"
alias : "ik-llama-cpp"
license : mit
description : |
Fork of llama.cpp by ikawrakow, optimized for CPU performance
urls :
- https://github.com/ikawrakow/ik_llama.cpp
tags :
- text-to-text
- LLM
- CPU
capabilities :
default : "cpu-ik-llama-cpp"
- &turboquant
name : "turboquant"
alias : "turboquant"
license : mit
description : |
Fork of llama.cpp adding the TurboQuant KV-cache quantization scheme.
Reuses the LocalAI llama.cpp gRPC server sources, built against the fork's libllama.
urls :
- https://github.com/TheTom/llama-cpp-turboquant
tags :
- text-to-text
- LLM
- CPU
- GPU
- CUDA
- HIP
- turboquant
- kv-cache
capabilities :
default : "cpu-turboquant"
nvidia : "cuda12-turboquant"
intel : "intel-sycl-f16-turboquant"
amd : "rocm-turboquant"
vulkan : "vulkan-turboquant"
nvidia-l4t : "nvidia-l4t-arm64-turboquant"
nvidia-cuda-13 : "cuda13-turboquant"
nvidia-cuda-12 : "cuda12-turboquant"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-turboquant"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-turboquant"
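# Example model configuration using this backend with a TurboQuant KV
# cache. The fork-specific cache types are turbo2/turbo3/turbo4 (only a
# turbo* type exercises the TurboQuant code path); the model name and
# GGUF file below are hypothetical:
#
#   name: my-turboquant-model
#   backend: turboquant
#   cache_type_k: turbo3
#   cache_type_v: turbo3
#   parameters:
#     model: my-model.Q4_K_M.gguf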
- &whispercpp
name : "whisper"
alias : "whisper"
license : mit
icon : https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg
description : |
Port of OpenAI's Whisper model in C/C++
urls :
- https://github.com/ggml-org/whisper.cpp
tags :
- audio-transcription
- CPU
- GPU
- CUDA
- HIP
capabilities :
default : "cpu-whisper"
nvidia : "cuda12-whisper"
intel : "intel-sycl-f16-whisper"
metal : "metal-whisper"
amd : "rocm-whisper"
vulkan : "vulkan-whisper"
nvidia-l4t : "nvidia-l4t-arm64-whisper"
nvidia-cuda-13 : "cuda13-whisper"
nvidia-cuda-12 : "cuda12-whisper"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-whisper"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-whisper"
- &voxtral
name : "voxtral"
alias : "voxtral"
license : mit
description : |
Voxtral Realtime 4B Pure C speech-to-text inference engine
urls :
- https://github.com/mudler/voxtral.c
tags :
- audio-transcription
- CPU
- Metal
capabilities :
default : "cpu-voxtral"
metal-darwin-arm64 : "metal-voxtral"
- &stablediffusionggml
name : "stablediffusion-ggml"
alias : "stablediffusion-ggml"
license : mit
icon : https://github.com/leejet/stable-diffusion.cpp/raw/master/assets/cat_with_sd_cpp_42.png
description : |
Stable Diffusion and Flux in pure C/C++
urls :
- https://github.com/leejet/stable-diffusion.cpp
tags :
- image-generation
- CPU
- GPU
- Metal
- CUDA
- HIP
capabilities :
default : "cpu-stablediffusion-ggml"
nvidia : "cuda12-stablediffusion-ggml"
intel : "intel-sycl-f16-stablediffusion-ggml"
# amd: "rocm-stablediffusion-ggml"
vulkan : "vulkan-stablediffusion-ggml"
nvidia-l4t : "nvidia-l4t-arm64-stablediffusion-ggml"
metal : "metal-stablediffusion-ggml"
nvidia-cuda-13 : "cuda13-stablediffusion-ggml"
nvidia-cuda-12 : "cuda12-stablediffusion-ggml"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-stablediffusion-ggml"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-stablediffusion-ggml"
- &rfdetr
name : "rfdetr"
alias : "rfdetr"
license : apache-2.0
icon : https://avatars.githubusercontent.com/u/53104118?s=200&v=4
description : |
RF-DETR is a real-time, transformer-based object detection model architecture developed by Roboflow and released under the Apache 2.0 license.
RF-DETR is the first real-time model to exceed 60 AP on the Microsoft COCO benchmark, alongside competitive performance at base sizes. It also achieves state-of-the-art performance on RF100-VL, an object detection benchmark that measures model domain adaptability to real-world problems. RF-DETR is the fastest and most accurate model for its size when compared to current real-time object detection models.
RF-DETR is small enough to run on the edge using Inference, making it an ideal model for deployments that need both strong accuracy and real-time performance.
urls :
- https://github.com/roboflow/rf-detr
tags :
- object-detection
- rfdetr
- gpu
- cpu
capabilities :
nvidia : "cuda12-rfdetr"
intel : "intel-rfdetr"
#amd: "rocm-rfdetr"
nvidia-l4t : "nvidia-l4t-arm64-rfdetr"
metal : "metal-rfdetr"
default : "cpu-rfdetr"
nvidia-cuda-13 : "cuda13-rfdetr"
nvidia-cuda-12 : "cuda12-rfdetr"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-rfdetr"
- &sam3cpp
name : "sam3-cpp"
alias : "sam3-cpp"
license : mit
description : |
Segment Anything Model (SAM 3/2/EdgeTAM) in C/C++ using GGML.
Supports text-prompted and point/box-prompted image segmentation.
urls :
- https://github.com/PABannier/sam3.cpp
tags :
- image-segmentation
- object-detection
- sam3
- gpu
- cpu
capabilities :
default : "cpu-sam3-cpp"
nvidia : "cuda12-sam3-cpp"
nvidia-cuda-12 : "cuda12-sam3-cpp"
nvidia-cuda-13 : "cuda13-sam3-cpp"
nvidia-l4t : "nvidia-l4t-arm64-sam3-cpp"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-sam3-cpp"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-sam3-cpp"
intel : "intel-sycl-f32-sam3-cpp"
vulkan : "vulkan-sam3-cpp"
- &vllm
name : "vllm"
license : apache-2.0
urls :
- https://github.com/vllm-project/vllm
tags :
- text-to-text
- multimodal
- GPTQ
- AWQ
- AutoRound
- INT4
- INT8
- FP8
icon : https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png
description : |
vLLM is a fast and easy-to-use library for LLM inference and serving.
Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
vLLM is fast with:
State-of-the-art serving throughput
Efficient management of attention key and value memory with PagedAttention
Continuous batching of incoming requests
Fast model execution with CUDA/HIP graph
Quantization: GPTQ, AWQ, AutoRound, INT4, INT8, and FP8
Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
Speculative decoding
Chunked prefill
alias : "vllm"
capabilities :
nvidia : "cuda12-vllm"
amd : "rocm-vllm"
intel : "intel-vllm"
nvidia-cuda-12 : "cuda12-vllm"
cpu : "cpu-vllm"
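# Example model configuration for this backend: vLLM resolves HuggingFace
# model ids natively, and tool_parser/reasoning_parser can be passed as
# backend options at load time (model id and parser values below are
# illustrative):
#
#   name: my-vllm-model
#   backend: vllm
#   parameters:
#     model: Qwen/Qwen2.5-0.5B-Instruct
#   options:
#     - tool_parser:hermes
#     - reasoning_parser:qwen3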
- &vllm-omni
name : "vllm-omni"
license : apache-2.0
urls :
- https://github.com/vllm-project/vllm-omni
tags :
- text-to-image
- image-generation
- text-to-video
- video-generation
- text-to-speech
- TTS
- multimodal
- LLM
icon : https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png
description : |
vLLM-Omni is a unified interface for multimodal generation with vLLM.
It supports image generation (text-to-image, image editing), video generation
(text-to-video, image-to-video), text generation with multimodal inputs, and
text-to-speech generation. It supports only NVIDIA (CUDA) and AMD (ROCm) platforms.
alias : "vllm-omni"
capabilities :
nvidia : "cuda12-vllm-omni"
amd : "rocm-vllm-omni"
nvidia-cuda-12 : "cuda12-vllm-omni"
- &mlx
name : "mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx"
icon : https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls :
- https://github.com/ml-explore/mlx-lm
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-mlx
license : MIT
description : |
Run LLMs with MLX
tags :
- text-to-text
- LLM
- MLX
capabilities :
default : "cpu-mlx"
nvidia : "cuda12-mlx"
metal : "metal-mlx"
nvidia-cuda-12 : "cuda12-mlx"
nvidia-cuda-13 : "cuda13-mlx"
nvidia-l4t : "nvidia-l4t-mlx"
nvidia-l4t-cuda-12 : "nvidia-l4t-mlx"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-mlx"
- &mlx-vlm
name : "mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-vlm"
icon : https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls :
- https://github.com/Blaizzy/mlx-vlm
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-mlx-vlm
license : MIT
description : |
Run Vision-Language Models with MLX
tags :
- text-to-text
- multimodal
- vision-language
- LLM
- MLX
capabilities :
default : "cpu-mlx-vlm"
nvidia : "cuda12-mlx-vlm"
metal : "metal-mlx-vlm"
nvidia-cuda-12 : "cuda12-mlx-vlm"
nvidia-cuda-13 : "cuda13-mlx-vlm"
nvidia-l4t : "nvidia-l4t-mlx-vlm"
nvidia-l4t-cuda-12 : "nvidia-l4t-mlx-vlm"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-mlx-vlm"
- &mlx-audio
name : "mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-audio"
icon : https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls :
- https://github.com/Blaizzy/mlx-audio
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-mlx-audio
license : MIT
description : |
Run Audio Models with MLX
tags :
- audio-to-text
- audio-generation
- text-to-audio
- LLM
- MLX
capabilities :
default : "cpu-mlx-audio"
nvidia : "cuda12-mlx-audio"
metal : "metal-mlx-audio"
nvidia-cuda-12 : "cuda12-mlx-audio"
nvidia-cuda-13 : "cuda13-mlx-audio"
nvidia-l4t : "nvidia-l4t-mlx-audio"
nvidia-l4t-cuda-12 : "nvidia-l4t-mlx-audio"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-mlx-audio"
- &mlx-distributed
name : "mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-distributed"
icon : https://avatars.githubusercontent.com/u/102832242?s=200&v=4
urls :
- https://github.com/ml-explore/mlx-lm
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-mlx-distributed
license : MIT
description : |
Run distributed LLM inference with MLX across multiple Apple Silicon Macs
tags :
- text-to-text
- LLM
- MLX
- distributed
capabilities :
default : "cpu-mlx-distributed"
nvidia : "cuda12-mlx-distributed"
metal : "metal-mlx-distributed"
nvidia-cuda-12 : "cuda12-mlx-distributed"
nvidia-cuda-13 : "cuda13-mlx-distributed"
nvidia-l4t : "nvidia-l4t-mlx-distributed"
nvidia-l4t-cuda-12 : "nvidia-l4t-mlx-distributed"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-mlx-distributed"
- &rerankers
name : "rerankers"
alias : "rerankers"
capabilities :
nvidia : "cuda12-rerankers"
intel : "intel-rerankers"
amd : "rocm-rerankers"
metal : "metal-rerankers"
- &transformers
name : "transformers"
icon : https://avatars.githubusercontent.com/u/25720743?s=200&v=4
alias : "transformers"
license : apache-2.0
description : |
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer vision, audio, video, and multimodal models, for both inference and training.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. transformers is the pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...), and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from transformers.
urls :
- https://github.com/huggingface/transformers
tags :
- text-to-text
- multimodal
capabilities :
nvidia : "cuda12-transformers"
intel : "intel-transformers"
amd : "rocm-transformers"
metal : "metal-transformers"
nvidia-cuda-13 : "cuda13-transformers"
nvidia-cuda-12 : "cuda12-transformers"
- &diffusers
name : "diffusers"
icon : https://raw.githubusercontent.com/huggingface/diffusers/main/docs/source/en/imgs/diffusers_library.jpg
description : |
🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both.
urls :
- https://github.com/huggingface/diffusers
tags :
- image-generation
- video-generation
- diffusion-models
license : apache-2.0
alias : "diffusers"
capabilities :
nvidia : "cuda12-diffusers"
intel : "intel-diffusers"
amd : "rocm-diffusers"
nvidia-l4t : "nvidia-l4t-diffusers"
metal : "metal-diffusers"
default : "cpu-diffusers"
nvidia-cuda-13 : "cuda13-diffusers"
nvidia-cuda-12 : "cuda12-diffusers"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-diffusers"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-diffusers"
2026-02-05 11:04:53 +00:00
- &ace-step
name : "ace-step"
description : |
ACE-Step 1.5 is an open-source music generation model. It supports simple mode (natural language description) and advanced mode (caption, lyrics, think, bpm, keyscale, etc.). Uses in-process acestep (LLMHandler for metadata, DiT for audio).
urls :
- https://github.com/ace-step/ACE-Step-1.5
tags :
- music-generation
- sound-generation
alias : "ace-step"
capabilities :
nvidia : "cuda12-ace-step"
intel : "intel-ace-step"
amd : "rocm-ace-step"
metal : "metal-ace-step"
default : "cpu-ace-step"
nvidia-cuda-13 : "cuda13-ace-step"
nvidia-cuda-12 : "cuda12-ace-step"
- !!merge << : *ace-step
name : "ace-step-development"
capabilities :
nvidia : "cuda12-ace-step-development"
intel : "intel-ace-step-development"
amd : "rocm-ace-step-development"
metal : "metal-ace-step-development"
default : "cpu-ace-step-development"
nvidia-cuda-13 : "cuda13-ace-step-development"
nvidia-cuda-12 : "cuda12-ace-step-development"
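# The "- !!merge << : *ace-step" entry above reuses every field from the
# &ace-step anchor and overrides only name and capabilities; development
# entries throughout this index follow the same anchor/merge scheme to
# point at the corresponding -development image variants.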
- &acestepcpp
name : "acestep-cpp"
description : |
ACE-Step 1.5 C++ backend using GGML. Native C++ implementation of ACE-Step music generation with GPU support through GGML backends.
Generates stereo 48kHz audio from text descriptions and optional lyrics via a two-stage pipeline: text-to-code (ace-qwen3 LLM) + code-to-audio (DiT-VAE).
urls :
- https://github.com/ace-step/acestep.cpp
tags :
- music-generation
- sound-generation
alias : "acestep-cpp"
capabilities :
default : "cpu-acestep-cpp"
nvidia : "cuda12-acestep-cpp"
nvidia-cuda-13 : "cuda13-acestep-cpp"
nvidia-cuda-12 : "cuda12-acestep-cpp"
intel : "intel-sycl-f16-acestep-cpp"
metal : "metal-acestep-cpp"
amd : "rocm-acestep-cpp"
vulkan : "vulkan-acestep-cpp"
nvidia-l4t : "nvidia-l4t-arm64-acestep-cpp"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-acestep-cpp"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-acestep-cpp"
- &qwen3ttscpp
name : "qwen3-tts-cpp"
description : |
Qwen3-TTS C++ backend using GGML. Native C++ text-to-speech with voice cloning support.
Generates 24kHz mono audio from text with optional reference audio for voice cloning via ECAPA-TDNN speaker embeddings.
urls :
- https://github.com/predict-woo/qwen3-tts.cpp
tags :
- text-to-speech
- tts
- voice-cloning
alias : "qwen3-tts-cpp"
capabilities :
default : "cpu-qwen3-tts-cpp"
nvidia : "cuda12-qwen3-tts-cpp"
nvidia-cuda-13 : "cuda13-qwen3-tts-cpp"
nvidia-cuda-12 : "cuda12-qwen3-tts-cpp"
intel : "intel-sycl-f16-qwen3-tts-cpp"
metal : "metal-qwen3-tts-cpp"
amd : "rocm-qwen3-tts-cpp"
vulkan : "vulkan-qwen3-tts-cpp"
nvidia-l4t : "nvidia-l4t-arm64-qwen3-tts-cpp"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-qwen3-tts-cpp"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-qwen3-tts-cpp"
- &faster-whisper
icon : https://avatars.githubusercontent.com/u/1520500?s=200&v=4
description : |
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
urls :
- https://github.com/SYSTRAN/faster-whisper
tags :
- speech-to-text
- Whisper
license : MIT
name : "faster-whisper"
capabilities :
default : "cpu-faster-whisper"
nvidia : "cuda12-faster-whisper"
intel : "intel-faster-whisper"
amd : "rocm-faster-whisper"
metal : "metal-faster-whisper"
nvidia-cuda-13 : "cuda13-faster-whisper"
nvidia-cuda-12 : "cuda12-faster-whisper"
nvidia-l4t : "nvidia-l4t-arm64-faster-whisper"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-faster-whisper"
- &moonshine
description : |
Moonshine is a fast, accurate, and efficient speech-to-text transcription model using ONNX Runtime.
It provides real-time transcription capabilities with support for multiple model sizes and GPU acceleration.
urls :
- https://github.com/moonshine-ai/moonshine
tags :
- speech-to-text
- transcription
- ONNX
license : MIT
name : "moonshine"
alias : "moonshine"
capabilities :
nvidia : "cuda12-moonshine"
metal : "metal-moonshine"
default : "cpu-moonshine"
nvidia-cuda-13 : "cuda13-moonshine"
nvidia-cuda-12 : "cuda12-moonshine"
- &whisperx
description : |
WhisperX provides fast automatic speech recognition with word-level timestamps, speaker diarization,
and forced alignment. Built on faster-whisper and pyannote-audio for high-accuracy transcription
with speaker identification.
urls :
- https://github.com/m-bain/whisperX
tags :
- speech-to-text
- diarization
- whisperx
license : BSD-4-Clause
name : "whisperx"
alias : "whisperx"
capabilities :
nvidia : "cuda12-whisperx"
amd : "rocm-whisperx"
metal : "metal-whisperx"
default : "cpu-whisperx"
nvidia-cuda-13 : "cuda13-whisperx"
nvidia-cuda-12 : "cuda12-whisperx"
nvidia-l4t : "nvidia-l4t-arm64-whisperx"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-whisperx"
- &kokoro
icon : https://avatars.githubusercontent.com/u/166769057?v=4
description : |
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
urls :
- https://huggingface.co/hexgrad/Kokoro-82M
- https://github.com/hexgrad/kokoro
tags :
- text-to-speech
- TTS
- LLM
license : apache-2.0
alias : "kokoro"
name : "kokoro"
capabilities :
nvidia : "cuda12-kokoro"
intel : "intel-kokoro"
amd : "rocm-kokoro"
nvidia-l4t : "nvidia-l4t-kokoro"
metal : "metal-kokoro"
nvidia-cuda-13 : "cuda13-kokoro"
nvidia-cuda-12 : "cuda12-kokoro"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-kokoro"
- &kokoros
icon : https://avatars.githubusercontent.com/u/166769057?v=4
description : |
Kokoros is a pure Rust TTS backend using the Kokoro ONNX model (82M parameters).
It provides fast, high-quality text-to-speech with streaming support, built on
ONNX Runtime for efficient CPU inference. Supports English, Japanese, Mandarin
Chinese, and German.
urls :
- https://huggingface.co/hexgrad/Kokoro-82M
- https://github.com/lucasjinreal/Kokoros
tags :
- text-to-speech
- TTS
- Rust
- ONNX
license : apache-2.0
alias : "kokoros"
name : "kokoros"
capabilities :
default : "cpu-kokoros"
- &coqui
urls :
- https://github.com/idiap/coqui-ai-TTS
description : |
🐸 Coqui TTS is a library for advanced Text-to-Speech generation.
🚀 Pretrained models in +1100 languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.
tags :
- text-to-speech
- TTS
license : mpl-2.0
name : "coqui"
alias : "coqui"
capabilities :
nvidia : "cuda12-coqui"
intel : "intel-coqui"
amd : "rocm-coqui"
metal : "metal-coqui"
nvidia-cuda-13 : "cuda13-coqui"
nvidia-cuda-12 : "cuda12-coqui"
icon : https://avatars.githubusercontent.com/u/1338804?s=200&v=4
- &outetts
urls :
- https://github.com/OuteAI/outetts
description : |
OuteTTS is an open-weight text-to-speech model from OuteAI (OuteAI/OuteTTS-0.3-1B).
Supports custom speaker voices via audio path or default speakers.
tags :
- text-to-speech
- TTS
license : apache-2.0
name : "outetts"
alias : "outetts"
capabilities :
default : "cpu-outetts"
nvidia-cuda-12 : "cuda12-outetts"
- &chatterbox
urls :
- https://github.com/resemble-ai/chatterbox
description : |
Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out.
tags :
- text-to-speech
- TTS
license : MIT
icon : https://avatars.githubusercontent.com/u/49844015?s=200&v=4
name : "chatterbox"
alias : "chatterbox"
capabilities :
nvidia : "cuda12-chatterbox"
metal : "metal-chatterbox"
default : "cpu-chatterbox"
nvidia-l4t : "nvidia-l4t-arm64-chatterbox"
nvidia-cuda-13 : "cuda13-chatterbox"
nvidia-cuda-12 : "cuda12-chatterbox"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-chatterbox"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-chatterbox"
- &vibevoice
urls :
- https://github.com/microsoft/VibeVoice
description : |
VibeVoice-Realtime is a real-time text-to-speech model that generates natural-sounding speech.
tags :
- text-to-speech
- TTS
license : mit
name : "vibevoice"
alias : "vibevoice"
capabilities :
nvidia : "cuda12-vibevoice"
intel : "intel-vibevoice"
amd : "rocm-vibevoice"
nvidia-l4t : "nvidia-l4t-vibevoice"
metal : "metal-vibevoice"
default : "cpu-vibevoice"
nvidia-cuda-13 : "cuda13-vibevoice"
nvidia-cuda-12 : "cuda12-vibevoice"
nvidia-l4t-cuda-12 : "nvidia-l4t-vibevoice"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-vibevoice"
icon : https://avatars.githubusercontent.com/u/6154722?s=200&v=4
- &qwen-tts
urls :
- https://github.com/QwenLM/Qwen3-TTS
description : |
Qwen3-TTS is a high-quality text-to-speech model supporting custom voice, voice design, and voice cloning.
tags :
- text-to-speech
- TTS
license : apache-2.0
name : "qwen-tts"
alias : "qwen-tts"
capabilities :
nvidia : "cuda12-qwen-tts"
intel : "intel-qwen-tts"
amd : "rocm-qwen-tts"
nvidia-l4t : "nvidia-l4t-qwen-tts"
metal : "metal-qwen-tts"
default : "cpu-qwen-tts"
nvidia-cuda-13 : "cuda13-qwen-tts"
nvidia-cuda-12 : "cuda12-qwen-tts"
nvidia-l4t-cuda-12 : "nvidia-l4t-qwen-tts"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-qwen-tts"
icon : https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
- &fish-speech
urls :
- https://github.com/fishaudio/fish-speech
description : |
Fish Speech is a high-quality text-to-speech model supporting voice cloning via reference audio.
tags :
- text-to-speech
- TTS
- voice-cloning
license : apache-2.0
name : "fish-speech"
alias : "fish-speech"
capabilities :
nvidia : "cuda12-fish-speech"
intel : "intel-fish-speech"
amd : "rocm-fish-speech"
nvidia-l4t : "nvidia-l4t-fish-speech"
metal : "metal-fish-speech"
default : "cpu-fish-speech"
nvidia-cuda-13 : "cuda13-fish-speech"
nvidia-cuda-12 : "cuda12-fish-speech"
nvidia-l4t-cuda-12 : "nvidia-l4t-fish-speech"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-fish-speech"
icon : https://avatars.githubusercontent.com/u/148526220?s=200&v=4
- &faster-qwen3-tts
urls :
- https://github.com/andimarafioti/faster-qwen3-tts
- https://pypi.org/project/faster-qwen3-tts/
description : |
Real-time Qwen3-TTS inference using CUDA graph capture. Voice clone only; requires NVIDIA GPU with CUDA.
tags :
- text-to-speech
- TTS
- voice-clone
license : apache-2.0
name : "faster-qwen3-tts"
alias : "faster-qwen3-tts"
capabilities :
nvidia : "cuda12-faster-qwen3-tts"
default : "cuda12-faster-qwen3-tts"
nvidia-cuda-13 : "cuda13-faster-qwen3-tts"
nvidia-cuda-12 : "cuda12-faster-qwen3-tts"
nvidia-l4t : "nvidia-l4t-faster-qwen3-tts"
nvidia-l4t-cuda-12 : "nvidia-l4t-faster-qwen3-tts"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-faster-qwen3-tts"
icon : https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
- &qwen-asr
urls :
- https://github.com/QwenLM/Qwen3-ASR
description : |
Qwen3-ASR is an automatic speech recognition model supporting multiple languages and batch inference.
tags :
- speech-recognition
- ASR
license : apache-2.0
name : "qwen-asr"
alias : "qwen-asr"
capabilities :
nvidia : "cuda12-qwen-asr"
intel : "intel-qwen-asr"
amd : "rocm-qwen-asr"
nvidia-l4t : "nvidia-l4t-qwen-asr"
metal : "metal-qwen-asr"
default : "cpu-qwen-asr"
nvidia-cuda-13 : "cuda13-qwen-asr"
nvidia-cuda-12 : "cuda12-qwen-asr"
nvidia-l4t-cuda-12 : "nvidia-l4t-qwen-asr"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-qwen-asr"
icon : https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
- &nemo
urls :
- https://github.com/NVIDIA/NeMo
description : |
NVIDIA NEMO Toolkit for ASR provides state-of-the-art automatic speech recognition models including Parakeet models for various languages and use cases.
tags :
- speech-recognition
- ASR
- NVIDIA
license : apache-2.0
name : "nemo"
alias : "nemo"
capabilities :
nvidia : "cuda12-nemo"
intel : "intel-nemo"
amd : "rocm-nemo"
metal : "metal-nemo"
default : "cpu-nemo"
nvidia-cuda-13 : "cuda13-nemo"
nvidia-cuda-12 : "cuda12-nemo"
icon : https://www.nvidia.com/favicon.ico
- &voxcpm
urls :
- https://github.com/ModelBest/VoxCPM
description : |
VoxCPM is an innovative end-to-end TTS model from ModelBest, designed to generate highly expressive speech.
tags :
- text-to-speech
- TTS
license : mit
name : "voxcpm"
alias : "voxcpm"
capabilities :
nvidia : "cuda12-voxcpm"
intel : "intel-voxcpm"
amd : "rocm-voxcpm"
metal : "metal-voxcpm"
default : "cpu-voxcpm"
nvidia-cuda-13 : "cuda13-voxcpm"
nvidia-cuda-12 : "cuda12-voxcpm"
icon : https://avatars.githubusercontent.com/u/6154722?s=200&v=4
- &pocket-tts
urls :
- https://github.com/kyutai-labs/pocket-tts
description : |
Pocket TTS is a lightweight text-to-speech model designed to run efficiently on CPUs.
tags :
- text-to-speech
- TTS
license : mit
name : "pocket-tts"
alias : "pocket-tts"
capabilities :
nvidia : "cuda12-pocket-tts"
intel : "intel-pocket-tts"
amd : "rocm-pocket-tts"
nvidia-l4t : "nvidia-l4t-pocket-tts"
metal : "metal-pocket-tts"
default : "cpu-pocket-tts"
nvidia-cuda-13 : "cuda13-pocket-tts"
nvidia-cuda-12 : "cuda12-pocket-tts"
nvidia-l4t-cuda-12 : "nvidia-l4t-pocket-tts"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-pocket-tts"
icon : https://avatars.githubusercontent.com/u/151010778?s=200&v=4
- &piper
name : "piper"
uri : "quay.io/go-skynet/local-ai-backends:latest-piper"
icon : https://github.com/OHF-Voice/piper1-gpl/raw/main/etc/logo.png
urls :
- https://github.com/rhasspy/piper
- https://github.com/mudler/go-piper
mirrors :
- localai/localai-backends:latest-piper
license : MIT
description : |
A fast, local neural text to speech system
tags :
- text-to-speech
- TTS
- &opus
name : "opus"
alias : "opus"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-opus"
urls :
- https://opus-codec.org/
mirrors :
- localai/localai-backends:latest-cpu-opus
license : BSD-3-Clause
description : |
Opus audio codec backend for encoding and decoding audio.
Required for WebRTC transport in the Realtime API.
tags :
- audio-codec
- opus
- WebRTC
- realtime
- CPU
- &silero-vad
name : "silero-vad"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-silero-vad"
icon : https://user-images.githubusercontent.com/12515440/89997349-b3523080-dc94-11ea-9906-ca2e8bc50535.png
urls :
- https://github.com/snakers4/silero-vad
mirrors :
- localai/localai-backends:latest-cpu-silero-vad
description : |
Silero VAD: a pre-trained, enterprise-grade Voice Activity Detector.
Silero VAD is a voice activity detection model that detects whether a given audio clip contains speech.
tags :
- voice-activity-detection
- VAD
- silero-vad
- CPU
- &local-store
name : "local-store"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-local-store"
mirrors :
- localai/localai-backends:latest-cpu-local-store
urls :
- https://github.com/mudler/LocalAI
description : |
Local Store is a local-first, self-hosted, and open-source vector database.
tags :
- vector-database
- local-first
- open-source
- CPU
license : MIT
- &kitten-tts
name : "kitten-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-kitten-tts"
mirrors :
- localai/localai-backends:latest-kitten-tts
urls :
- https://github.com/KittenML/KittenTTS
description : |
Kitten TTS is a text-to-speech model that can generate speech from text.
tags :
- text-to-speech
- TTS
license : apache-2.0
- &neutts
name : "neutts"
urls :
- https://github.com/neuphonic/neutts-air
description : |
NeuTTS Air is the world's first super-realistic, on-device TTS speech language model with instant voice cloning. Built off a 0.5B LLM backbone, NeuTTS Air brings natural-sounding speech, real-time performance, built-in security, and speaker cloning to your local device, unlocking a new category of embedded voice agents, assistants, toys, and compliance-safe apps.
tags :
- text-to-speech
- TTS
license : apache-2.0
capabilities :
default : "cpu-neutts"
nvidia : "cuda12-neutts"
amd : "rocm-neutts"
nvidia-cuda-12 : "cuda12-neutts"
- !!merge << : *neutts
name : "neutts-development"
capabilities :
default : "cpu-neutts-development"
nvidia : "cuda12-neutts-development"
amd : "rocm-neutts-development"
nvidia-cuda-12 : "cuda12-neutts-development"
- !!merge << : *llamacpp
name : "llama-cpp-development"
capabilities :
default : "cpu-llama-cpp-development"
nvidia : "cuda12-llama-cpp-development"
intel : "intel-sycl-f16-llama-cpp-development"
amd : "rocm-llama-cpp-development"
metal : "metal-llama-cpp-development"
vulkan : "vulkan-llama-cpp-development"
nvidia-l4t : "nvidia-l4t-arm64-llama-cpp-development"
nvidia-cuda-13 : "cuda13-llama-cpp-development"
nvidia-cuda-12 : "cuda12-llama-cpp-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-llama-cpp-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-llama-cpp-development"
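# Illustrative model configuration consuming one of these backends — a
# sketch only, with a hypothetical model name and file; cache_type_k /
# cache_type_v select KV-cache quantization for llama.cpp-family backends:
#
#   name: my-model
#   backend: llama-cpp
#   parameters:
#     model: my-model.Q4_K_M.gguf
#   cache_type_k: q8_0
#   cache_type_v: q8_0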
- !!merge << : *ikllamacpp
name : "ik-llama-cpp-development"
capabilities :
default : "cpu-ik-llama-cpp-development"
feat(backend): add turboquant llama.cpp-fork backend (#9355)
* feat(backend): add turboquant llama.cpp-fork backend
turboquant is a llama.cpp fork (TheTom/llama-cpp-turboquant, branch
feature/turboquant-kv-cache) that adds a TurboQuant KV-cache scheme.
It ships as a first-class backend reusing backend/cpp/llama-cpp sources
via a thin wrapper Makefile: each variant target copies ../llama-cpp
into a sibling build dir and invokes llama-cpp's build-llama-cpp-grpc-server
with LLAMA_REPO/LLAMA_VERSION overridden to point at the fork. No
duplication of grpc-server.cpp — upstream fixes flow through automatically.
Wires up the full matrix (CPU, CUDA 12/13, L4T, L4T-CUDA13, ROCm, SYCL
f32/f16, Vulkan) in backend.yml and the gallery entries in index.yaml,
adds a tests-turboquant-grpc e2e job driven by BACKEND_TEST_CACHE_TYPE_K/V=q8_0
to exercise the KV-cache config path (backend_test.go gains dedicated env
vars wired into ModelOptions.CacheTypeKey/Value — a generic improvement
usable by any llama.cpp-family backend), and registers a nightly auto-bump
PR in bump_deps.yaml tracking feature/turboquant-kv-cache.
scripts/changed-backends.js gets a special-case so edits to
backend/cpp/llama-cpp/ also retrigger the turboquant CI pipeline, since
the wrapper reuses those sources.
* feat(turboquant): carry upstream patches against fork API drift
turboquant branched from llama.cpp before upstream commit 66060008
("server: respect the ignore eos flag", #21203) which added the
`logit_bias_eog` field to `server_context_meta` and a matching
parameter to `server_task::params_from_json_cmpl`. The shared
backend/cpp/llama-cpp/grpc-server.cpp depends on that field, so
building it against the fork unmodified fails.
Cherry-pick that commit as a patch file under
backend/cpp/turboquant/patches/ and apply it to the cloned fork
sources via a new apply-patches.sh hook called from the wrapper
Makefile. Simplifies the build flow too: instead of hopping through
llama-cpp's build-llama-cpp-grpc-server indirection, the wrapper now
drives the copied Makefile directly (clone -> patch -> build).
Drop the corresponding patch whenever the fork catches up with
upstream — the build fails fast if a patch stops applying, which
is the signal to retire it.
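A minimal sketch of what such an apply-patches.sh hook amounts to, written as a function for illustration (the patches/ layout and the `patch` flags are assumptions, not the repo's exact script):

```shell
# Hypothetical apply-patches step: apply every carried patch in order
# against the cloned fork sources and fail fast on the first reject,
# which is the "retire this patch" signal described above.
apply_patches() {
    src_dir=$1
    for p in patches/*.patch; do
        [ -e "$p" ] || return 0              # nothing to carry right now
        echo "applying $p"
        patch -d "$src_dir" -p1 --forward < "$p" || return 1
    done
}
```

Failing the build on the first rejected hunk is what makes a stale patch impossible to miss once the fork absorbs the upstream commit.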
* docs: add turboquant backend section + clarify cache_type_k/v
Document the new turboquant (llama.cpp fork with TurboQuant KV-cache)
backend alongside the existing llama-cpp / ik-llama-cpp sections in
features/text-generation.md: when to pick it, how to install it from
the gallery, and a YAML example showing backend: turboquant together
with cache_type_k / cache_type_v.
Also expand the cache_type_k / cache_type_v table rows in
advanced/model-configuration.md to spell out the accepted llama.cpp
quantization values and note that these fields apply to all
llama.cpp-family backends, not just vLLM.
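For orientation, a model config of the shape that docs section describes would look roughly like this (the model name and GGUF file are placeholders, not files from the repo):

```yaml
# Illustrative only: name and model file are made up;
# backend / cache_type_k / cache_type_v are the fields being documented.
name: my-turboquant-model
backend: turboquant
parameters:
  model: my-model.Q4_K_M.gguf
cache_type_k: q8_0
cache_type_v: q8_0
```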
* feat(turboquant): patch ggml-rpc GGML_OP_COUNT assertion
The fork adds new GGML ops bringing GGML_OP_COUNT to 97, but
ggml/include/ggml-rpc.h static-asserts it equals 96, breaking
the GGML_RPC=ON build paths (turboquant-grpc / turboquant-rpc-server).
Carry a one-line patch that updates the expected count so the
assertion holds. Drop this patch whenever the fork fixes it upstream.
* feat(turboquant): allow turbo* KV-cache types and exercise them in e2e
The shared backend/cpp/llama-cpp/grpc-server.cpp carries its own
allow-list of accepted KV-cache types (kv_cache_types[]) and rejects
anything outside it before the value reaches llama.cpp's parser. That
list only contains the standard llama.cpp types — turbo2/turbo3/turbo4
would throw "Unsupported cache type" at LoadModel time, meaning
nothing the LocalAI gRPC layer accepted was actually fork-specific.
Add a build-time augmentation step (patch-grpc-server.sh, called from
the turboquant wrapper Makefile) that inserts GGML_TYPE_TURBO2_0/3_0/4_0
into the allow-list of the *copied* grpc-server.cpp under
turboquant-<flavor>-build/. The original file under backend/cpp/llama-cpp/
is never touched, so the stock llama-cpp build keeps compiling against
vanilla upstream which has no notion of those enum values.
Switch test-extra-backend-turboquant to set
BACKEND_TEST_CACHE_TYPE_K=turbo3 / _V=turbo3 so the e2e gRPC suite
actually runs the fork's TurboQuant KV-cache code paths (turbo3 also
auto-enables flash_attention in the fork). Picking q8_0 here would
only re-test the standard llama.cpp path that the upstream llama-cpp
backend already covers.
Refresh the docs (text-generation.md + model-configuration.md) to
list turbo2/turbo3/turbo4 explicitly and call out that you only get
the TurboQuant code path with this backend + a turbo* cache type.
* fix(turboquant): rewrite patch-grpc-server.sh in awk, not python3
The builder image (ubuntu:24.04 stage-2 in Dockerfile.turboquant)
does not install python3, so the python-based augmentation step
errored with `python3: command not found` at make time. Switch to
awk, which the base image already provides (Ubuntu installs mawk by
default) and which is available everywhere the rest of the wrapper
Makefile runs.
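As a rough illustration of the awk approach (the array name and enum entries follow the commit text; the real grpc-server.cpp layout may differ):

```shell
# Stand-in for the copied grpc-server.cpp's KV-cache allow-list:
cat > /tmp/grpc-server.cpp <<'EOF'
const std::vector<ggml_type> kv_cache_types = {
    GGML_TYPE_F16,
    GGML_TYPE_Q8_0,
};
EOF
# awk passes every line through and, right after the line that opens
# the kv_cache_types array, injects the fork-only enum values:
awk '
  { print }
  /kv_cache_types/ && /\{/ {
    print "    GGML_TYPE_TURBO2_0,"
    print "    GGML_TYPE_TURBO3_0,"
    print "    GGML_TYPE_TURBO4_0,"
  }
' /tmp/grpc-server.cpp > /tmp/grpc-server.patched.cpp
```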
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
- !!merge << : *turboquant
name : "turboquant-development"
capabilities :
default : "cpu-turboquant-development"
nvidia : "cuda12-turboquant-development"
intel : "intel-sycl-f16-turboquant-development"
amd : "rocm-turboquant-development"
vulkan : "vulkan-turboquant-development"
nvidia-l4t : "nvidia-l4t-arm64-turboquant-development"
nvidia-cuda-13 : "cuda13-turboquant-development"
nvidia-cuda-12 : "cuda12-turboquant-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-turboquant-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-turboquant-development"
- !!merge << : *neutts
name : "cpu-neutts"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-neutts"
mirrors :
- localai/localai-backends:latest-cpu-neutts
- !!merge << : *neutts
name : "cuda12-neutts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-neutts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-neutts
- !!merge << : *neutts
name : "rocm-neutts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-neutts"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-neutts
- !!merge << : *neutts
name : "cpu-neutts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-neutts"
mirrors :
- localai/localai-backends:master-cpu-neutts
- !!merge << : *neutts
name : "cuda12-neutts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-neutts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-neutts
- !!merge << : *neutts
name : "rocm-neutts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-neutts"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-neutts
- !!merge << : *mlx
name : "mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-mlx
- !!merge << : *mlx-vlm
name : "mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-vlm"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-mlx-vlm
- !!merge << : *mlx-audio
name : "mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-audio"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-mlx-audio
- !!merge << : *mlx-distributed
name : "mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-distributed"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-mlx-distributed
## mlx
- !!merge << : *mlx
name : "cpu-mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-mlx"
mirrors :
- localai/localai-backends:latest-cpu-mlx
- !!merge << : *mlx
name : "cpu-mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-mlx"
mirrors :
- localai/localai-backends:master-cpu-mlx
- !!merge << : *mlx
name : "cuda12-mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-mlx"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-mlx
- !!merge << : *mlx
name : "cuda12-mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-mlx"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-mlx
- !!merge << : *mlx
name : "cuda13-mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-mlx"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-mlx
- !!merge << : *mlx
name : "cuda13-mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-mlx"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-mlx
- !!merge << : *mlx
name : "nvidia-l4t-mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-mlx"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-mlx
- !!merge << : *mlx
name : "nvidia-l4t-mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-mlx"
mirrors :
- localai/localai-backends:master-nvidia-l4t-mlx
- !!merge << : *mlx
name : "cuda13-nvidia-l4t-arm64-mlx"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx
- !!merge << : *mlx
name : "cuda13-nvidia-l4t-arm64-mlx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx
## mlx-vlm
- !!merge << : *mlx-vlm
name : "cpu-mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-mlx-vlm"
mirrors :
- localai/localai-backends:latest-cpu-mlx-vlm
- !!merge << : *mlx-vlm
name : "cpu-mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-mlx-vlm"
mirrors :
- localai/localai-backends:master-cpu-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda12-mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-mlx-vlm"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda12-mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-mlx-vlm"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda13-mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-mlx-vlm"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda13-mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-mlx-vlm"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-mlx-vlm
- !!merge << : *mlx-vlm
name : "nvidia-l4t-mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-mlx-vlm"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-mlx-vlm
- !!merge << : *mlx-vlm
name : "nvidia-l4t-mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-mlx-vlm"
mirrors :
- localai/localai-backends:master-nvidia-l4t-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda13-nvidia-l4t-arm64-mlx-vlm"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-vlm"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-vlm
- !!merge << : *mlx-vlm
name : "cuda13-nvidia-l4t-arm64-mlx-vlm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-vlm"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-vlm
## mlx-audio
- !!merge << : *mlx-audio
name : "cpu-mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-mlx-audio"
mirrors :
- localai/localai-backends:latest-cpu-mlx-audio
- !!merge << : *mlx-audio
name : "cpu-mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-mlx-audio"
mirrors :
- localai/localai-backends:master-cpu-mlx-audio
- !!merge << : *mlx-audio
name : "cuda12-mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-mlx-audio"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-mlx-audio
- !!merge << : *mlx-audio
name : "cuda12-mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-mlx-audio"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-mlx-audio
- !!merge << : *mlx-audio
name : "cuda13-mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-mlx-audio"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-mlx-audio
- !!merge << : *mlx-audio
name : "cuda13-mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-mlx-audio"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-mlx-audio
- !!merge << : *mlx-audio
name : "nvidia-l4t-mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-mlx-audio"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-mlx-audio
- !!merge << : *mlx-audio
name : "nvidia-l4t-mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-mlx-audio"
mirrors :
- localai/localai-backends:master-nvidia-l4t-mlx-audio
- !!merge << : *mlx-audio
name : "cuda13-nvidia-l4t-arm64-mlx-audio"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-audio"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-audio
- !!merge << : *mlx-audio
name : "cuda13-nvidia-l4t-arm64-mlx-audio-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-audio"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-audio
## mlx-distributed
- !!merge << : *mlx-distributed
name : "cpu-mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-mlx-distributed"
mirrors :
- localai/localai-backends:latest-cpu-mlx-distributed
- !!merge << : *mlx-distributed
name : "cpu-mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-mlx-distributed"
mirrors :
- localai/localai-backends:master-cpu-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda12-mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-mlx-distributed"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda12-mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-mlx-distributed"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda13-mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-mlx-distributed"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda13-mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-mlx-distributed"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-mlx-distributed
- !!merge << : *mlx-distributed
name : "nvidia-l4t-mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-mlx-distributed"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-mlx-distributed
- !!merge << : *mlx-distributed
name : "nvidia-l4t-mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-mlx-distributed"
mirrors :
- localai/localai-backends:master-nvidia-l4t-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda13-nvidia-l4t-arm64-mlx-distributed"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-distributed"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-distributed
- !!merge << : *mlx-distributed
name : "cuda13-nvidia-l4t-arm64-mlx-distributed-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-distributed"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-distributed
- !!merge << : *kitten-tts
name : "kitten-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-kitten-tts"
mirrors :
- localai/localai-backends:master-kitten-tts
- !!merge << : *kitten-tts
name : "metal-kitten-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-kitten-tts"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-kitten-tts
- !!merge << : *kitten-tts
name : "metal-kitten-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-kitten-tts"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-kitten-tts
- !!merge << : *local-store
name : "local-store-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-local-store"
mirrors :
- localai/localai-backends:master-cpu-local-store
- !!merge << : *local-store
name : "metal-local-store"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-local-store"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-local-store
- !!merge << : *local-store
name : "metal-local-store-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-local-store"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-local-store
- !!merge << : *opus
name : "opus-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-opus"
mirrors :
- localai/localai-backends:master-cpu-opus
- !!merge << : *opus
name : "metal-opus"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-opus"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-opus
- !!merge << : *opus
name : "metal-opus-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-opus"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-opus
- !!merge << : *silero-vad
name : "silero-vad-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-silero-vad"
mirrors :
- localai/localai-backends:master-cpu-silero-vad
- !!merge << : *silero-vad
name : "metal-silero-vad"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-silero-vad"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-silero-vad
- !!merge << : *silero-vad
name : "metal-silero-vad-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-silero-vad"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-silero-vad
- !!merge << : *piper
name : "piper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-piper"
mirrors :
- localai/localai-backends:master-piper
- !!merge << : *piper
name : "metal-piper"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-piper"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-piper
- !!merge << : *piper
name : "metal-piper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-piper"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-piper
## llama-cpp
- !!merge << : *llamacpp
name : "nvidia-l4t-arm64-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-llama-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-llama-cpp
- !!merge << : *llamacpp
name : "nvidia-l4t-arm64-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-llama-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-llama-cpp
- !!merge << : *llamacpp
name : "cuda13-nvidia-l4t-arm64-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-llama-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-llama-cpp
- !!merge << : *llamacpp
name : "cuda13-nvidia-l4t-arm64-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-llama-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-llama-cpp
- !!merge << : *llamacpp
name : "cpu-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-llama-cpp"
mirrors :
- localai/localai-backends:latest-cpu-llama-cpp
- !!merge << : *llamacpp
name : "cpu-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp"
mirrors :
- localai/localai-backends:master-cpu-llama-cpp
- !!merge << : *llamacpp
name : "cuda12-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-llama-cpp
- !!merge << : *llamacpp
name : "rocm-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-llama-cpp
- !!merge << : *llamacpp
name : "intel-sycl-f32-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-llama-cpp
- !!merge << : *llamacpp
name : "intel-sycl-f16-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-llama-cpp
- !!merge << : *llamacpp
name : "vulkan-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-llama-cpp
- !!merge << : *llamacpp
name : "vulkan-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-vulkan-llama-cpp
- !!merge << : *llamacpp
name : "metal-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-llama-cpp
- !!merge << : *llamacpp
name : "metal-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-llama-cpp"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-llama-cpp
- !!merge << : *llamacpp
name : "cuda12-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-llama-cpp
- !!merge << : *llamacpp
name : "rocm-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-llama-cpp
- !!merge << : *llamacpp
name : "intel-sycl-f32-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-llama-cpp
- !!merge << : *llamacpp
name : "intel-sycl-f16-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-llama-cpp
- !!merge << : *llamacpp
name : "cuda13-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-llama-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-llama-cpp
- !!merge << : *llamacpp
name : "cuda13-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-llama-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-llama-cpp
## ik-llama-cpp
- !!merge << : *ikllamacpp
name : "cpu-ik-llama-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-ik-llama-cpp"
mirrors :
- localai/localai-backends:latest-cpu-ik-llama-cpp
- !!merge << : *ikllamacpp
name : "cpu-ik-llama-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-ik-llama-cpp"
mirrors :
- localai/localai-backends:master-cpu-ik-llama-cpp
## turboquant
- !!merge << : *turboquant
name : "cpu-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-turboquant"
mirrors :
- localai/localai-backends:latest-cpu-turboquant
- !!merge << : *turboquant
name : "cpu-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-turboquant"
mirrors :
- localai/localai-backends:master-cpu-turboquant
- !!merge << : *turboquant
name : "cuda12-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-turboquant
- !!merge << : *turboquant
name : "cuda12-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-turboquant"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-turboquant
- !!merge << : *turboquant
name : "cuda13-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-turboquant
- !!merge << : *turboquant
name : "cuda13-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-turboquant"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-turboquant
- !!merge << : *turboquant
name : "rocm-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-turboquant
- !!merge << : *turboquant
name : "rocm-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-turboquant"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-turboquant
- !!merge << : *turboquant
name : "intel-sycl-f32-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-turboquant
- !!merge << : *turboquant
name : "intel-sycl-f32-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-turboquant"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-turboquant
- !!merge << : *turboquant
name : "intel-sycl-f16-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-turboquant
- !!merge << : *turboquant
name : "intel-sycl-f16-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-turboquant"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-turboquant
- !!merge << : *turboquant
name : "vulkan-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-turboquant"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-turboquant
- !!merge << : *turboquant
name : "vulkan-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-turboquant"
mirrors :
- localai/localai-backends:master-gpu-vulkan-turboquant
- !!merge << : *turboquant
name : "nvidia-l4t-arm64-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-turboquant"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-turboquant
- !!merge << : *turboquant
name : "nvidia-l4t-arm64-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-turboquant"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-turboquant
- !!merge << : *turboquant
name : "cuda13-nvidia-l4t-arm64-turboquant"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-turboquant"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-turboquant
- !!merge << : *turboquant
name : "cuda13-nvidia-l4t-arm64-turboquant-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-turboquant
## whisper
- !!merge << : *whispercpp
name : "nvidia-l4t-arm64-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-whisper"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-whisper
- !!merge << : *whispercpp
name : "nvidia-l4t-arm64-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-whisper"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-whisper
- !!merge << : *whispercpp
name : "cuda13-nvidia-l4t-arm64-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-whisper"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-whisper
- !!merge << : *whispercpp
name : "cuda13-nvidia-l4t-arm64-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-whisper"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-whisper
- !!merge << : *whispercpp
name : "cpu-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-whisper"
mirrors :
- localai/localai-backends:latest-cpu-whisper
- !!merge << : *whispercpp
name : "metal-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-whisper"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-whisper
- !!merge << : *whispercpp
name : "metal-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-whisper"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-whisper
- !!merge << : *whispercpp
name : "cpu-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-whisper"
mirrors :
- localai/localai-backends:master-cpu-whisper
- !!merge << : *whispercpp
name : "cuda12-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-whisper"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-whisper
- !!merge << : *whispercpp
name : "rocm-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisper"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-whisper
- !!merge << : *whispercpp
name : "intel-sycl-f32-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-whisper"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-whisper
- !!merge << : *whispercpp
name : "intel-sycl-f16-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-whisper"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-whisper
- !!merge << : *whispercpp
name : "vulkan-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-whisper"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-whisper
- !!merge << : *whispercpp
name : "vulkan-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-whisper"
mirrors :
- localai/localai-backends:master-gpu-vulkan-whisper
- !!merge << : *whispercpp
name : "cuda12-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisper"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisper
- !!merge << : *whispercpp
name : "rocm-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisper"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-whisper
- !!merge << : *whispercpp
name : "intel-sycl-f32-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-whisper"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-whisper
- !!merge << : *whispercpp
name : "intel-sycl-f16-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-whisper"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-whisper
- !!merge << : *whispercpp
name : "cuda13-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisper"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-whisper
- !!merge << : *whispercpp
name : "cuda13-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-whisper"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-whisper
## stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cpu-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-cpu-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cpu-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-cpu-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "metal-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "metal-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "vulkan-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "vulkan-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-gpu-vulkan-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda12-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "intel-sycl-f32-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "intel-sycl-f16-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda12-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "intel-sycl-f32-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "intel-sycl-f16-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "nvidia-l4t-arm64-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "nvidia-l4t-arm64-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda13-nvidia-l4t-arm64-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda13-nvidia-l4t-arm64-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda13-stablediffusion-ggml"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-stablediffusion-ggml"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-stablediffusion-ggml
- !!merge << : *stablediffusionggml
name : "cuda13-stablediffusion-ggml-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-stablediffusion-ggml
# vllm
- !!merge << : *vllm
name : "vllm-development"
capabilities :
nvidia : "cuda12-vllm-development"
amd : "rocm-vllm-development"
intel : "intel-vllm-development"
cpu : "cpu-vllm-development"
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
name : "cuda12-vllm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
name : "rocm-vllm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-vllm
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
2025-07-28 13:15:19 +00:00
name : "intel-vllm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
2025-07-28 13:15:19 +00:00
- localai/localai-backends:latest-gpu-intel-vllm
feat(vllm): parity with llama.cpp backend (#9328)
* fix(schema): serialize ToolCallID and Reasoning in Messages.ToProto
The ToProto conversion was dropping tool_call_id and reasoning_content
even though both proto and Go fields existed, breaking multi-turn tool
calling and reasoning passthrough to backends.
* refactor(config): introduce backend hook system and migrate llama-cpp defaults
Adds RegisterBackendHook/runBackendHooks so each backend can register
default-filling functions that run during ModelConfig.SetDefaults().
Migrates the existing GGUF guessing logic into hooks_llamacpp.go,
registered for both 'llama-cpp' and the empty backend (auto-detect).
Removes the old guesser.go shim.
* feat(config): add vLLM parser defaults hook and importer auto-detection
Introduces parser_defaults.json mapping model families to vLLM
tool_parser/reasoning_parser names, with longest-pattern-first matching.
The vllmDefaults hook auto-fills tool_parser and reasoning_parser
options at load time for known families, while the VLLMImporter writes
the same values into generated YAML so users can review and edit them.
Adds tests covering MatchParserDefaults, hook registration via
SetDefaults, and the user-override behavior.
* feat(vllm): wire native tool/reasoning parsers + chat deltas + logprobs
- Use vLLM's ToolParserManager/ReasoningParserManager to extract structured
output (tool calls, reasoning content) instead of reimplementing parsing
- Convert proto Messages to dicts and pass tools to apply_chat_template
- Emit ChatDelta with content/reasoning_content/tool_calls in Reply
- Extract prompt_tokens, completion_tokens, and logprobs from output
- Replace boolean GuidedDecoding with proper GuidedDecodingParams from Grammar
- Add TokenizeString and Free RPC methods
- Fix missing `time` import used by load_video()
* feat(vllm): CPU support + shared utils + vllm-omni feature parity
- Split vllm install per acceleration: move generic `vllm` out of
requirements-after.txt into per-profile after files (cublas12, hipblas,
intel) and add CPU wheel URL for cpu-after.txt
- requirements-cpu.txt now pulls torch==2.7.0+cpu from PyTorch CPU index
- backend/index.yaml: register cpu-vllm / cpu-vllm-development variants
- New backend/python/common/vllm_utils.py: shared parse_options,
messages_to_dicts, setup_parsers helpers (used by both vllm backends)
- vllm-omni: replace hardcoded chat template with tokenizer.apply_chat_template,
wire native parsers via shared utils, emit ChatDelta with token counts,
add TokenizeString and Free RPCs, detect CPU and set VLLM_TARGET_DEVICE
- Add test_cpu_inference.py: standalone script to validate CPU build with
a small model (Qwen2.5-0.5B-Instruct)
* fix(vllm): CPU build compatibility with vllm 0.14.1
Validated end-to-end on CPU with Qwen2.5-0.5B-Instruct (LoadModel, Predict,
TokenizeString, Free all working).
- requirements-cpu-after.txt: pin vllm to 0.14.1+cpu (pre-built wheel from
GitHub releases) for x86_64 and aarch64. vllm 0.14.1 is the newest CPU
wheel whose torch dependency resolves against published PyTorch builds
(torch==2.9.1+cpu). Later vllm CPU wheels currently require
torch==2.10.0+cpu which is only available on the PyTorch test channel
with incompatible torchvision.
- requirements-cpu.txt: bump torch to 2.9.1+cpu, add torchvision/torchaudio
so uv resolves them consistently from the PyTorch CPU index.
- install.sh: add --index-strategy=unsafe-best-match for CPU builds so uv
can mix the PyTorch index and PyPI for transitive deps (matches the
existing intel profile behaviour).
- backend.py LoadModel: vllm >= 0.14 removed AsyncLLMEngine.get_model_config
so the old code path errored out with AttributeError on model load.
Switch to the new get_tokenizer()/tokenizer accessor with a fallback
to building the tokenizer directly from request.Model.
* fix(vllm): tool parser constructor compat + e2e tool calling test
Concrete vLLM tool parsers override the abstract base's __init__ and
drop the tools kwarg (e.g. Hermes2ProToolParser only takes tokenizer).
Instantiating with tools= raised TypeError which was silently caught,
leaving chat_deltas.tool_calls empty.
Retry the constructor without the tools kwarg on TypeError — tools
aren't required by these parsers since extract_tool_calls finds tool
syntax in the raw model output directly.
Validated with Qwen/Qwen2.5-0.5B-Instruct + hermes parser on CPU:
the backend correctly returns ToolCallDelta{name='get_weather',
arguments='{"location": "Paris, France"}'} in ChatDelta.
test_tool_calls.py is a standalone smoke test that spawns the gRPC
backend, sends a chat completion with tools, and asserts the response
contains a structured tool call.
* ci(backend): build cpu-vllm container image
Add the cpu-vllm variant to the backend container build matrix so the
image registered in backend/index.yaml (cpu-vllm / cpu-vllm-development)
is actually produced by CI.
Follows the same pattern as the other CPU python backends
(cpu-diffusers, cpu-chatterbox, etc.) with build-type='' and no CUDA.
backend_pr.yml auto-picks this up via its matrix filter from backend.yml.
* test(e2e-backends): add tools capability + HF model name support
Extends tests/e2e-backends to cover backends that:
- Resolve HuggingFace model ids natively (vllm, vllm-omni) instead of
loading a local file: BACKEND_TEST_MODEL_NAME is passed verbatim as
ModelOptions.Model with no download/ModelFile.
- Parse tool calls into ChatDelta.tool_calls: new "tools" capability
sends a Predict with a get_weather function definition and asserts
the Reply contains a matching ToolCallDelta. Uses UseTokenizerTemplate
with OpenAI-style Messages so the backend can wire tools into the
model's chat template.
- Need backend-specific Options[]: BACKEND_TEST_OPTIONS lets a test set
e.g. "tool_parser:hermes,reasoning_parser:qwen3" at LoadModel time.
Adds make target test-extra-backend-vllm that:
- docker-build-vllm
- loads Qwen/Qwen2.5-0.5B-Instruct
- runs health,load,predict,stream,tools with tool_parser:hermes
Drops backend/python/vllm/test_{cpu_inference,tool_calls}.py — those
standalone scripts were scaffolding used while bringing up the Python
backend; the e2e-backends harness now covers the same ground uniformly
alongside llama-cpp and ik-llama-cpp.
* ci(test-extra): run vllm e2e tests on CPU
Adds tests-vllm-grpc to the test-extra workflow, mirroring the
llama-cpp and ik-llama-cpp gRPC jobs. Triggers when files under
backend/python/vllm/ change (or on run-all), builds the local-ai
vllm container image, and runs the tests/e2e-backends harness with
BACKEND_TEST_MODEL_NAME=Qwen/Qwen2.5-0.5B-Instruct, tool_parser:hermes,
and the tools capability enabled.
Uses ubuntu-latest (no GPU) — vllm runs on CPU via the cpu-vllm
wheel we pinned in requirements-cpu-after.txt. Frees disk space
before the build since the docker image plus the torch and vllm
wheels are sizeable.
* fix(vllm): build from source on CI to avoid SIGILL on prebuilt wheel
The prebuilt vllm 0.14.1+cpu wheel from GitHub releases is compiled with
SIMD instructions (AVX-512 VNNI/BF16 or AMX-BF16) that not every CPU
supports. GitHub Actions ubuntu-latest runners SIGILL when vllm spawns
the model_executor.models.registry subprocess for introspection, so
LoadModel never reaches the actual inference path.
- install.sh: when FROM_SOURCE=true on a CPU build, temporarily hide
requirements-cpu-after.txt so installRequirements installs the base
deps + torch CPU without pulling the prebuilt wheel, then clone vllm
and compile it with VLLM_TARGET_DEVICE=cpu. The resulting binaries
target the host's actual CPU.
- backend/Dockerfile.python: accept a FROM_SOURCE build-arg and expose
it as an ENV so install.sh sees it during `make`.
- Makefile docker-build-backend: forward FROM_SOURCE as --build-arg
when set, so backends that need source builds can opt in.
- Makefile test-extra-backend-vllm: call docker-build-vllm via a
recursive $(MAKE) invocation so FROM_SOURCE flows through.
- .github/workflows/test-extra.yml: set FROM_SOURCE=true on the
tests-vllm-grpc job. Slower but reliable — the prebuilt wheel only
works on hosts that share the build-time SIMD baseline.
Answers 'did you test locally?': yes, end-to-end on my local machine
with the prebuilt wheel (CPU supports AVX-512 VNNI). The CI runner CPU
gap was not covered locally — this commit plugs that gap.
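A preflight check for the SIMD baseline described above could look like this; a hypothetical helper (cpu_has_flags is not part of the repo) that reads Linux's /proc/cpuinfo flags before choosing between the prebuilt wheel and a source build:

```python
def cpu_has_flags(required: set[str]) -> bool:
    """Return True if the host CPU advertises every flag in `required`,
    using /proc/cpuinfo's flag spelling (e.g. "avx512_vnni").
    Linux-only; other hosts fall through to False."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    return required.issubset(flags)
    except OSError:
        pass  # no /proc/cpuinfo: assume the safe (source-build) path
    return False


if not cpu_has_flags({"avx512_vnni", "avx512_bf16"}):
    print("prebuilt vllm +cpu wheel may SIGILL here; prefer FROM_SOURCE=true")
```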
* ci(vllm): use bigger-runner instead of source build
The prebuilt vllm 0.14.1+cpu wheel requires SIMD instructions (AVX-512
VNNI/BF16) that stock ubuntu-latest GitHub runners don't support —
vllm.model_executor.models.registry SIGILLs on import during LoadModel.
Source compilation works but takes 30-40 minutes per CI run, which is
too slow for an e2e smoke test. Instead, switch tests-vllm-grpc to the
bigger-runner self-hosted label (already used by backend.yml for the
llama-cpp CUDA build) — that hardware has the required SIMD baseline
and the prebuilt wheel runs cleanly.
FROM_SOURCE=true is kept as an opt-in escape hatch:
- install.sh still has the CPU source-build path for hosts that need it
- backend/Dockerfile.python still declares the ARG + ENV
- Makefile docker-build-backend still forwards the build-arg when set
Default CI path uses the fast prebuilt wheel; source build can be
re-enabled by exporting FROM_SOURCE=true in the environment.
* ci(vllm): install make + build deps on bigger-runner
bigger-runner is a bare self-hosted runner used by backend.yml for
docker image builds — it has docker but not the usual ubuntu-latest
toolchain. The make-based test target needs make, build-essential
(cgo in 'go test'), and curl/unzip (the Makefile protoc target
downloads protoc from GitHub releases).
protoc-gen-go and protoc-gen-go-grpc come via 'go install' in the
install-go-tools target, which setup-go makes possible.
* ci(vllm): install libnuma1 + libgomp1 on bigger-runner
The vllm 0.14.1+cpu wheel ships a _C C++ extension that dlopens
libnuma.so.1 at import time. When the runner host doesn't have it,
the extension silently fails to register its torch ops, so
EngineCore crashes on init_device with:
AttributeError: '_OpNamespace' '_C_utils' object has no attribute
'init_cpu_threads_env'
Also add libgomp1 (OpenMP runtime, used by torch CPU kernels) to be
safe on stripped-down runners.
* feat(vllm): bundle libnuma/libgomp via package.sh
The vllm CPU wheel ships a _C extension that dlopens libnuma.so.1 at
import time; torch's CPU kernels in turn use libgomp.so.1 (OpenMP).
Without these on the host, vllm._C silently fails to register its
torch ops and EngineCore crashes with:
AttributeError: '_OpNamespace' '_C_utils' object has no attribute
'init_cpu_threads_env'
Rather than asking every user to install libnuma1/libgomp1 on their
host (or every LocalAI base image to ship them), bundle them into
the backend image itself — same pattern fish-speech and the GPU libs
already use. libbackend.sh adds ${EDIR}/lib to LD_LIBRARY_PATH at
run time so the bundled copies are picked up automatically.
- backend/python/vllm/package.sh (new): copies libnuma.so.1 and
libgomp.so.1 from the builder's multilib paths into ${BACKEND}/lib,
preserving soname symlinks. Runs during Dockerfile.python's
'Run backend-specific packaging' step (which already invokes
package.sh if present).
- backend/Dockerfile.python: install libnuma1 + libgomp1 in the
builder stage so package.sh has something to copy (the Ubuntu
base image otherwise only has libgomp in the gcc dep chain).
- test-extra.yml: drop the workaround that installed these libs on
the runner host — with the backend image self-contained, the
runner no longer needs them, and the test now exercises the
packaging path end-to-end the way a production host would.
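The soname-preserving copy that package.sh performs can be sketched as follows; a Python illustration of the idea rather than the shell script itself, with bundle_shared_lib a hypothetical name:

```python
import os
import shutil


def bundle_shared_lib(src: str, dest_dir: str) -> None:
    """Copy a shared library into dest_dir, preserving soname symlinks.
    Follows the symlink chain (e.g. libnuma.so.1 -> libnuma.so.1.0.0),
    copies the real file once, and recreates each link pointing at it."""
    os.makedirs(dest_dir, exist_ok=True)
    # Resolve the chain of symlinks down to the real ELF file.
    chain = [src]
    while os.path.islink(chain[-1]):
        target = os.readlink(chain[-1])
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(chain[-1]), target)
        chain.append(target)
    real = chain[-1]
    shutil.copy2(real, os.path.join(dest_dir, os.path.basename(real)))
    # Recreate each symlink name, pointing at the next name in the chain.
    for link, target in zip(chain[:-1], chain[1:]):
        link_path = os.path.join(dest_dir, os.path.basename(link))
        if os.path.lexists(link_path):
            os.remove(link_path)
        os.symlink(os.path.basename(target), link_path)
```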
* ci(vllm): disable tests-vllm-grpc job (heterogeneous runners)
Both ubuntu-latest and bigger-runner have inconsistent CPU baselines:
some instances support the AVX-512 VNNI/BF16 instructions the prebuilt
vllm 0.14.1+cpu wheel was compiled with, others SIGILL on import of
vllm.model_executor.models.registry. The libnuma packaging fix doesn't
help when the wheel itself can't be loaded.
FROM_SOURCE=true compiles vllm against the actual host CPU and works
everywhere, but takes 30-50 minutes per run — too slow for a smoke
test on every PR.
Comment out the job for now. The test itself is intact and passes
locally; run it via 'make test-extra-backend-vllm' on a host with the
required SIMD baseline. Re-enable when:
- we have a self-hosted runner label with guaranteed AVX-512 VNNI/BF16, or
- vllm publishes a CPU wheel with a wider baseline, or
- we set up a docker layer cache that makes FROM_SOURCE acceptable
The detect-changes vllm output, the test harness changes (tests/
e2e-backends + tools cap), the make target (test-extra-backend-vllm),
the package.sh and the Dockerfile/install.sh plumbing all stay in
place.
2026-04-13 09:00:29 +00:00
- !!merge << : *vllm
name : "cpu-vllm"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-vllm"
mirrors :
- localai/localai-backends:latest-cpu-vllm
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
2025-06-18 17:48:50 +00:00
name : "cuda12-vllm-development"
2025-06-17 15:31:53 +00:00
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
2025-06-18 17:48:50 +00:00
name : "rocm-vllm-development"
2025-06-17 15:31:53 +00:00
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-vllm
2025-06-17 15:31:53 +00:00
- !!merge << : *vllm
2025-07-28 13:15:19 +00:00
name : "intel-vllm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-vllm"
2025-07-25 17:20:08 +00:00
mirrors :
2025-07-28 13:15:19 +00:00
- localai/localai-backends:master-gpu-intel-vllm
2026-04-13 09:00:29 +00:00
- !!merge << : *vllm
name : "cpu-vllm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-vllm"
mirrors :
- localai/localai-backends:master-cpu-vllm
2026-01-24 21:23:30 +00:00
# vllm-omni
- !!merge << : *vllm-omni
name : "vllm-omni-development"
capabilities :
nvidia : "cuda12-vllm-omni-development"
amd : "rocm-vllm-omni-development"
nvidia-cuda-12 : "cuda12-vllm-omni-development"
- !!merge << : *vllm-omni
name : "cuda12-vllm-omni"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm-omni"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vllm-omni
- !!merge << : *vllm-omni
name : "rocm-vllm-omni"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm-omni"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-vllm-omni
- !!merge << : *vllm-omni
name : "cuda12-vllm-omni-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm-omni"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-vllm-omni
- !!merge << : *vllm-omni
name : "rocm-vllm-omni-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm-omni"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-vllm-omni
2025-07-27 20:02:51 +00:00
# rfdetr
- !!merge << : *rfdetr
name : "rfdetr-development"
capabilities :
nvidia : "cuda12-rfdetr-development"
intel : "intel-rfdetr-development"
#amd: "rocm-rfdetr-development"
nvidia-l4t : "nvidia-l4t-arm64-rfdetr-development"
2026-02-03 20:57:50 +00:00
metal : "metal-rfdetr-development"
2025-07-27 20:02:51 +00:00
default : "cpu-rfdetr-development"
2025-12-02 13:24:35 +00:00
nvidia-cuda-13 : "cuda13-rfdetr-development"
2025-07-27 20:02:51 +00:00
- !!merge << : *rfdetr
name : "cuda12-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rfdetr"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-rfdetr
- !!merge << : *rfdetr
name : "intel-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-rfdetr"
mirrors :
- localai/localai-backends:latest-gpu-intel-rfdetr
# - !!merge <<: *rfdetr
# name: "rocm-rfdetr"
# uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-hipblas-rfdetr"
# mirrors:
# - localai/localai-backends:latest-gpu-hipblas-rfdetr
- !!merge << : *rfdetr
name : "nvidia-l4t-arm64-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-rfdetr"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-rfdetr
2025-12-02 13:24:35 +00:00
- !!merge << : *rfdetr
name : "nvidia-l4t-arm64-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-rfdetr"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-rfdetr
2025-07-27 20:02:51 +00:00
- !!merge << : *rfdetr
name : "cpu-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-rfdetr"
mirrors :
- localai/localai-backends:latest-cpu-rfdetr
- !!merge << : *rfdetr
name : "cuda12-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-rfdetr"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-rfdetr
- !!merge << : *rfdetr
name : "intel-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-rfdetr"
mirrors :
- localai/localai-backends:master-gpu-intel-rfdetr
# - !!merge <<: *rfdetr
# name: "rocm-rfdetr-development"
# uri: "quay.io/go-skynet/local-ai-backends:master-gpu-hipblas-rfdetr"
# mirrors:
# - localai/localai-backends:master-gpu-hipblas-rfdetr
- !!merge << : *rfdetr
name : "cpu-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-rfdetr"
mirrors :
- localai/localai-backends:master-cpu-rfdetr
2025-12-02 13:24:35 +00:00
- !!merge << : *rfdetr
name : "cuda13-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-rfdetr"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-rfdetr
- !!merge << : *rfdetr
name : "cuda13-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-rfdetr"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-rfdetr
2026-02-03 20:57:50 +00:00
- !!merge << : *rfdetr
name : "metal-rfdetr"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-rfdetr"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-rfdetr
- !!merge << : *rfdetr
name : "metal-rfdetr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-rfdetr"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-rfdetr
2026-04-09 19:49:11 +00:00
## sam3-cpp
- !!merge << : *sam3cpp
name : "sam3-cpp-development"
capabilities :
default : "cpu-sam3-cpp-development"
nvidia : "cuda12-sam3-cpp-development"
nvidia-cuda-12 : "cuda12-sam3-cpp-development"
nvidia-cuda-13 : "cuda13-sam3-cpp-development"
nvidia-l4t : "nvidia-l4t-arm64-sam3-cpp-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-sam3-cpp-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-sam3-cpp-development"
intel : "intel-sycl-f32-sam3-cpp-development"
vulkan : "vulkan-sam3-cpp-development"
- !!merge << : *sam3cpp
name : "cpu-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-sam3-cpp"
mirrors :
- localai/localai-backends:latest-cpu-sam3-cpp
- !!merge << : *sam3cpp
name : "cpu-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-sam3-cpp"
mirrors :
- localai/localai-backends:master-cpu-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda12-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-sam3-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda12-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-sam3-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda13-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-sam3-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda13-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-sam3-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-sam3-cpp
- !!merge << : *sam3cpp
name : "nvidia-l4t-arm64-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-sam3-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-sam3-cpp
- !!merge << : *sam3cpp
name : "nvidia-l4t-arm64-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-sam3-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda13-nvidia-l4t-arm64-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-sam3-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-sam3-cpp
- !!merge << : *sam3cpp
name : "cuda13-nvidia-l4t-arm64-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-sam3-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-sam3-cpp
- !!merge << : *sam3cpp
name : "intel-sycl-f32-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-sam3-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-sam3-cpp
- !!merge << : *sam3cpp
name : "intel-sycl-f32-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-sam3-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-sam3-cpp
- !!merge << : *sam3cpp
name : "vulkan-sam3-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-sam3-cpp"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-sam3-cpp
- !!merge << : *sam3cpp
name : "vulkan-sam3-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-sam3-cpp"
mirrors :
- localai/localai-backends:master-gpu-vulkan-sam3-cpp
2025-06-17 15:31:53 +00:00
## Rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
name : "rerankers-development"
capabilities :
nvidia : "cuda12-rerankers-development"
2025-07-28 13:15:19 +00:00
intel : "intel-rerankers-development"
2025-07-03 16:01:55 +00:00
amd : "rocm-rerankers-development"
2026-02-03 20:57:50 +00:00
metal : "metal-rerankers-development"
2025-12-02 13:24:35 +00:00
nvidia-cuda-13 : "cuda13-rerankers-development"
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
name : "cuda12-rerankers"
2025-06-15 12:56:52 +00:00
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
2025-07-28 13:15:19 +00:00
name : "intel-rerankers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
2025-07-28 13:15:19 +00:00
- localai/localai-backends:latest-gpu-intel-rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
name : "rocm-rerankers"
2025-06-17 15:31:53 +00:00
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
name : "cuda12-rerankers-development"
2025-06-17 15:31:53 +00:00
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
name : "rocm-rerankers-development"
2025-06-17 15:31:53 +00:00
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-rerankers
2025-07-03 16:01:55 +00:00
- !!merge << : *rerankers
2025-07-28 13:15:19 +00:00
name : "intel-rerankers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-rerankers"
2025-07-25 17:20:08 +00:00
mirrors :
2025-07-28 13:15:19 +00:00
- localai/localai-backends:master-gpu-intel-rerankers
2025-12-02 13:24:35 +00:00
- !!merge << : *rerankers
name : "cuda13-rerankers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-rerankers"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-rerankers
- !!merge << : *rerankers
name : "cuda13-rerankers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-rerankers"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-rerankers
2026-02-03 20:57:50 +00:00
- !!merge << : *rerankers
name : "metal-rerankers"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-rerankers"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-rerankers
- !!merge << : *rerankers
name : "metal-rerankers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-rerankers"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-rerankers
2025-06-17 15:31:53 +00:00
## Transformers
2025-07-03 16:01:55 +00:00
- !!merge << : *transformers
name : "transformers-development"
capabilities :
nvidia : "cuda12-transformers-development"
2025-07-28 13:15:19 +00:00
intel : "intel-transformers-development"
2025-07-03 16:01:55 +00:00
amd : "rocm-transformers-development"
2026-02-03 20:57:50 +00:00
metal : "metal-transformers-development"
2025-12-02 13:24:35 +00:00
nvidia-cuda-13 : "cuda13-transformers-development"
2025-07-03 16:01:55 +00:00
- !!merge << : *transformers
name : "cuda12-transformers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-transformers"
2025-07-25 17:20:08 +00:00
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-transformers
2025-06-17 20:21:44 +00:00
- !!merge << : *transformers
name : "rocm-transformers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-transformers"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-transformers
- !!merge << : *transformers
name : "intel-transformers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-transformers"
mirrors :
- localai/localai-backends:latest-gpu-intel-transformers
- !!merge << : *transformers
name : "cuda12-transformers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-transformers"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-transformers
- !!merge << : *transformers
name : "rocm-transformers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-transformers"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-transformers
- !!merge << : *transformers
name : "intel-transformers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-transformers"
mirrors :
- localai/localai-backends:master-gpu-intel-transformers
- !!merge << : *transformers
name : "cuda13-transformers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-transformers"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-transformers
- !!merge << : *transformers
name : "cuda13-transformers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-transformers"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-transformers
- !!merge << : *transformers
name : "metal-transformers"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-transformers"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-transformers
- !!merge << : *transformers
name : "metal-transformers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-transformers"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-transformers
## Diffusers
- !!merge << : *diffusers
name : "diffusers-development"
capabilities :
nvidia : "cuda12-diffusers-development"
intel : "intel-diffusers-development"
amd : "rocm-diffusers-development"
nvidia-l4t : "nvidia-l4t-diffusers-development"
metal : "metal-diffusers-development"
default : "cpu-diffusers-development"
nvidia-cuda-13 : "cuda13-diffusers-development"
- !!merge << : *diffusers
name : "cpu-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-diffusers"
mirrors :
- localai/localai-backends:latest-cpu-diffusers
- !!merge << : *diffusers
name : "cpu-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-diffusers"
mirrors :
- localai/localai-backends:master-cpu-diffusers
- !!merge << : *diffusers
name : "nvidia-l4t-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-diffusers"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-diffusers
- !!merge << : *diffusers
name : "nvidia-l4t-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-diffusers"
mirrors :
- localai/localai-backends:master-nvidia-l4t-diffusers
- !!merge << : *diffusers
name : "cuda13-nvidia-l4t-arm64-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-diffusers"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-diffusers
- !!merge << : *diffusers
name : "cuda13-nvidia-l4t-arm64-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-diffusers"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-diffusers
- !!merge << : *diffusers
name : "cuda12-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-diffusers"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-diffusers
- !!merge << : *diffusers
name : "rocm-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-diffusers"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-diffusers
- !!merge << : *diffusers
name : "intel-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-diffusers"
mirrors :
- localai/localai-backends:latest-gpu-intel-diffusers
- !!merge << : *diffusers
name : "cuda12-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-diffusers"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-diffusers
- !!merge << : *diffusers
name : "rocm-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-diffusers"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-diffusers
- !!merge << : *diffusers
name : "intel-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-diffusers"
mirrors :
- localai/localai-backends:master-gpu-intel-diffusers
- !!merge << : *diffusers
name : "cuda13-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-diffusers"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-diffusers
- !!merge << : *diffusers
name : "cuda13-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-diffusers"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-diffusers
- !!merge << : *diffusers
name : "metal-diffusers"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-diffusers"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-diffusers
- !!merge << : *diffusers
name : "metal-diffusers-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-diffusers"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-diffusers
## ace-step
- !!merge << : *ace-step
name : "cpu-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-ace-step"
mirrors :
- localai/localai-backends:latest-cpu-ace-step
- !!merge << : *ace-step
name : "cpu-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-ace-step"
mirrors :
- localai/localai-backends:master-cpu-ace-step
- !!merge << : *ace-step
name : "cuda12-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-ace-step"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-ace-step
- !!merge << : *ace-step
name : "cuda12-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-ace-step"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-ace-step
- !!merge << : *ace-step
name : "cuda13-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-ace-step"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-ace-step
- !!merge << : *ace-step
name : "cuda13-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-ace-step"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-ace-step
- !!merge << : *ace-step
name : "rocm-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-ace-step"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-ace-step
- !!merge << : *ace-step
name : "rocm-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-ace-step"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-ace-step
- !!merge << : *ace-step
name : "intel-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-ace-step"
mirrors :
- localai/localai-backends:latest-gpu-intel-ace-step
- !!merge << : *ace-step
name : "intel-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-ace-step"
mirrors :
- localai/localai-backends:master-gpu-intel-ace-step
- !!merge << : *ace-step
name : "metal-ace-step"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-ace-step"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-ace-step
- !!merge << : *ace-step
name : "metal-ace-step-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-ace-step"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-ace-step
## acestep-cpp
- !!merge << : *acestepcpp
name : "nvidia-l4t-arm64-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "nvidia-l4t-arm64-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda13-nvidia-l4t-arm64-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda13-nvidia-l4t-arm64-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "cpu-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-acestep-cpp"
mirrors :
- localai/localai-backends:latest-cpu-acestep-cpp
- !!merge << : *acestepcpp
name : "metal-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "metal-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-acestep-cpp"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-acestep-cpp
- !!merge << : *acestepcpp
name : "cpu-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-acestep-cpp"
mirrors :
- localai/localai-backends:master-cpu-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda12-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-acestep-cpp
- !!merge << : *acestepcpp
name : "rocm-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-acestep-cpp
- !!merge << : *acestepcpp
name : "intel-sycl-f32-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-acestep-cpp
- !!merge << : *acestepcpp
name : "intel-sycl-f16-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-acestep-cpp
- !!merge << : *acestepcpp
name : "vulkan-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-acestep-cpp
- !!merge << : *acestepcpp
name : "vulkan-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-vulkan-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda12-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-acestep-cpp
- !!merge << : *acestepcpp
name : "rocm-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-acestep-cpp
- !!merge << : *acestepcpp
name : "intel-sycl-f32-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-acestep-cpp
- !!merge << : *acestepcpp
name : "intel-sycl-f16-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda13-acestep-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-acestep-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-acestep-cpp
- !!merge << : *acestepcpp
name : "cuda13-acestep-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-acestep-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-acestep-cpp
## qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "nvidia-l4t-arm64-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "nvidia-l4t-arm64-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda13-nvidia-l4t-arm64-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda13-nvidia-l4t-arm64-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cpu-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-cpu-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "metal-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "metal-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cpu-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-cpu-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda12-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "rocm-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "intel-sycl-f32-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f32-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "intel-sycl-f16-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-intel-sycl-f16-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "vulkan-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-vulkan-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-vulkan-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "vulkan-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-vulkan-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-vulkan-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda12-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "rocm-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "intel-sycl-f32-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f32-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "intel-sycl-f16-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-intel-sycl-f16-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda13-qwen3-tts-cpp"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-qwen3-tts-cpp
- !!merge << : *qwen3ttscpp
name : "cuda13-qwen3-tts-cpp-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-qwen3-tts-cpp"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-qwen3-tts-cpp
## kokoro
- !!merge << : *kokoro
name : "kokoro-development"
capabilities :
nvidia : "cuda12-kokoro-development"
intel : "intel-kokoro-development"
amd : "rocm-kokoro-development"
nvidia-l4t : "nvidia-l4t-kokoro-development"
metal : "metal-kokoro-development"
- !!merge << : *kokoro
name : "cuda12-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-kokoro"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-kokoro
- !!merge << : *kokoro
name : "rocm-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-kokoro"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-kokoro
- !!merge << : *kokoro
name : "intel-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-kokoro"
mirrors :
- localai/localai-backends:latest-gpu-intel-kokoro
- !!merge << : *kokoro
name : "intel-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-kokoro"
mirrors :
- localai/localai-backends:master-gpu-intel-kokoro
- !!merge << : *kokoro
name : "nvidia-l4t-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-kokoro"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-kokoro
- !!merge << : *kokoro
name : "nvidia-l4t-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-kokoro"
mirrors :
- localai/localai-backends:master-nvidia-l4t-kokoro
- !!merge << : *kokoro
name : "cuda12-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-kokoro"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-kokoro
- !!merge << : *kokoro
name : "rocm-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-kokoro"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-kokoro
- !!merge << : *kokoro
name : "cuda13-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-kokoro"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-kokoro
- !!merge << : *kokoro
name : "cuda13-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-kokoro"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-kokoro
- !!merge << : *kokoro
name : "metal-kokoro"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-kokoro"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-kokoro
- !!merge << : *kokoro
name : "metal-kokoro-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-kokoro"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-kokoro
## kokoros (Rust)
- !!merge << : *kokoros
name : "kokoros-development"
capabilities :
default : "cpu-kokoros-development"
- !!merge << : *kokoros
name : "cpu-kokoros"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-kokoros"
mirrors :
- localai/localai-backends:latest-cpu-kokoros
- !!merge << : *kokoros
name : "cpu-kokoros-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-kokoros"
mirrors :
- localai/localai-backends:master-cpu-kokoros
## faster-whisper
- !!merge << : *faster-whisper
name : "faster-whisper-development"
capabilities :
default : "cpu-faster-whisper-development"
nvidia : "cuda12-faster-whisper-development"
intel : "intel-faster-whisper-development"
amd : "rocm-faster-whisper-development"
metal : "metal-faster-whisper-development"
nvidia-cuda-13 : "cuda13-faster-whisper-development"
nvidia-l4t : "nvidia-l4t-arm64-faster-whisper-development"
- !!merge << : *faster-whisper
name : "cuda12-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-faster-whisper"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-faster-whisper
- !!merge << : *faster-whisper
name : "rocm-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-faster-whisper"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-faster-whisper
- !!merge << : *faster-whisper
name : "intel-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-faster-whisper"
mirrors :
- localai/localai-backends:latest-gpu-intel-faster-whisper
- !!merge << : *faster-whisper
name : "intel-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-faster-whisper"
mirrors :
- localai/localai-backends:master-gpu-intel-faster-whisper
- !!merge << : *faster-whisper
name : "cuda13-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-faster-whisper"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-faster-whisper
- !!merge << : *faster-whisper
name : "cuda13-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-faster-whisper"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-faster-whisper
- !!merge << : *faster-whisper
name : "metal-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-faster-whisper"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-faster-whisper
- !!merge << : *faster-whisper
name : "metal-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-faster-whisper"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-faster-whisper
- !!merge << : *faster-whisper
name : "cuda12-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-faster-whisper"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-faster-whisper
- !!merge << : *faster-whisper
name : "rocm-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-faster-whisper"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-faster-whisper
- !!merge << : *faster-whisper
name : "cpu-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-faster-whisper"
mirrors :
- localai/localai-backends:latest-cpu-faster-whisper
- !!merge << : *faster-whisper
name : "cpu-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-faster-whisper"
mirrors :
- localai/localai-backends:master-cpu-faster-whisper
- !!merge << : *faster-whisper
name : "nvidia-l4t-arm64-faster-whisper"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-faster-whisper"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-faster-whisper
- !!merge << : *faster-whisper
name : "nvidia-l4t-arm64-faster-whisper-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-faster-whisper"
mirrors :
- localai/localai-backends:master-nvidia-l4t-faster-whisper
## moonshine
- !!merge << : *moonshine
name : "moonshine-development"
capabilities :
nvidia : "cuda12-moonshine-development"
default : "cpu-moonshine-development"
nvidia-cuda-13 : "cuda13-moonshine-development"
nvidia-cuda-12 : "cuda12-moonshine-development"
- !!merge << : *moonshine
name : "cpu-moonshine"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-moonshine"
mirrors :
- localai/localai-backends:latest-cpu-moonshine
- !!merge << : *moonshine
name : "cpu-moonshine-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-moonshine"
mirrors :
- localai/localai-backends:master-cpu-moonshine
- !!merge << : *moonshine
name : "cuda12-moonshine"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-moonshine"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-moonshine
- !!merge << : *moonshine
name : "cuda12-moonshine-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-moonshine"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-moonshine
- !!merge << : *moonshine
name : "cuda13-moonshine"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-moonshine"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-moonshine
- !!merge << : *moonshine
name : "cuda13-moonshine-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-moonshine"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-moonshine
- !!merge << : *moonshine
name : "metal-moonshine"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-moonshine"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-moonshine
- !!merge << : *moonshine
name : "metal-moonshine-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-moonshine"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-moonshine
## whisperx
- !!merge << : *whisperx
name : "whisperx-development"
capabilities :
nvidia : "cuda12-whisperx-development"
amd : "rocm-whisperx-development"
metal : "metal-whisperx-development"
default : "cpu-whisperx-development"
nvidia-cuda-13 : "cuda13-whisperx-development"
nvidia-cuda-12 : "cuda12-whisperx-development"
nvidia-l4t : "nvidia-l4t-arm64-whisperx-development"
- !!merge << : *whisperx
name : "cpu-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-whisperx"
mirrors :
- localai/localai-backends:latest-cpu-whisperx
- !!merge << : *whisperx
name : "cpu-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-whisperx"
mirrors :
- localai/localai-backends:master-cpu-whisperx
- !!merge << : *whisperx
name : "cuda12-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-whisperx"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-whisperx
- !!merge << : *whisperx
name : "cuda12-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-whisperx"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-whisperx
- !!merge << : *whisperx
name : "rocm-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-whisperx"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-whisperx
- !!merge << : *whisperx
name : "rocm-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-whisperx"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-whisperx
- !!merge << : *whisperx
name : "cuda13-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-whisperx"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-whisperx
- !!merge << : *whisperx
name : "cuda13-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-whisperx"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-whisperx
- !!merge << : *whisperx
name : "metal-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-whisperx"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-whisperx
- !!merge << : *whisperx
name : "metal-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-whisperx"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-whisperx
- !!merge << : *whisperx
name : "nvidia-l4t-arm64-whisperx"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-whisperx"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-whisperx
- !!merge << : *whisperx
name : "nvidia-l4t-arm64-whisperx-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-whisperx"
mirrors :
- localai/localai-backends:master-nvidia-l4t-whisperx
## coqui
- !!merge << : *coqui
name : "coqui-development"
capabilities :
nvidia : "cuda12-coqui-development"
intel : "intel-coqui-development"
amd : "rocm-coqui-development"
metal : "metal-coqui-development"
- !!merge << : *coqui
name : "cuda12-coqui"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-coqui"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-coqui
- !!merge << : *coqui
name : "cuda12-coqui-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-coqui"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-coqui
- !!merge << : *coqui
name : "rocm-coqui-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-coqui"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-coqui
- !!merge << : *coqui
name : "intel-coqui"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-coqui"
mirrors :
- localai/localai-backends:latest-gpu-intel-coqui
- !!merge << : *coqui
name : "intel-coqui-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-coqui"
mirrors :
- localai/localai-backends:master-gpu-intel-coqui
- !!merge << : *coqui
name : "rocm-coqui"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-coqui"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-coqui
- !!merge << : *coqui
name : "metal-coqui"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-coqui"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-coqui
- !!merge << : *coqui
name : "metal-coqui-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-coqui"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-coqui
## outetts
- !!merge << : *outetts
name : "outetts-development"
capabilities :
default : "cpu-outetts-development"
nvidia-cuda-12 : "cuda12-outetts-development"
- !!merge << : *outetts
name : "cpu-outetts"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-outetts"
mirrors :
- localai/localai-backends:latest-cpu-outetts
- !!merge << : *outetts
name : "cpu-outetts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-outetts"
mirrors :
- localai/localai-backends:master-cpu-outetts
- !!merge << : *outetts
name : "cuda12-outetts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-outetts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-outetts
- !!merge << : *outetts
name : "cuda12-outetts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-outetts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-outetts
## chatterbox
- !!merge << : *chatterbox
name : "chatterbox-development"
capabilities :
nvidia : "cuda12-chatterbox-development"
metal : "metal-chatterbox-development"
default : "cpu-chatterbox-development"
nvidia-l4t : "nvidia-l4t-arm64-chatterbox-development"
nvidia-cuda-13 : "cuda13-chatterbox-development"
nvidia-cuda-12 : "cuda12-chatterbox-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-arm64-chatterbox-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-chatterbox-development"
- !!merge << : *chatterbox
name : "cpu-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-chatterbox"
mirrors :
- localai/localai-backends:latest-cpu-chatterbox
- !!merge << : *chatterbox
name : "cpu-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-chatterbox"
mirrors :
- localai/localai-backends:master-cpu-chatterbox
- !!merge << : *chatterbox
name : "nvidia-l4t-arm64-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-chatterbox"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-arm64-chatterbox
- !!merge << : *chatterbox
name : "nvidia-l4t-arm64-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-chatterbox"
mirrors :
- localai/localai-backends:master-nvidia-l4t-arm64-chatterbox
- !!merge << : *chatterbox
name : "metal-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-chatterbox"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-chatterbox
- !!merge << : *chatterbox
name : "metal-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-chatterbox"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-chatterbox
- !!merge << : *chatterbox
name : "cuda12-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-chatterbox"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-chatterbox
- !!merge << : *chatterbox
name : "cuda12-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-chatterbox"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-chatterbox
- !!merge << : *chatterbox
name : "cuda13-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-chatterbox"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-chatterbox
- !!merge << : *chatterbox
name : "cuda13-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-chatterbox"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-chatterbox
- !!merge << : *chatterbox
name : "cuda13-nvidia-l4t-arm64-chatterbox"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-chatterbox"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-chatterbox
- !!merge << : *chatterbox
name : "cuda13-nvidia-l4t-arm64-chatterbox-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-chatterbox"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-chatterbox
## vibevoice
- !!merge << : *vibevoice
name : "vibevoice-development"
capabilities :
nvidia : "cuda12-vibevoice-development"
intel : "intel-vibevoice-development"
amd : "rocm-vibevoice-development"
nvidia-l4t : "nvidia-l4t-vibevoice-development"
metal : "metal-vibevoice-development"
default : "cpu-vibevoice-development"
nvidia-cuda-13 : "cuda13-vibevoice-development"
nvidia-cuda-12 : "cuda12-vibevoice-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-vibevoice-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-vibevoice-development"
- !!merge << : *vibevoice
name : "cpu-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-vibevoice"
mirrors :
- localai/localai-backends:latest-cpu-vibevoice
- !!merge << : *vibevoice
name : "cpu-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-vibevoice"
mirrors :
- localai/localai-backends:master-cpu-vibevoice
- !!merge << : *vibevoice
name : "cuda12-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vibevoice"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-vibevoice
- !!merge << : *vibevoice
name : "cuda12-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vibevoice"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-vibevoice
- !!merge << : *vibevoice
name : "cuda13-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-vibevoice"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-vibevoice
- !!merge << : *vibevoice
name : "cuda13-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-vibevoice"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-vibevoice
- !!merge << : *vibevoice
name : "intel-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-vibevoice"
mirrors :
- localai/localai-backends:latest-gpu-intel-vibevoice
- !!merge << : *vibevoice
name : "intel-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-vibevoice"
mirrors :
- localai/localai-backends:master-gpu-intel-vibevoice
- !!merge << : *vibevoice
name : "rocm-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vibevoice"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-vibevoice
- !!merge << : *vibevoice
name : "rocm-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vibevoice"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-vibevoice
- !!merge << : *vibevoice
name : "nvidia-l4t-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-vibevoice"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-vibevoice
- !!merge << : *vibevoice
name : "nvidia-l4t-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-vibevoice"
mirrors :
- localai/localai-backends:master-nvidia-l4t-vibevoice
- !!merge << : *vibevoice
name : "cuda13-nvidia-l4t-arm64-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-vibevoice"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-vibevoice
- !!merge << : *vibevoice
name : "cuda13-nvidia-l4t-arm64-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-vibevoice"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-vibevoice
- !!merge << : *vibevoice
name : "metal-vibevoice"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-vibevoice"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-vibevoice
- !!merge << : *vibevoice
name : "metal-vibevoice-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-vibevoice"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-vibevoice
## qwen-tts
- !!merge << : *qwen-tts
name : "qwen-tts-development"
capabilities :
nvidia : "cuda12-qwen-tts-development"
intel : "intel-qwen-tts-development"
amd : "rocm-qwen-tts-development"
nvidia-l4t : "nvidia-l4t-qwen-tts-development"
metal : "metal-qwen-tts-development"
default : "cpu-qwen-tts-development"
nvidia-cuda-13 : "cuda13-qwen-tts-development"
nvidia-cuda-12 : "cuda12-qwen-tts-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-qwen-tts-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-qwen-tts-development"
- !!merge << : *qwen-tts
name : "cpu-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-qwen-tts"
mirrors :
- localai/localai-backends:latest-cpu-qwen-tts
- !!merge << : *qwen-tts
name : "cpu-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-qwen-tts"
mirrors :
- localai/localai-backends:master-cpu-qwen-tts
- !!merge << : *qwen-tts
name : "cuda12-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-qwen-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-qwen-tts
- !!merge << : *qwen-tts
name : "cuda12-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-qwen-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-qwen-tts
- !!merge << : *qwen-tts
name : "cuda13-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-qwen-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-qwen-tts
- !!merge << : *qwen-tts
name : "cuda13-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-qwen-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-qwen-tts
- !!merge << : *qwen-tts
name : "intel-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-qwen-tts"
mirrors :
- localai/localai-backends:latest-gpu-intel-qwen-tts
- !!merge << : *qwen-tts
name : "intel-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-qwen-tts"
mirrors :
- localai/localai-backends:master-gpu-intel-qwen-tts
- !!merge << : *qwen-tts
name : "rocm-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-qwen-tts"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-qwen-tts
- !!merge << : *qwen-tts
name : "rocm-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-qwen-tts"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-qwen-tts
- !!merge << : *qwen-tts
name : "nvidia-l4t-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-qwen-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-qwen-tts
- !!merge << : *qwen-tts
name : "nvidia-l4t-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-qwen-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-qwen-tts
- !!merge << : *qwen-tts
name : "cuda13-nvidia-l4t-arm64-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen-tts
- !!merge << : *qwen-tts
name : "cuda13-nvidia-l4t-arm64-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-qwen-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-qwen-tts
- !!merge << : *qwen-tts
name : "metal-qwen-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-qwen-tts"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-qwen-tts
- !!merge << : *qwen-tts
name : "metal-qwen-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-qwen-tts"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-qwen-tts
## fish-speech
- !!merge << : *fish-speech
name : "fish-speech-development"
capabilities :
nvidia : "cuda12-fish-speech-development"
intel : "intel-fish-speech-development"
amd : "rocm-fish-speech-development"
nvidia-l4t : "nvidia-l4t-fish-speech-development"
metal : "metal-fish-speech-development"
default : "cpu-fish-speech-development"
nvidia-cuda-13 : "cuda13-fish-speech-development"
nvidia-cuda-12 : "cuda12-fish-speech-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-fish-speech-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-fish-speech-development"
- !!merge << : *fish-speech
name : "cpu-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-fish-speech"
mirrors :
- localai/localai-backends:latest-cpu-fish-speech
- !!merge << : *fish-speech
name : "cpu-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-fish-speech"
mirrors :
- localai/localai-backends:master-cpu-fish-speech
- !!merge << : *fish-speech
name : "cuda12-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-fish-speech"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-fish-speech
- !!merge << : *fish-speech
name : "cuda12-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-fish-speech"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-fish-speech
- !!merge << : *fish-speech
name : "cuda13-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-fish-speech"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-fish-speech
- !!merge << : *fish-speech
name : "cuda13-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-fish-speech"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-fish-speech
- !!merge << : *fish-speech
name : "intel-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-fish-speech"
mirrors :
- localai/localai-backends:latest-gpu-intel-fish-speech
- !!merge << : *fish-speech
name : "intel-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-fish-speech"
mirrors :
- localai/localai-backends:master-gpu-intel-fish-speech
- !!merge << : *fish-speech
name : "rocm-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-fish-speech"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-fish-speech
- !!merge << : *fish-speech
name : "rocm-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-fish-speech"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-fish-speech
- !!merge << : *fish-speech
name : "nvidia-l4t-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-fish-speech"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-fish-speech
- !!merge << : *fish-speech
name : "nvidia-l4t-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-fish-speech"
mirrors :
- localai/localai-backends:master-nvidia-l4t-fish-speech
- !!merge << : *fish-speech
name : "cuda13-nvidia-l4t-arm64-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-fish-speech"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-fish-speech
- !!merge << : *fish-speech
name : "cuda13-nvidia-l4t-arm64-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech
- !!merge << : *fish-speech
name : "metal-fish-speech"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-fish-speech"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-fish-speech
- !!merge << : *fish-speech
name : "metal-fish-speech-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-fish-speech"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-fish-speech
## faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "faster-qwen3-tts-development"
capabilities :
nvidia : "cuda12-faster-qwen3-tts-development"
default : "cuda12-faster-qwen3-tts-development"
nvidia-cuda-13 : "cuda13-faster-qwen3-tts-development"
nvidia-cuda-12 : "cuda12-faster-qwen3-tts-development"
nvidia-l4t : "nvidia-l4t-faster-qwen3-tts-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-faster-qwen3-tts-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-faster-qwen3-tts-development"
- !!merge << : *faster-qwen3-tts
name : "cuda12-faster-qwen3-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-faster-qwen3-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "cuda12-faster-qwen3-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-faster-qwen3-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "cuda13-faster-qwen3-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-faster-qwen3-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "cuda13-faster-qwen3-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-faster-qwen3-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "nvidia-l4t-faster-qwen3-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-faster-qwen3-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "nvidia-l4t-faster-qwen3-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-faster-qwen3-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "cuda13-nvidia-l4t-arm64-faster-qwen3-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts
- !!merge << : *faster-qwen3-tts
name : "cuda13-nvidia-l4t-arm64-faster-qwen3-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-faster-qwen3-tts
## qwen-asr
- !!merge << : *qwen-asr
name : "qwen-asr-development"
capabilities :
nvidia : "cuda12-qwen-asr-development"
intel : "intel-qwen-asr-development"
amd : "rocm-qwen-asr-development"
nvidia-l4t : "nvidia-l4t-qwen-asr-development"
metal : "metal-qwen-asr-development"
default : "cpu-qwen-asr-development"
nvidia-cuda-13 : "cuda13-qwen-asr-development"
nvidia-cuda-12 : "cuda12-qwen-asr-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-qwen-asr-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-qwen-asr-development"
- !!merge << : *qwen-asr
name : "cpu-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-qwen-asr"
mirrors :
- localai/localai-backends:latest-cpu-qwen-asr
- !!merge << : *qwen-asr
name : "cpu-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-qwen-asr"
mirrors :
- localai/localai-backends:master-cpu-qwen-asr
- !!merge << : *qwen-asr
name : "cuda12-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-qwen-asr"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-qwen-asr
- !!merge << : *qwen-asr
name : "cuda12-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-qwen-asr"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-qwen-asr
- !!merge << : *qwen-asr
name : "cuda13-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-qwen-asr"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-qwen-asr
- !!merge << : *qwen-asr
name : "cuda13-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-qwen-asr"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-qwen-asr
- !!merge << : *qwen-asr
name : "intel-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-qwen-asr"
mirrors :
- localai/localai-backends:latest-gpu-intel-qwen-asr
- !!merge << : *qwen-asr
name : "intel-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-qwen-asr"
mirrors :
- localai/localai-backends:master-gpu-intel-qwen-asr
- !!merge << : *qwen-asr
name : "rocm-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-qwen-asr"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-qwen-asr
- !!merge << : *qwen-asr
name : "rocm-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-qwen-asr"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-qwen-asr
- !!merge << : *qwen-asr
name : "nvidia-l4t-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-qwen-asr"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-qwen-asr
- !!merge << : *qwen-asr
name : "nvidia-l4t-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-qwen-asr"
mirrors :
- localai/localai-backends:master-nvidia-l4t-qwen-asr
- !!merge << : *qwen-asr
name : "cuda13-nvidia-l4t-arm64-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen-asr"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-qwen-asr
- !!merge << : *qwen-asr
name : "cuda13-nvidia-l4t-arm64-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-qwen-asr"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-qwen-asr
- !!merge << : *qwen-asr
name : "metal-qwen-asr"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-qwen-asr"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-qwen-asr
- !!merge << : *qwen-asr
name : "metal-qwen-asr-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-qwen-asr"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-qwen-asr
## nemo
- !!merge << : *nemo
name : "nemo-development"
capabilities :
nvidia : "cuda12-nemo-development"
intel : "intel-nemo-development"
amd : "rocm-nemo-development"
metal : "metal-nemo-development"
default : "cpu-nemo-development"
nvidia-cuda-13 : "cuda13-nemo-development"
nvidia-cuda-12 : "cuda12-nemo-development"
- !!merge << : *nemo
name : "cpu-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-nemo"
mirrors :
- localai/localai-backends:latest-cpu-nemo
- !!merge << : *nemo
name : "cpu-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-nemo"
mirrors :
- localai/localai-backends:master-cpu-nemo
- !!merge << : *nemo
name : "cuda12-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-nemo"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-nemo
- !!merge << : *nemo
name : "cuda12-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-nemo"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-nemo
- !!merge << : *nemo
name : "cuda13-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-nemo"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-nemo
- !!merge << : *nemo
name : "cuda13-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-nemo"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-nemo
- !!merge << : *nemo
name : "intel-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-nemo"
mirrors :
- localai/localai-backends:latest-gpu-intel-nemo
- !!merge << : *nemo
name : "intel-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-nemo"
mirrors :
- localai/localai-backends:master-gpu-intel-nemo
- !!merge << : *nemo
name : "rocm-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-nemo"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-nemo
- !!merge << : *nemo
name : "rocm-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-nemo"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-nemo
- !!merge << : *nemo
name : "metal-nemo"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-nemo"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-nemo
- !!merge << : *nemo
name : "metal-nemo-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-nemo"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-nemo
## voxcpm
- !!merge << : *voxcpm
name : "voxcpm-development"
capabilities :
nvidia : "cuda12-voxcpm-development"
intel : "intel-voxcpm-development"
amd : "rocm-voxcpm-development"
metal : "metal-voxcpm-development"
default : "cpu-voxcpm-development"
nvidia-cuda-13 : "cuda13-voxcpm-development"
nvidia-cuda-12 : "cuda12-voxcpm-development"
- !!merge << : *voxcpm
name : "cpu-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-voxcpm"
mirrors :
- localai/localai-backends:latest-cpu-voxcpm
- !!merge << : *voxcpm
name : "cpu-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-voxcpm"
mirrors :
- localai/localai-backends:master-cpu-voxcpm
- !!merge << : *voxcpm
name : "cuda12-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-voxcpm"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-voxcpm
- !!merge << : *voxcpm
name : "cuda12-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-voxcpm"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-voxcpm
- !!merge << : *voxcpm
name : "cuda13-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-voxcpm"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-voxcpm
- !!merge << : *voxcpm
name : "cuda13-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-voxcpm"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-voxcpm
- !!merge << : *voxcpm
name : "intel-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-voxcpm"
mirrors :
- localai/localai-backends:latest-gpu-intel-voxcpm
- !!merge << : *voxcpm
name : "intel-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-voxcpm"
mirrors :
- localai/localai-backends:master-gpu-intel-voxcpm
- !!merge << : *voxcpm
name : "rocm-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-voxcpm"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-voxcpm
- !!merge << : *voxcpm
name : "rocm-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-voxcpm"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-voxcpm
- !!merge << : *voxcpm
name : "metal-voxcpm"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voxcpm"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-voxcpm
- !!merge << : *voxcpm
name : "metal-voxcpm-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voxcpm"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-voxcpm
## pocket-tts
- !!merge << : *pocket-tts
name : "pocket-tts-development"
capabilities :
nvidia : "cuda12-pocket-tts-development"
intel : "intel-pocket-tts-development"
amd : "rocm-pocket-tts-development"
nvidia-l4t : "nvidia-l4t-pocket-tts-development"
metal : "metal-pocket-tts-development"
default : "cpu-pocket-tts-development"
nvidia-cuda-13 : "cuda13-pocket-tts-development"
nvidia-cuda-12 : "cuda12-pocket-tts-development"
nvidia-l4t-cuda-12 : "nvidia-l4t-pocket-tts-development"
nvidia-l4t-cuda-13 : "cuda13-nvidia-l4t-arm64-pocket-tts-development"
- !!merge << : *pocket-tts
name : "cpu-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-pocket-tts"
mirrors :
- localai/localai-backends:latest-cpu-pocket-tts
- !!merge << : *pocket-tts
name : "cpu-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-pocket-tts"
mirrors :
- localai/localai-backends:master-cpu-pocket-tts
- !!merge << : *pocket-tts
name : "cuda12-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-pocket-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-12-pocket-tts
- !!merge << : *pocket-tts
name : "cuda12-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-pocket-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-12-pocket-tts
- !!merge << : *pocket-tts
name : "cuda13-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-pocket-tts"
mirrors :
- localai/localai-backends:latest-gpu-nvidia-cuda-13-pocket-tts
- !!merge << : *pocket-tts
name : "cuda13-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-pocket-tts"
mirrors :
- localai/localai-backends:master-gpu-nvidia-cuda-13-pocket-tts
- !!merge << : *pocket-tts
name : "intel-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-pocket-tts"
mirrors :
- localai/localai-backends:latest-gpu-intel-pocket-tts
- !!merge << : *pocket-tts
name : "intel-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-intel-pocket-tts"
mirrors :
- localai/localai-backends:master-gpu-intel-pocket-tts
- !!merge << : *pocket-tts
name : "rocm-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-pocket-tts"
mirrors :
- localai/localai-backends:latest-gpu-rocm-hipblas-pocket-tts
- !!merge << : *pocket-tts
name : "rocm-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-pocket-tts"
mirrors :
- localai/localai-backends:master-gpu-rocm-hipblas-pocket-tts
- !!merge << : *pocket-tts
name : "nvidia-l4t-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-pocket-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-pocket-tts
- !!merge << : *pocket-tts
name : "nvidia-l4t-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-pocket-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-pocket-tts
- !!merge << : *pocket-tts
name : "cuda13-nvidia-l4t-arm64-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-pocket-tts"
mirrors :
- localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-pocket-tts
- !!merge << : *pocket-tts
name : "cuda13-nvidia-l4t-arm64-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-pocket-tts"
mirrors :
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-pocket-tts
- !!merge << : *pocket-tts
name : "metal-pocket-tts"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-pocket-tts"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-pocket-tts
- !!merge << : *pocket-tts
name : "metal-pocket-tts-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-pocket-tts"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-pocket-tts
## voxtral
- !!merge << : *voxtral
name : "cpu-voxtral"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-voxtral"
mirrors :
- localai/localai-backends:latest-cpu-voxtral
- !!merge << : *voxtral
name : "cpu-voxtral-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-voxtral"
mirrors :
- localai/localai-backends:master-cpu-voxtral
- !!merge << : *voxtral
name : "metal-voxtral"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-voxtral"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-voxtral
- !!merge << : *voxtral
name : "metal-voxtral-development"
uri : "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-voxtral"
mirrors :
- localai/localai-backends:master-metal-darwin-arm64-voxtral
- &trl
name : "trl"
alias : "trl"
license : apache-2.0
      description : |
        HuggingFace TRL fine-tuning backend. Supports the SFT, DPO, GRPO, RLOO, Reward, KTO, and ORPO training methods.
        Works on CPU and GPU.
urls :
- https://github.com/huggingface/trl
tags :
- fine-tuning
- LLM
- CPU
- GPU
- CUDA
capabilities :
default : "cpu-trl"
nvidia : "cuda12-trl"
nvidia-cuda-12 : "cuda12-trl"
nvidia-cuda-13 : "cuda13-trl"
## TRL backend images
- !!merge << : *trl
name : "cpu-trl"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-trl"
mirrors :
- localai/localai-backends:latest-cpu-trl
- !!merge << : *trl
name : "cpu-trl-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cpu-trl"
mirrors :
- localai/localai-backends:master-cpu-trl
- !!merge << : *trl
name : "cuda12-trl"
uri : "quay.io/go-skynet/local-ai-backends:latest-cublas-cuda12-trl"
mirrors :
- localai/localai-backends:latest-cublas-cuda12-trl
- !!merge << : *trl
name : "cuda12-trl-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cublas-cuda12-trl"
mirrors :
- localai/localai-backends:master-cublas-cuda12-trl
- !!merge << : *trl
name : "cuda13-trl"
uri : "quay.io/go-skynet/local-ai-backends:latest-cublas-cuda13-trl"
mirrors :
- localai/localai-backends:latest-cublas-cuda13-trl
- !!merge << : *trl
name : "cuda13-trl-development"
uri : "quay.io/go-skynet/local-ai-backends:master-cublas-cuda13-trl"
mirrors :
- localai/localai-backends:master-cublas-cuda13-trl
## llama.cpp quantization backend
- &llama-cpp-quantization
name : "llama-cpp-quantization"
alias : "llama-cpp-quantization"
license : mit
icon : https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png
      description : |
        Model quantization backend using llama.cpp. Downloads HuggingFace models, converts them to GGUF format,
        and quantizes them to various formats (e.g. q4_k_m, q5_k_m, q8_0, f16).
urls :
- https://github.com/ggml-org/llama.cpp
tags :
- quantization
- GGUF
- CPU
capabilities :
default : "cpu-llama-cpp-quantization"
metal : "metal-darwin-arm64-llama-cpp-quantization"
- !!merge << : *llama-cpp-quantization
name : "cpu-llama-cpp-quantization"
uri : "quay.io/go-skynet/local-ai-backends:latest-cpu-llama-cpp-quantization"
mirrors :
- localai/localai-backends:latest-cpu-llama-cpp-quantization
- !!merge << : *llama-cpp-quantization
name : "metal-darwin-arm64-llama-cpp-quantization"
uri : "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp-quantization"
mirrors :
- localai/localai-backends:latest-metal-darwin-arm64-llama-cpp-quantization