---
## metas
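# This index is one YAML document: the "metas" below define anchored entries
# (e.g. &llamacpp) carrying the metadata shared by a backend family, and the
# concrete entries further down merge an anchor via `!!merge <<: *anchor`,
# adding or overriding only what differs (typically `name` and the container
# image `uri`). For example, the entry
#
#   - !!merge <<: *llamacpp
#     name: "cpu-llama-cpp"
#     uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-llama-cpp"
#
# expands to the full llama-cpp metadata plus its own name and image.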
- &llamacpp
  name: "llama-cpp"
  alias: "llama-cpp"
  license: MIT
  icon: https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png
  description: |
    LLM inference in C/C++
  urls:
    - https://github.com/ggerganov/llama.cpp
  tags:
    - text-to-text
    - LLM
    - CPU
    - GPU
    - Metal
    - CUDA
    - HIP
  capabilities:
    default: "cpu-llama-cpp"
    nvidia: "cuda12-llama-cpp"
    intel: "intel-sycl-f16-llama-cpp"
    amd: "rocm-llama-cpp"
    metal: "metal-llama-cpp"
    nvidia-l4t: "nvidia-l4t-arm64-llama-cpp"
    darwin-x86: "darwin-x86-llama-cpp"
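# The capabilities map routes a meta backend to a concrete build for the
# detected hardware: on an NVIDIA host "llama-cpp" resolves to
# "cuda12-llama-cpp", on Apple Silicon to "metal-llama-cpp", and so on, with
# "default" as the CPU fallback. Each referenced name is expected to exist as
# a concrete entry later in this file.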
- &vllm
  name: "vllm"
  alias: "vllm"
  license: apache-2.0
  icon: https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png
  description: |
    vLLM is a fast and easy-to-use library for LLM inference and serving.
    Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
    vLLM is fast with:
    - State-of-the-art serving throughput
    - Efficient management of attention key and value memory with PagedAttention
    - Continuous batching of incoming requests
    - Fast model execution with CUDA/HIP graph
    - Quantizations: GPTQ, AWQ, AutoRound, INT4, INT8, and FP8
    - Optimized CUDA kernels, including integration with FlashAttention and FlashInfer
    - Speculative decoding
    - Chunked prefill
  urls:
    - https://github.com/vllm-project/vllm
  tags:
    - text-to-text
    - multimodal
    - GPTQ
    - AWQ
    - AutoRound
    - INT4
    - INT8
    - FP8
  capabilities:
    nvidia: "cuda12-vllm"
    amd: "rocm-vllm"
    intel: "intel-sycl-f16-vllm"
- &rerankers
  name: "rerankers"
  alias: "rerankers"
  capabilities:
    nvidia: "cuda12-rerankers"
    intel: "intel-sycl-f16-rerankers"
    amd: "rocm-rerankers"
- &transformers
  name: "transformers"
  alias: "transformers"
  license: apache-2.0
  icon: https://camo.githubusercontent.com/26569a27b8a30a488dd345024b71dbc05da7ff1b2ba97bb6080c9f1ee0f26cc7/68747470733a2f2f68756767696e67666163652e636f2f64617461736574732f68756767696e67666163652f646f63756d656e746174696f6e2d696d616765732f7265736f6c76652f6d61696e2f7472616e73666f726d6572732f7472616e73666f726d6572735f61735f615f6d6f64656c5f646566696e6974696f6e2e706e67
  description: |
    Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer vision, audio, video, and multimodal models, for both inference and training.
    It centralizes the model definition so that this definition is agreed upon across the ecosystem. transformers is the pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...), and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from transformers.
  urls:
    - https://github.com/huggingface/transformers
  tags:
    - text-to-text
    - multimodal
  capabilities:
    nvidia: "cuda12-transformers"
    intel: "intel-sycl-f16-transformers"
    amd: "rocm-transformers"
- &diffusers
  name: "diffusers"
  alias: "diffusers"
  license: apache-2.0
  icon: https://raw.githubusercontent.com/huggingface/diffusers/main/docs/source/en/imgs/diffusers_library.jpg
  description: |
    🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or training your own diffusion models, 🤗 Diffusers is a modular toolbox that supports both.
  urls:
    - https://github.com/huggingface/diffusers
  tags:
    - image-generation
    - video-generation
    - diffusion-models
  capabilities:
    nvidia: "cuda12-diffusers"
    intel: "intel-sycl-f32-diffusers"
    amd: "rocm-diffusers"
- &exllama2
  name: "exllama2"
  alias: "exllama2"
  license: MIT
  description: |
    ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.
  urls:
    - https://github.com/turboderp-org/exllamav2
  tags:
    - text-to-text
    - LLM
    - EXL2
  capabilities:
    nvidia: "cuda12-exllama2"
    intel: "intel-sycl-f32-exllama2"
    amd: "rocm-exllama2"
- &faster-whisper
  name: "faster-whisper"
  license: MIT
  icon: https://avatars.githubusercontent.com/u/1520500?s=200&v=4
  description: |
    faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
    This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
  urls:
    - https://github.com/SYSTRAN/faster-whisper
  tags:
    - speech-to-text
    - Whisper
  capabilities:
    nvidia: "cuda12-faster-whisper"
    intel: "intel-sycl-f32-faster-whisper"
    amd: "rocm-faster-whisper"
- &kokoro
  name: "kokoro"
  alias: "kokoro"
  license: apache-2.0
  icon: https://avatars.githubusercontent.com/u/166769057?v=4
  description: |
    Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
  urls:
    - https://huggingface.co/hexgrad/Kokoro-82M
    - https://github.com/hexgrad/kokoro
  tags:
    - text-to-speech
    - TTS
    - LLM
  capabilities:
    nvidia: "cuda12-kokoro"
    intel: "intel-sycl-f32-kokoro"
    amd: "rocm-kokoro"
- &coqui
  name: "coqui"
  alias: "coqui"
  license: mpl-2.0
  icon: https://avatars.githubusercontent.com/u/1338804?s=200&v=4
  description: |
    🐸 Coqui TTS is a library for advanced Text-to-Speech generation.

    🚀 Pretrained models in 1100+ languages.

    🛠️ Tools for training new models and fine-tuning existing models in any language.

    📚 Utilities for dataset analysis and curation.
  urls:
    - https://github.com/idiap/coqui-ai-TTS
  tags:
    - text-to-speech
    - TTS
  capabilities:
    nvidia: "cuda12-coqui"
    intel: "intel-sycl-f32-coqui"
    amd: "rocm-coqui"
- &bark
  name: "bark"
  alias: "bark"
  license: MIT
  icon: https://avatars.githubusercontent.com/u/99442120?s=200&v=4
  description: |
    Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints, which are ready for inference and available for commercial use.
  urls:
    - https://github.com/suno-ai/bark
  tags:
    - text-to-speech
    - TTS
  capabilities:
    nvidia: "cuda12-bark"
    intel: "intel-sycl-f32-bark"
    amd: "rocm-bark"
- &barkcpp
  name: "bark-cpp"
  alias: "bark-cpp"
  license: MIT
  icon: https://github.com/PABannier/bark.cpp/raw/main/assets/banner.png
  description: |
    With bark.cpp, our goal is to bring real-time realistic multilingual text-to-speech generation to the community.

    - Plain C/C++ implementation without dependencies
    - AVX, AVX2 and AVX512 for x86 architectures
    - CPU and GPU compatible backends
    - Mixed F16 / F32 precision
    - 4-bit, 5-bit and 8-bit integer quantization
    - Metal and CUDA backends

    Models supported:
    - Bark Small
    - Bark Large
  urls:
    - https://github.com/PABannier/bark.cpp
  tags:
    - text-to-speech
    - TTS
  uri: "quay.io/go-skynet/local-ai-backends:latest-bark-cpp"
- &chatterbox
  name: "chatterbox"
  license: MIT
  icon: https://private-user-images.githubusercontent.com/660224/448166653-bd8c5f03-e91d-4ee5-b680-57355da204d1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NTAxOTE0MDAsIm5iZiI6MTc1MDE5MTEwMCwicGF0aCI6Ii82NjAyMjQvNDQ4MTY2NjUzLWJkOGM1ZjAzLWU5MWQtNGVlNS1iNjgwLTU3MzU1ZGEyMDRkMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwNjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDYxN1QyMDExNDBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hMmI1NGY3OGFiZTlhNGFkNTVlYTY4NTIwMWEzODRiZGE4YzdhNGQ5MGNhNzE3MDYyYTA2NDIxYTkyYzhiODkwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.mR9kM9xX0TdzPuSpuspCllHYQiq79dFQ2rtuNvjrl6w
  description: |
    Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
    Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out.
  urls:
    - https://github.com/resemble-ai/chatterbox
  tags:
    - text-to-speech
    - TTS
  capabilities:
    nvidia: "cuda12-chatterbox"
## llama-cpp
- !!merge <<: *llamacpp
  name: "darwin-x86-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-darwin-x86-llama-cpp"
- !!merge <<: *llamacpp
  name: "darwin-x86-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-darwin-x86-llama-cpp"
- !!merge <<: *llamacpp
  name: "nvidia-l4t-arm64-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-arm64-llama-cpp"
- !!merge <<: *llamacpp
  name: "nvidia-l4t-arm64-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-arm64-llama-cpp"
- !!merge <<: *llamacpp
  name: "cpu-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-llama-cpp"
- !!merge <<: *llamacpp
  name: "cpu-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-llama-cpp"
- !!merge <<: *llamacpp
  name: "cuda11-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-llama-cpp"
- !!merge <<: *llamacpp
  name: "cuda12-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-llama-cpp"
- !!merge <<: *llamacpp
  name: "rocm-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-llama-cpp"
- !!merge <<: *llamacpp
  name: "intel-sycl-f32-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-llama-cpp"
- !!merge <<: *llamacpp
  name: "intel-sycl-f16-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-llama-cpp"
- !!merge <<: *llamacpp
  name: "metal-llama-cpp"
  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-llama-cpp"
- !!merge <<: *llamacpp
  name: "metal-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-llama-cpp"
- !!merge <<: *llamacpp
  name: "cuda11-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-llama-cpp"
- !!merge <<: *llamacpp
  name: "cuda12-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-llama-cpp"
- !!merge <<: *llamacpp
  name: "rocm-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-llama-cpp"
- !!merge <<: *llamacpp
  name: "intel-sycl-f32-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-llama-cpp"
- !!merge <<: *llamacpp
  name: "intel-sycl-f16-llama-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-llama-cpp"
## vllm
- !!merge <<: *vllm
  name: "vllm-development"
  capabilities:
    nvidia: "cuda12-vllm-development"
    amd: "rocm-vllm-development"
    intel: "intel-sycl-f16-vllm-development"
- !!merge <<: *vllm
  name: "cuda11-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-vllm"
- !!merge <<: *vllm
  name: "cuda12-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-vllm"
- !!merge <<: *vllm
  name: "rocm-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-vllm"
- !!merge <<: *vllm
  name: "intel-sycl-f32-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-vllm"
- !!merge <<: *vllm
  name: "intel-sycl-f16-vllm"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-vllm"
- !!merge <<: *vllm
  name: "cuda11-vllm-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-vllm"
- !!merge <<: *vllm
  name: "cuda12-vllm-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-vllm"
- !!merge <<: *vllm
  name: "rocm-vllm-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-vllm"
- !!merge <<: *vllm
  name: "intel-sycl-f32-vllm-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-vllm"
- !!merge <<: *vllm
  name: "intel-sycl-f16-vllm-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-vllm"
## rerankers
- !!merge <<: *rerankers
  name: "rerankers-development"
  capabilities:
    nvidia: "cuda12-rerankers-development"
    intel: "intel-sycl-f16-rerankers-development"
    amd: "rocm-rerankers-development"
- !!merge <<: *rerankers
  name: "cuda11-rerankers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-rerankers"
- !!merge <<: *rerankers
  name: "cuda12-rerankers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-rerankers"
- !!merge <<: *rerankers
  name: "intel-sycl-f32-rerankers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-rerankers"
- !!merge <<: *rerankers
  name: "intel-sycl-f16-rerankers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-rerankers"
- !!merge <<: *rerankers
  name: "rocm-rerankers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-rerankers"
- !!merge <<: *rerankers
  name: "cuda11-rerankers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-rerankers"
- !!merge <<: *rerankers
  name: "cuda12-rerankers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-rerankers"
- !!merge <<: *rerankers
  name: "rocm-rerankers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-rerankers"
- !!merge <<: *rerankers
  name: "intel-sycl-f32-rerankers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-rerankers"
- !!merge <<: *rerankers
  name: "intel-sycl-f16-rerankers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-rerankers"
## transformers
- !!merge <<: *transformers
  name: "transformers-development"
  capabilities:
    nvidia: "cuda12-transformers-development"
    intel: "intel-sycl-f16-transformers-development"
    amd: "rocm-transformers-development"
- !!merge <<: *transformers
  name: "cuda12-transformers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-transformers"
- !!merge <<: *transformers
  name: "rocm-transformers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-transformers"
- !!merge <<: *transformers
  name: "intel-sycl-f32-transformers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-transformers"
- !!merge <<: *transformers
  name: "intel-sycl-f16-transformers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-transformers"
- !!merge <<: *transformers
  name: "cuda11-transformers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-transformers"
- !!merge <<: *transformers
  name: "cuda11-transformers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-transformers"
- !!merge <<: *transformers
  name: "cuda12-transformers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-transformers"
- !!merge <<: *transformers
  name: "rocm-transformers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-transformers"
- !!merge <<: *transformers
  name: "intel-sycl-f32-transformers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-transformers"
- !!merge <<: *transformers
  name: "intel-sycl-f16-transformers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-transformers"
## diffusers
- !!merge <<: *diffusers
  name: "diffusers-development"
  capabilities:
    nvidia: "cuda12-diffusers-development"
    intel: "intel-sycl-f32-diffusers-development"
    amd: "rocm-diffusers-development"
- !!merge <<: *diffusers
  name: "cuda12-diffusers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-diffusers"
- !!merge <<: *diffusers
  name: "rocm-diffusers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-diffusers"
- !!merge <<: *diffusers
  name: "cuda11-diffusers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-diffusers"
- !!merge <<: *diffusers
  name: "intel-sycl-f32-diffusers"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-diffusers"
- !!merge <<: *diffusers
  name: "cuda11-diffusers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-diffusers"
- !!merge <<: *diffusers
  name: "cuda12-diffusers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-diffusers"
- !!merge <<: *diffusers
  name: "rocm-diffusers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-diffusers"
- !!merge <<: *diffusers
  name: "intel-sycl-f32-diffusers-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-diffusers"
## exllama2
- !!merge <<: *exllama2
  name: "exllama2-development"
  capabilities:
    nvidia: "cuda12-exllama2-development"
    intel: "intel-sycl-f32-exllama2-development"
    amd: "rocm-exllama2-development"
- !!merge <<: *exllama2
  name: "cuda11-exllama2"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-exllama2"
- !!merge <<: *exllama2
  name: "cuda12-exllama2"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-exllama2"
- !!merge <<: *exllama2
  name: "cuda11-exllama2-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-exllama2"
- !!merge <<: *exllama2
  name: "cuda12-exllama2-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-exllama2"
## kokoro
- !!merge <<: *kokoro
  name: "kokoro-development"
  capabilities:
    nvidia: "cuda12-kokoro-development"
    intel: "intel-sycl-f32-kokoro-development"
    amd: "rocm-kokoro-development"
- !!merge <<: *kokoro
  name: "cuda11-kokoro-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-kokoro"
- !!merge <<: *kokoro
  name: "cuda12-kokoro-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-kokoro"
- !!merge <<: *kokoro
  name: "rocm-kokoro-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-kokoro"
- !!merge <<: *kokoro
  name: "intel-sycl-f32-kokoro"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-kokoro"
- !!merge <<: *kokoro
  name: "intel-sycl-f16-kokoro"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-kokoro"
- !!merge <<: *kokoro
  name: "intel-sycl-f16-kokoro-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-kokoro"
- !!merge <<: *kokoro
  name: "intel-sycl-f32-kokoro-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-kokoro"
- !!merge <<: *kokoro
  name: "cuda11-kokoro"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-kokoro"
- !!merge <<: *kokoro
  name: "cuda12-kokoro"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-kokoro"
- !!merge <<: *kokoro
  name: "rocm-kokoro"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-kokoro"
## faster-whisper
- !!merge <<: *faster-whisper
  name: "faster-whisper-development"
  capabilities:
    nvidia: "cuda12-faster-whisper-development"
    intel: "intel-sycl-f32-faster-whisper-development"
    amd: "rocm-faster-whisper-development"
- !!merge <<: *faster-whisper
  name: "cuda11-faster-whisper"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-faster-whisper"
- !!merge <<: *faster-whisper
  name: "cuda12-faster-whisper-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-faster-whisper"
- !!merge <<: *faster-whisper
  name: "rocm-faster-whisper-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-faster-whisper"
- !!merge <<: *faster-whisper
  name: "intel-sycl-f32-faster-whisper"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-faster-whisper"
- !!merge <<: *faster-whisper
  name: "intel-sycl-f16-faster-whisper"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-faster-whisper"
- !!merge <<: *faster-whisper
  name: "intel-sycl-f32-faster-whisper-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-faster-whisper"
- !!merge <<: *faster-whisper
  name: "intel-sycl-f16-faster-whisper-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-faster-whisper"
## coqui
- !!merge <<: *coqui
  name: "coqui-development"
  capabilities:
    nvidia: "cuda12-coqui-development"
    intel: "intel-sycl-f32-coqui-development"
    amd: "rocm-coqui-development"
- !!merge <<: *coqui
  name: "cuda11-coqui"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-coqui"
- !!merge <<: *coqui
  name: "cuda12-coqui"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-coqui"
- !!merge <<: *coqui
  name: "cuda11-coqui-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-coqui"
- !!merge <<: *coqui
  name: "cuda12-coqui-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-coqui"
- !!merge <<: *coqui
  name: "rocm-coqui-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-coqui"
- !!merge <<: *coqui
  name: "intel-sycl-f32-coqui"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-coqui"
- !!merge <<: *coqui
  name: "intel-sycl-f16-coqui"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-coqui"
- !!merge <<: *coqui
  name: "intel-sycl-f32-coqui-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-coqui"
- !!merge <<: *coqui
  name: "intel-sycl-f16-coqui-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-coqui"
- !!merge <<: *coqui
  name: "rocm-coqui"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-coqui"
## bark
- !!merge <<: *bark
  name: "bark-development"
  capabilities:
    nvidia: "cuda12-bark-development"
    intel: "intel-sycl-f32-bark-development"
    amd: "rocm-bark-development"
- !!merge <<: *bark
  name: "cuda11-bark-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-bark"
- !!merge <<: *bark
  name: "cuda11-bark"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-bark"
- !!merge <<: *bark
  name: "rocm-bark-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-rocm-hipblas-bark"
- !!merge <<: *bark
  name: "intel-sycl-f32-bark"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f32-bark"
- !!merge <<: *bark
  name: "intel-sycl-f16-bark"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-intel-sycl-f16-bark"
- !!merge <<: *bark
  name: "intel-sycl-f32-bark-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f32-bark"
- !!merge <<: *bark
  name: "intel-sycl-f16-bark-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-intel-sycl-f16-bark"
- !!merge <<: *bark
  name: "cuda12-bark"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-bark"
- !!merge <<: *bark
  name: "rocm-bark"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-rocm-hipblas-bark"
- !!merge <<: *bark
  name: "cuda12-bark-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-bark"
- !!merge <<: *barkcpp
  name: "bark-cpp-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-bark-cpp"
  alias: "bark-cpp"
## chatterbox
- !!merge <<: *chatterbox
  name: "chatterbox-development"
  capabilities:
    nvidia: "cuda12-chatterbox-development"
- !!merge <<: *chatterbox
  name: "cuda12-chatterbox-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-chatterbox"
- !!merge <<: *chatterbox
  name: "cuda11-chatterbox"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-11-chatterbox"
- !!merge <<: *chatterbox
  name: "cuda11-chatterbox-development"
  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-11-chatterbox"
- !!merge <<: *chatterbox
  name: "cuda12-chatterbox"
  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-chatterbox"