+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++

Besides llama-based models, LocalAI is also compatible with other model architectures. The tables below list all the backends, the compatible model families, and the associated repositories.

{{% notice note %}}
LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.
{{% /notice %}}
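
For example, a minimal model configuration that pins a model to the llama.cpp backend could look like the following sketch. The model name, file name, and the `llama-cpp` backend identifier are illustrative assumptions; use the backend name as listed in the tables below.

```yaml
# my-model.yaml: a minimal, illustrative model configuration.
# The name and model file are placeholders; only the `backend`
# field is needed to pin the model to a specific backend.
name: my-model
backend: llama-cpp
parameters:
  model: my-model.Q4_K_M.gguf
```
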
## Text Generation & Language Models

| Backend | Description | Capability | Embeddings | Streaming | Acceleration |
|---------|-------------|------------|------------|-----------|-------------|
| [llama.cpp](https://github.com/ggerganov/llama.cpp) | LLM inference in C/C++. Supports LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | GPT, Functions | yes | yes | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp) | Hard fork of llama.cpp optimized for CPU and hybrid CPU+GPU inference, with IQK quants, custom quant mixes, and MLA for DeepSeek | GPT | yes | yes | CPU (AVX2+) |
| [vLLM](https://github.com/vllm-project/vllm) | Fast LLM serving with PagedAttention | GPT | no | no | CUDA 12, ROCm, Intel |
| [vLLM Omni](https://github.com/vllm-project/vllm) | Unified multimodal generation (text, image, video, audio) | Multimodal GPT | no | no | CUDA 12, ROCm |
| [transformers](https://github.com/huggingface/transformers) | HuggingFace Transformers framework | GPT, Embeddings, Multimodal | yes | yes* | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [MLX](https://github.com/ml-explore/mlx-lm) | Apple Silicon LLM inference | GPT | no | no | Metal |
| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models on Apple Silicon | Multimodal GPT | no | no | Metal |
| [MLX Distributed](https://github.com/ml-explore/mlx-lm) | Distributed LLM inference across multiple Apple Silicon Macs | GPT | no | no | Metal |

\* Streaming with the transformers backend is available only with CUDA and OpenVINO CPU/XPU acceleration.

## Speech-to-Text

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | OpenAI Whisper in C/C++ | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | Fast Whisper with CTranslate2 | CUDA 12/13, ROCm, Intel, Metal |
| [WhisperX](https://github.com/m-bain/whisperX) | Word-level timestamps and speaker diarization | CPU, CUDA 12/13, ROCm, Metal |
| [moonshine](https://github.com/moonshine-ai/moonshine) | Ultra-fast transcription for low-end devices | CPU, CUDA 12/13, Metal |
| [voxtral](https://github.com/mudler/voxtral.c) | Voxtral Realtime 4B speech-to-text in C | CPU, Metal |
| [Qwen3-ASR](https://github.com/QwenLM/Qwen3-ASR) | Qwen3 automatic speech recognition | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [NeMo](https://github.com/NVIDIA/NeMo) | NVIDIA NeMo ASR toolkit | CPU, CUDA 12/13, ROCm, Intel, Metal |

## Text-to-Speech

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [piper](https://github.com/rhasspy/piper) | Fast neural TTS | CPU |
| [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) | TTS with 1100+ languages and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [Kokoro](https://huggingface.co/hexgrad/Kokoro-82M) | Lightweight TTS (82M params) | CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [Chatterbox](https://github.com/resemble-ai/chatterbox) | Production-grade TTS with emotion control | CPU, CUDA 12/13, Metal, Jetson L4T |
| [VibeVoice](https://github.com/microsoft/VibeVoice) | Real-time TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) | TTS with custom voice, voice design, and voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [fish-speech](https://github.com/fishaudio/fish-speech) | High-quality TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) | Lightweight CPU-efficient TTS with voice cloning | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |
| [OuteTTS](https://github.com/OuteAI/outetts) | TTS with custom speaker voices | CPU, CUDA 12 |
| [faster-qwen3-tts](https://github.com/andimarafioti/faster-qwen3-tts) | Real-time Qwen3-TTS with CUDA graph capture | CUDA 12/13, Jetson L4T |
| [NeuTTS Air](https://github.com/neuphonic/neutts-air) | Instant voice cloning TTS | CPU, CUDA 12, ROCm |
| [VoxCPM](https://github.com/ModelBest/VoxCPM) | Expressive end-to-end TTS | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [Kitten TTS](https://github.com/KittenML/KittenTTS) | Ultra-lightweight TTS model | CPU, Metal |
| [MLX-Audio](https://github.com/Blaizzy/mlx-audio) | Audio models on Apple Silicon | CPU, CUDA 12/13, Metal, Jetson L4T |

## Music Generation

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [ACE-Step](https://github.com/ace-step/ACE-Step-1.5) | Music generation from text descriptions, lyrics, or audio | CPU, CUDA 12/13, ROCm, Intel, Metal |
| [acestep.cpp](https://github.com/ace-step/acestep.cpp) | ACE-Step 1.5 C++ backend using GGML | CPU, CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, Jetson L4T |

## Image & Video Generation

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | Stable Diffusion, Flux, PhotoMaker in C/C++ | CPU, CUDA 12/13, Intel SYCL, Vulkan, Metal, Jetson L4T |
| [diffusers](https://github.com/huggingface/diffusers) | HuggingFace diffusion models (image and video generation) | CPU, CUDA 12/13, ROCm, Intel, Metal, Jetson L4T |

## Specialized Tasks

| Backend | Description | Acceleration |
|---------|-------------|-------------|
| [RF-DETR](https://github.com/roboflow/rf-detr) | Real-time transformer-based object detection | CPU, CUDA 12/13, Intel, Metal, Jetson L4T |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Document reranking for RAG | CUDA 12/13, ROCm, Intel, Metal |
| [local-store](https://github.com/mudler/LocalAI) | Local vector database for embeddings | CPU, Metal |
| [Silero VAD](https://github.com/snakers4/silero-vad) | Voice Activity Detection | CPU |
| [TRL](https://github.com/huggingface/trl) | Fine-tuning (SFT, DPO, GRPO, RLOO, KTO, ORPO) | CPU, CUDA 12/13 |
| [llama.cpp quantization](https://github.com/ggml-org/llama.cpp) | HuggingFace → GGUF model conversion and quantization | CPU, Metal |
| [Opus](https://opus-codec.org/) | Audio codec for WebRTC / Realtime API | CPU, Metal |

## Acceleration Support Summary

### GPU Acceleration

- **NVIDIA CUDA**: CUDA 12.0 and CUDA 13.0 support across most backends
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- **Vulkan**: Cross-platform GPU acceleration
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
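
How much work a model offloads to the GPU is configured per model. As a sketch (the `f16` and `gpu_layers` fields follow the model configuration format from the advanced section; all values here are illustrative), GPU offloading for a llama.cpp-family model can be tuned like this:

```yaml
# Illustrative values: tune gpu_layers to your VRAM budget.
name: my-gpu-model
backend: llama-cpp
f16: true        # enable 16-bit precision where supported
gpu_layers: 35   # number of model layers to offload to the GPU
parameters:
  model: my-model.Q4_K_M.gguf
```
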

### Specialized Hardware

- **NVIDIA Jetson (L4T CUDA 12)**: ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- **NVIDIA Jetson (L4T CUDA 13)**: ARM64 support for embedded AI (DGX Spark)
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
- **Darwin x86**: Intel Mac support

### CPU Optimization

- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
- **Quantization**: 4-bit, 5-bit, and 8-bit integer quantization support
- **Mixed Precision**: F16/F32 mixed precision support
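
For GGUF-based backends, the quantization level is usually chosen by downloading a correspondingly quantized model file; the level is encoded in the file name. A hypothetical example:

```yaml
# Hypothetical file name: the Q4_K_M suffix denotes a 4-bit
# quantization; Q5_K_M and Q8_0 are 5-bit and 8-bit variants.
parameters:
  model: mistral-7b-instruct.Q4_K_M.gguf
```
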

Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).
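
For instance, a speech-to-text model could be pinned to the whisper.cpp backend with a sketch like the one below; the `whisper` backend identifier and the model file name are assumptions shown for illustration:

```yaml
name: whisper-1
backend: whisper                 # assumed identifier for the whisper.cpp backend
parameters:
  model: ggml-whisper-base.bin   # illustrative model file
```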