+++
disableToc = false
title = "Model compatibility table"
weight = 24
url = "/model-compatibility/"
+++
Besides llama-based models, LocalAI is also compatible with other architectures. The tables below list all the backends, the compatible model families, and the associated repositories.
{{% notice note %}}
LocalAI will attempt to automatically load models which are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file. See [the advanced section]({{%relref "advanced" %}}) for more details.
{{% /notice %}}
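
For example, a minimal model YAML that pins a backend might look like the following sketch. The model name, backend identifier, and file name here are illustrative placeholders, not a definitive configuration; consult the advanced section for the full schema:

```yaml
# my-model.yaml — minimal model configuration (illustrative values)
name: my-model                   # model name exposed by the API
backend: llama-cpp               # backend to use (see the tables below)
parameters:
  model: my-model.Q4_K_M.gguf    # model file under the models directory
```

Without the `backend` field, LocalAI falls back to automatic backend detection as described above.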
## Text Generation & Language Models
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [llama.cpp]({{%relref "features/text-generation#llama.cpp" %}}) | LLaMA, Mamba, RWKV, Falcon, Starcoder, GPT-2, [and many others](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description) | yes | GPT and Functions | yes | yes | CUDA 11/12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU |
| [vLLM](https://github.com/vllm-project/vllm) | Various GPTs and quantization formats | yes | GPT | no | no | CUDA 12/13, ROCm, Intel |
| [transformers](https://github.com/huggingface/transformers) | Various GPTs and quantization formats | yes | GPT, embeddings, Audio generation | yes | yes* | CUDA 11/12/13, ROCm, Intel, CPU |
| [exllama2](https://github.com/turboderp-org/exllamav2) | GPTQ | yes | GPT only | no | no | CUDA 12/13 |
| [MLX](https://github.com/ml-explore/mlx-lm) | Various LLMs | yes | GPT | no | no | Metal (Apple Silicon) |
| [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) | Vision-Language Models | yes | Multimodal GPT | no | no | Metal (Apple Silicon) |
| [langchain-huggingface](https://github.com/tmc/langchaingo) | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
## Audio & Speech Processing
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [whisper.cpp](https://github.com/ggml-org/whisper.cpp) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU |
| [faster-whisper](https://github.com/SYSTRAN/faster-whisper) | whisper | no | Audio transcription | no | no | CUDA 12/13, ROCm, Intel, CPU |
| [piper](https://github.com/rhasspy/piper) ([binding](https://github.com/mudler/go-piper)) | Any piper onnx model | no | Text to voice | no | no | CPU |
| [bark](https://github.com/suno-ai/bark) | bark | no | Audio generation | no | no | CUDA 12/13, ROCm, Intel |
| [bark-cpp](https://github.com/PABannier/bark.cpp) | bark | no | Audio generation | no | no | CUDA, Metal, CPU |
| [coqui](https://github.com/idiap/coqui-ai-TTS) | Coqui TTS | no | Audio generation and Voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| [kokoro](https://github.com/hexgrad/kokoro) | Kokoro TTS | no | Text-to-speech | no | no | CUDA 12/13, ROCm, Intel, CPU |
| [chatterbox](https://github.com/resemble-ai/chatterbox) | Chatterbox TTS | no | Text-to-speech | no | no | CUDA 11/12/13, CPU |
| [kitten-tts](https://github.com/KittenML/KittenTTS) | Kitten TTS | no | Text-to-speech | no | no | CPU |
| [silero-vad](https://github.com/snakers4/silero-vad) with [Golang bindings](https://github.com/streamer45/silero-vad-go) | Silero VAD | no | Voice Activity Detection | no | no | CPU |
| [neutts](https://github.com/neuphonic/neuttsair) | NeuTTSAir | no | Text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, CPU |
| [vibevoice](https://github.com/microsoft/VibeVoice) | VibeVoice-Realtime | no | Real-time text-to-speech with voice cloning | no | no | CUDA 12/13, ROCm, Intel, CPU |
| [mlx-audio](https://github.com/Blaizzy/mlx-audio) | MLX | no | Text-to-speech | no | no | Metal (Apple Silicon) |
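
As an illustration, a transcription model could be wired to one of the backends above with a configuration along these lines. This is a hedged sketch: the backend identifier and model file name are assumptions chosen for the example.

```yaml
# whisper.yaml — audio transcription configuration (illustrative values)
name: whisper-1              # name to use in transcription API requests
backend: whisper             # whisper.cpp backend from the table above
parameters:
  model: ggml-base.en.bin    # whisper model file under the models directory
```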
## Image & Video Generation
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [stablediffusion.cpp](https://github.com/leejet/stable-diffusion.cpp) | stablediffusion-1, stablediffusion-2, stablediffusion-3, flux, PhotoMaker | no | Image | no | no | CUDA 12/13, Intel SYCL, Vulkan, CPU |
| [diffusers](https://github.com/huggingface/diffusers) | SD, various diffusion models, ... | no | Image/Video generation | no | no | CUDA 11/12/13, ROCm, Intel, Metal, CPU |
| [transformers-musicgen](https://github.com/huggingface/transformers) | MusicGen | no | Audio generation | no | no | CUDA, CPU |
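
Similarly, an image-generation model using the diffusers backend could be configured roughly as follows. This is only a sketch: the model identifier and pipeline name are example assumptions, not a definitive setup.

```yaml
# sd.yaml — image generation configuration (illustrative values)
name: stablediffusion
backend: diffusers
parameters:
  model: runwayml/stable-diffusion-v1-5   # HuggingFace model id (example)
diffusers:
  pipeline_type: StableDiffusionPipeline  # diffusers pipeline to load
```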
## Specialized AI Tasks
| Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
|----------------------------------------------------------------------------------|-----------------------|--------------------------|---------------------------|-----------------------------------|----------------------|--------------|
| [rfdetr](https://github.com/roboflow/rf-detr) | RF-DETR | no | Object Detection | no | no | CUDA 12/13, Intel, CPU |
| [rerankers](https://github.com/AnswerDotAI/rerankers) | Reranking API | no | Reranking | no | no | CUDA 11/12/13, ROCm, Intel, CPU |
| [local-store](https://github.com/mudler/LocalAI) | Vector database | no | Vector storage | yes | no | CPU |
| [huggingface](https://huggingface.co/docs/hub/en/api) | HuggingFace API models | yes | Various AI tasks | yes | yes | API-based |
## Acceleration Support Summary
### GPU Acceleration
- **NVIDIA CUDA**: CUDA 11.7, CUDA 12.0, CUDA 13.0 support across most backends
- **AMD ROCm**: HIP-based acceleration for AMD GPUs
- **Intel oneAPI**: SYCL-based acceleration for Intel GPUs (F16/F32 precision)
- **Vulkan**: Cross-platform GPU acceleration
- **Metal**: Apple Silicon GPU acceleration (M1/M2/M3+)
### Specialized Hardware
- **NVIDIA Jetson (L4T CUDA 12)**: ARM64 support for embedded AI (AGX Orin, Jetson Nano, Jetson Xavier NX, Jetson AGX Xavier)
- **NVIDIA Jetson (L4T CUDA 13)**: ARM64 support for embedded AI (DGX Spark)
- **Apple Silicon**: Native Metal acceleration for Mac M1/M2/M3+
- **Darwin x86**: Intel Mac support
### CPU Optimization
- **AVX/AVX2/AVX512**: Advanced vector extensions for x86
- **Quantization**: 4-bit, 5-bit, 8-bit integer quantization support
- **Mixed Precision**: F16/F32 mixed precision support
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see [the advanced section]({{%relref "advanced" %}})).
- \* Token streaming in the transformers backend is supported only with CUDA and OpenVINO CPU/XPU acceleration.