LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Find a file

LocalAI [bot] 5cda4f1ccf fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950 ) * fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no version pins. That mirror started shipping torch 2.11.0 next to a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10 ABI, so uv landed on the mismatched pair and vllm crashed at import: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib (c10::MessageLogger's constructor signature changed between torch 2.10 and 2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so exported only the 2.11 form.) Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130 manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose Requires- Dist locks torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set automatically, so the mirror and the [tool.uv.sources] pinning are no longer needed. flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel, so the Dao-AILab package isn't required at runtime. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-.txt pyproject.toml only existed because uv pip install -r requirements.txt doesn't honor [tool.uv.sources]. The previous commit dropped [tool.uv. sources] (PyPI now serves the aarch64 + cu130 wheels directly), so the file no longer carries any logic the requirements-.txt path can't. Replace with the same two-file pattern every other build profile uses: - requirements-l4t13.txt (accelerate / torch / transformers / bitsandbytes - matches cublas13's split) - requirements-l4t13-after.txt (vllm; runs after the base resolve so the cu130 torch wheel lands first) install.sh's whole l4t13 elif branch goes away; libbackend.sh's installRequirements already handles the requirements-install.txt build- deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the runProtogen call, so falling through to the standard else: branch produces identical install behavior with less surface area. No functional change at install time - same wheels, same order. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels Same root cause and same fix as the vllm backend in the previous commits: the L4T13 sglang and vllm-omni backends both pulled their accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins, so they would silently land on the same torch 2.11 vs cu130-built wheel ABI mismatch the moment the mirror published an out-of-sync pair. sglang ------ - Drop pyproject.toml + [tool.uv.sources]. The historical comment said the [all] extra was unsafe on aarch64 because of decord, but sglang 0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and stop being capped at the 0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP recipes on Jetson Thor. - New requirements-l4t13.txt mirrors the cublas13 split (accelerate / torch / torchvision / torchaudio / transformers), requirements-l4t13- after.txt carries sglang[all]>=0.5.11. - install.sh's l4t13 elif branch goes away; falls through to the standard installRequirements path. vllm-omni --------- - requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own vllm_flash_attn fa2 + fa3 internally). - install.sh's l4t13 vllm-install branch collapses into the cublas13 branch since both now just run `pip install vllm --torch-backend=auto` against PyPI. - --index-strategy=unsafe-best-match is dropped from the top-level l4t13 guard; without the L4T mirror in the picture it had no purpose. The from-source vllm-omni install on top still keeps its existing `sed -i '/^fa3-fwd[[:space:]]==/d' requirements/cuda.txt` workaround - fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the [diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist depends on scikit_build_core without declaring it in build-system. requires, so under --no-build-isolation uv can't build it from source: × Failed to build `xatlas==0.0.11` ├─▶ The build backend returned an error ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1) ModuleNotFoundError: No module named 'scikit_build_core' help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas` Upstream sglang explicitly gates st_attn and vsa on `platform_machine != aarch64` inside the same [diffusion] extra but forgot xatlas - same class of bug that bit the old decord pin. Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base sglang.srt symbols (Engine, ServerArgs, FunctionCallParser, ReasoningParser); the [all] extras are optional accelerators not required at import time. cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>		2026-05-22 23:01:22 +02:00
.agents	feat(gallery): verify backend OCI images with keyless cosign (#9823 )	2026-05-18 08:02:20 +02:00
.devcontainer	fix: Add named volumes for Windows Docker compatibility (#8661 )	2026-02-26 23:18:53 +01:00
.devcontainer-scripts	feat: refactor build process, drop embedded backends (#5875 )	2025-07-22 16:31:04 +02:00
.docker	ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738 )	2026-05-10 00:03:52 +02:00
.github	ci(images): publish chronologically-orderable master-<epoch>-<sha> tags	2026-05-21 17:18:30 +00:00
.vscode	feat: refactor build process, drop embedded backends (#5875 )	2025-07-22 16:31:04 +02:00
backend	fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950 )	2026-05-22 23:01:22 +02:00
cmd	feat: Merge repeated log lines in the terminal (#9141 )	2026-03-26 22:16:13 +01:00
configuration	refactor: move remaining api packages to core (#1731 )	2024-03-01 16:19:53 +01:00
core	feat(config): default prompt_cache_all to true (#9951 )	2026-05-22 22:06:22 +02:00
custom-ca-certs	feat(certificates): add support for custom CA certificates (#880 )	2023-11-01 20:10:14 +01:00
docs	feat(usage): track and visualise usage per API key (#9920 )	2026-05-21 16:34:02 +02:00
examples	docs: make examples repository link more prominent (#8895 )	2026-03-09 09:26:16 +01:00
gallery	chore(model-gallery): ⬆️ update checksum (#9910 )	2026-05-20 23:38:45 +02:00
internal	feat: cleanups, small enhancements	2023-07-04 18:58:19 +02:00
pkg	[utils] Fail immediately on extraction errors (#9926 )	2026-05-21 19:00:33 +02:00
prompt-templates	Requested Changes from GPT4ALL to Luna-AI-Llama2 (#1092 )	2023-09-22 11:22:17 +02:00
scripts	ci(bump-deps): register ds4 + move version pin into the Makefile (#9761 )	2026-05-11 22:46:02 +02:00
swagger	feat(swagger): update swagger (#9872 )	2026-05-20 22:05:35 +02:00
tests	feat: add ds4 backend (DeepSeek V4 Flash) with tool calls, thinking, KV cache (#9758 )	2026-05-11 22:15:47 +02:00
.air.toml	feat(ui): chat stats, small visual enhancements (#7223 )	2025-11-10 18:12:07 +01:00
.dockerignore	feat(whisper-cpp): Convert to Purego and add VAD (#6087 )	2025-08-28 17:25:18 +02:00
.editorconfig	feat(stores): Vector store backend (#1795 )	2024-03-22 21:14:04 +01:00
.env	feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504 )	2026-02-11 22:58:19 +01:00
.gitattributes	chore(linguist): add *.hpp files to linguist-vendored (#4154 )	2024-11-14 14:12:16 +01:00
.gitignore	fix(openai): stream usage non-zero when tools are enabled (#9941 )	2026-05-22 10:13:41 +02:00
.gitmodules	feat: Add Kokoros backend (#9212 )	2026-04-08 19:23:16 +02:00
.golangci.yml	feat(gallery): verify backend OCI images with keyless cosign (#9823 )	2026-05-18 08:02:20 +02:00
.goreleaser.yaml	feat(ui): move to React for frontend (#8772 )	2026-03-05 21:47:12 +01:00
.yamllint	fix: yamlint warnings and errors (#2131 )	2024-04-25 17:25:56 +00:00
AGENTS.md	feat(gallery): verify backend OCI images with keyless cosign (#9823 )	2026-05-18 08:02:20 +02:00
CLAUDE.md	fix(realtime): Add functions to conversation history (#8616 )	2026-02-21 19:03:49 +01:00
CONTRIBUTING.md	docs(agents): adopt kernel's AI coding assistants policy	2026-04-19 22:50:54 +00:00
docker-compose.distributed.yaml	fix(distributed): worker container healthcheck always unhealthy	2026-04-27 13:51:57 +00:00
docker-compose.yaml	fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545 )	2026-04-24 22:02:23 +02:00
Dockerfile	chore(deps): bump node from 25-slim to 26-slim (#9769 )	2026-05-12 09:19:51 +02:00
Entitlements.plist	Feat: OSX Local Codesigning (#1319 )	2023-11-23 15:22:54 +01:00
entrypoint.sh	feat: ⚠️ reduce images size and stop bundling sources (#5721 )	2025-06-26 18:41:38 +02:00
flake.lock	feat: add flake.nix for dockerless setup (#9851 )	2026-05-18 15:23:10 +01:00
flake.nix	fix(nix): correct flake src path and add dev shell (#9894 )	2026-05-19 19:28:30 +02:00
go.mod	refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916 )	2026-05-21 14:45:53 +02:00
go.sum	refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916 )	2026-05-21 14:45:53 +02:00
LICENSE	chore(docs): update license year	2025-02-15 18:17:15 +01:00
Makefile	feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801 )	2026-05-13 21:57:27 +02:00
README.md	docs: credit the LocalAI maintainers team	2026-05-02 23:37:04 +00:00
renovate.json	ci: manually update deps	2023-05-04 15:01:29 +02:00
SECURITY.md	docs: clarify SECURITY.md version support table with specific ranges and EOL dates (#8861 )	2026-03-08 17:58:19 +01:00
webui_static.yaml	feat(ui): move to React for frontend (#8772 )	2026-03-05 21:47:12 +01:00

README.md

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Drop-in API compatibility — OpenAI, Anthropic, ElevenLabs APIs
36+ backends — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
Any hardware — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
Multi-user ready — API key auth, user quotas, role-based access
Built-in AI agents — autonomous agents with tool use, RAG, MCP, and skills
Privacy-first — your data never leaves your infrastructure

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | ❓FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

Quickstart

macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

April 2026: Voice recognition, Face recognition, identification & liveness detection, Ollama API compatibility, Video generation in stable-diffusion.ggml, Backend versioning with auto-upgrade, Pin models & load-on-demand toggle, Universal model importer, new backends: sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad multimodal
March 2026: Agent management, New React UI, WebRTC, MLX-distributed via P2P and RDMA, MCP Apps, MCP Client-side
February 2026: Realtime API for audio-to-audio with tool calling, ACE-Step 1.5 support
January 2026: LocalAI 3.10.0 — Anthropic API support, Open Responses API, video & image generation (LTX-2), unified GPU backends, tool streaming, Moonshine, Pocket-TTS. Release notes
December 2025: Dynamic Memory Resource reclaimer, Automatic multi-GPU model fitting (llama.cpp), Vibevoice backend
November 2025: Import models via URL, Multiple chats and history
October 2025: Model Context Protocol (MCP) support for agentic capabilities
September 2025: New Launcher for macOS and Linux, extended backend support for Mac and Nvidia L4T, MLX-Audio, WAN 2.2
August 2025: MLX, MLX-VLM, Diffusers, llama.cpp now supported on Apple Silicon
July 2025: All backends migrated outside the main binary — lightweight, modular architecture

For older news and full release notes, see GitHub Releases and the News page.

Features

Text generation (llama.cpp, transformers, vllm ... and more)
Text to Audio
Audio to Text
Image generation
OpenAI-compatible tools API
Realtime API (Speech-to-speech)
Embeddings generation
Constrained grammars
Download models from Huggingface
Vision API
Object Detection
Reranker API
P2P Inferencing
Distributed Mode — Horizontal scaling with PostgreSQL + NATS
Model Context Protocol (MCP)
Built-in Agents — Autonomous AI agents with tool use, RAG, skills, SSE streaming, and Agent Hub
Backend Gallery — Install/remove backends on the fly via OCI images
Voice Activity Detection (Silero-VAD)
Integrated WebUI

Supported Backends & Acceleration

LocalAI supports 36+ backends including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Resources

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

Ettore Di Giacinto — original author and project lead
Richard Palethorpe — maintainer

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

Star history

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

llama.cpp
https://github.com/tatsu-lab/stanford_alpaca
https://github.com/cornelk/llama-go for the initial ideas
https://github.com/antimatter15/alpaca.cpp
https://github.com/EdVince/Stable-Diffusion-NCNN
https://github.com/ggerganov/whisper.cpp
https://github.com/rhasspy/piper
exo for the MLX distributed auto-parallel sharding implementation

Contributors

This is a community project, a special thanks to our contributors!