LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
Find a file
LocalAI [bot] 0b2ae3c6ca
Some checks are pending
build backend container images / generate-matrix (push) Waiting to run
build backend container images / backend-jobs-multiarch (push) Blocked by required conditions
build backend container images / backend-jobs-singlearch (push) Blocked by required conditions
build backend container images / backend-merge-jobs-multiarch (push) Blocked by required conditions
build backend container images / backend-merge-jobs-singlearch (push) Blocked by required conditions
build backend container images / backend-jobs-darwin (push) Blocked by required conditions
Build test / build-test (push) Waiting to run
Build test / launcher-build-darwin (push) Waiting to run
Build test / launcher-build-linux (push) Waiting to run
Explorer deployment / build-linux (push) Waiting to run
GPU tests / ubuntu-latest (1.21.x) (push) Waiting to run
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, linux/amd64, arc-runner-set) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / core-image-merge (push) Blocked by required conditions
build container images / gpu-vulkan-image-merge (push) Blocked by required conditions
build container images / gpu-nvidia-cuda-12-image-merge (push) Blocked by required conditions
build container images / gpu-nvidia-cuda-13-image-merge (push) Blocked by required conditions
build container images / gpu-intel-image-merge (push) Blocked by required conditions
build container images / gpu-hipblas-image-merge (push) Blocked by required conditions
build container images / nvidia-l4t-arm64-image-merge (push) Blocked by required conditions
build container images / nvidia-l4t-arm64-cuda-13-image-merge (push) Blocked by required conditions
build container images / core-image-build (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, intel, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-intel, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-13, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, cublas, 12, 8, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-12, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-24.04:7.2.1, hipblas, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-hipblas, noble, 2404) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, auto, -nvidia-l4t-arm64, jammy, 2204) (push) Waiting to run
build container images / gh-runner (ubuntu:24.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, false, auto, -nvidia-l4t-arm64-cuda-13, noble, 2404) (push) Waiting to run
lint / golangci-lint (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / detect-changes (push) Waiting to run
Tests extras backends / tests-ik-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-turboquant-grpc (push) Blocked by required conditions
Tests extras backends / tests-whisper-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-transformers (push) Blocked by required conditions
Tests extras backends / tests-rerankers (push) Blocked by required conditions
Tests extras backends / tests-diffusers (push) Blocked by required conditions
Tests extras backends / tests-coqui (push) Blocked by required conditions
Tests extras backends / tests-moonshine (push) Blocked by required conditions
Tests extras backends / tests-pocket-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-asr (push) Blocked by required conditions
Tests extras backends / tests-nemo (push) Blocked by required conditions
Tests extras backends / tests-voxcpm (push) Blocked by required conditions
Tests extras backends / tests-liquid-audio (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-quantization (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-smoke (push) Waiting to run
Tests extras backends / tests-sherpa-onnx-realtime (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-acestep-cpp (push) Blocked by required conditions
Tests extras backends / tests-qwen3-tts-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-localvqe-grpc-transform (push) Blocked by required conditions
Tests extras backends / tests-voxtral (push) Blocked by required conditions
Tests extras backends / tests-kokoros (push) Blocked by required conditions
Tests extras backends / tests-insightface-grpc (push) Blocked by required conditions
Tests extras backends / tests-speaker-recognition-grpc (push) Blocked by required conditions
tests / tests-linux (1.26.x) (push) Waiting to run
tests / tests-apple (1.26.x) (push) Waiting to run
tests-aio / tests-aio (push) Waiting to run
E2E Backend Tests / tests-e2e-backend (1.25.x) (push) Waiting to run
UI E2E Tests / tests-ui-e2e (1.26.x) (push) Waiting to run
fix(openai): stream usage non-zero when tools are enabled (#9941)
* chore: ignore local .worktrees directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(openai): stream usage non-zero when tools are enabled

The streaming chat-completions worker for tool-bearing requests
(processTools in core/http/endpoints/openai/chat.go) never forwarded the
cumulative TokenUsage from ComputeChoices to the chunks it placed on the
responses channel. The outer streaming loop's running usage tracker
therefore stayed at the zero value, and the include_usage trailer
reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever
the request carried a `tools` array. Without tools, the alternative
`process` path stamps Usage on every chunk, so that path was unaffected.

Forward the final TokenUsage via a usage-only sentinel chunk (empty
Choices, populated Usage) emitted right before close(responses). The
outer loop's per-chunk Usage capture moves above the empty-Choices skip
so the sentinel updates the tracker without ever reaching the wire,
keeping the existing OpenAI spec contract (intermediate chunks carry no
`usage` field, and the deferred-final-chunk helpers remain Usage-free
per the regression test for issue #8546).

Adds streamUsageFromTokenUsage, usageSentinelChunk, and
applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level
test that mirrors the outer-loop sequence.

Fixes #9927

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

* refactor(openai): return final TokenUsage from stream workers

Replace the usage-only sentinel SSE chunk introduced in the previous
commit with a plain return value. The streaming workers process and
processTools (now extracted as package-level processStream and
processStreamWithTools) return (backend.TokenUsage, error); the outer
ChatEndpoint loop reads the cumulative counts off the existing `ended`
channel (now carrying streamWorkerResult{usage, err}) and builds the
include_usage trailer from a normal Go value after the LOOP exits.

This drops the empty-Choices "skip but capture Usage" rule from the
outer loop and removes the usageSentinelChunk / applyChunkToUsage
helpers entirely. The SSE responses channel is back to a single
purpose: wire chunks only.

processStream and processStreamWithTools move into chat_stream_workers.go
so they can be exercised directly from tests. The chat_stream_usage_test.go
suite now drives the workers with a mocked backend.ModelInferenceFunc
and asserts on the returned TokenUsage. The regression coverage for
issue #9927 is therefore behavioral: reverting the fix (discarding
ComputeChoices' usage return) makes the assertions fail with concrete
count mismatches.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-22 10:13:41 +02:00
.agents feat(gallery): verify backend OCI images with keyless cosign (#9823) 2026-05-18 08:02:20 +02:00
.devcontainer fix: Add named volumes for Windows Docker compatibility (#8661) 2026-02-26 23:18:53 +01:00
.devcontainer-scripts feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
.docker ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738) 2026-05-10 00:03:52 +02:00
.github ci(images): publish chronologically-orderable master-<epoch>-<sha> tags 2026-05-21 17:18:30 +00:00
.vscode feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
backend chore: ⬆️ Update ggml-org/llama.cpp to bb28c1fe246b72276ee1d00ce89306be7b865766 (#9934) 2026-05-22 09:49:33 +02:00
cmd feat: Merge repeated log lines in the terminal (#9141) 2026-03-26 22:16:13 +01:00
configuration refactor: move remaining api packages to core (#1731) 2024-03-01 16:19:53 +01:00
core fix(openai): stream usage non-zero when tools are enabled (#9941) 2026-05-22 10:13:41 +02:00
custom-ca-certs feat(certificates): add support for custom CA certificates (#880) 2023-11-01 20:10:14 +01:00
docs feat(usage): track and visualise usage per API key (#9920) 2026-05-21 16:34:02 +02:00
examples docs: make examples repository link more prominent (#8895) 2026-03-09 09:26:16 +01:00
gallery chore(model-gallery): ⬆️ update checksum (#9910) 2026-05-20 23:38:45 +02:00
internal feat: cleanups, small enhancements 2023-07-04 18:58:19 +02:00
pkg [utils] Fail immediately on extraction errors (#9926) 2026-05-21 19:00:33 +02:00
prompt-templates Requested Changes from GPT4ALL to Luna-AI-Llama2 (#1092) 2023-09-22 11:22:17 +02:00
scripts ci(bump-deps): register ds4 + move version pin into the Makefile (#9761) 2026-05-11 22:46:02 +02:00
swagger feat(swagger): update swagger (#9872) 2026-05-20 22:05:35 +02:00
tests feat: add ds4 backend (DeepSeek V4 Flash) with tool calls, thinking, KV cache (#9758) 2026-05-11 22:15:47 +02:00
.air.toml feat(ui): chat stats, small visual enhancements (#7223) 2025-11-10 18:12:07 +01:00
.dockerignore feat(whisper-cpp): Convert to Purego and add VAD (#6087) 2025-08-28 17:25:18 +02:00
.editorconfig feat(stores): Vector store backend (#1795) 2024-03-22 21:14:04 +01:00
.env feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504) 2026-02-11 22:58:19 +01:00
.gitattributes chore(linguist): add *.hpp files to linguist-vendored (#4154) 2024-11-14 14:12:16 +01:00
.gitignore fix(openai): stream usage non-zero when tools are enabled (#9941) 2026-05-22 10:13:41 +02:00
.gitmodules feat: Add Kokoros backend (#9212) 2026-04-08 19:23:16 +02:00
.golangci.yml feat(gallery): verify backend OCI images with keyless cosign (#9823) 2026-05-18 08:02:20 +02:00
.goreleaser.yaml feat(ui): move to React for frontend (#8772) 2026-03-05 21:47:12 +01:00
.yamllint fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
AGENTS.md feat(gallery): verify backend OCI images with keyless cosign (#9823) 2026-05-18 08:02:20 +02:00
CLAUDE.md fix(realtime): Add functions to conversation history (#8616) 2026-02-21 19:03:49 +01:00
CONTRIBUTING.md docs(agents): adopt kernel's AI coding assistants policy 2026-04-19 22:50:54 +00:00
docker-compose.distributed.yaml fix(distributed): worker container healthcheck always unhealthy 2026-04-27 13:51:57 +00:00
docker-compose.yaml fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545) 2026-04-24 22:02:23 +02:00
Dockerfile chore(deps): bump node from 25-slim to 26-slim (#9769) 2026-05-12 09:19:51 +02:00
Entitlements.plist Feat: OSX Local Codesigning (#1319) 2023-11-23 15:22:54 +01:00
entrypoint.sh feat: ⚠️ reduce images size and stop bundling sources (#5721) 2025-06-26 18:41:38 +02:00
flake.lock feat: add flake.nix for dockerless setup (#9851) 2026-05-18 15:23:10 +01:00
flake.nix fix(nix): correct flake src path and add dev shell (#9894) 2026-05-19 19:28:30 +02:00
go.mod refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916) 2026-05-21 14:45:53 +02:00
go.sum refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916) 2026-05-21 14:45:53 +02:00
LICENSE chore(docs): update license year 2025-02-15 18:17:15 +01:00
Makefile feat(realtime): Add Liquid Audio s2s model and assistant mode on talk page (#9801) 2026-05-13 21:57:27 +02:00
README.md docs: credit the LocalAI maintainers team 2026-05-02 23:37:04 +00:00
renovate.json ci: manually update deps 2023-05-04 15:01:29 +02:00
SECURITY.md docs: clarify SECURITY.md version support table with specific ranges and EOL dates (#8861) 2026-03-08 17:58:19 +01:00
webui_static.yaml feat(ui): move to React for frontend (#8772) 2026-03-05 21:47:12 +01:00




LocalAI stars LocalAI License

Follow LocalAI_API Join LocalAI Discord Community

mudler%2FLocalAI | Trendshift

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

  • Drop-in API compatibility — OpenAI, Anthropic, ElevenLabs APIs
  • 36+ backends — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
  • Any hardware — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
  • Multi-user ready — API key auth, user quotas, role-based access
  • Built-in AI agents — autonomous agents with tool use, RAG, MCP, and skills
  • Privacy-first — your data never leaves your infrastructure

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

User and auth

https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c

Agents

https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a

Usage metrics per user

https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f

Fine-tuning and Quantization

https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee

WebRTC

https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b

Quickstart

macOS

Download LocalAI for macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

For older news and full release notes, see GitHub Releases and the News page.

Features

Supported Backends & Acceleration

LocalAI supports 36+ backends including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Resources

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:


Individual sponsors

A special thanks to individual sponsors, a full list is on GitHub and buymeacoffee. Special shout out to drikster80 for being generous. Thank you everyone!

Star history

LocalAI Star history Chart

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

Contributors

This is a community project, a special thanks to our contributors!