LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
LocalAI [bot] 19d59102d5
feat(whisper-cpp): implement streaming transcription (#9751)
* test(whisper): wire e2e streaming transcription target

Adds test-extra-backend-whisper-transcription, mirroring the existing
llama-cpp / sherpa-onnx / vibevoice-cpp targets. The generic
AudioTranscriptionStream spec at tests/e2e-backends/backend_test.go:644
fails today because backend/go/whisper has no streaming impl - this
target is the failing TDD gate that the next phase makes pass.

Confirmed RED locally: 3 Passed (health, load, offline transcription),
1 Failed (streaming spec hits its 300s context deadline because the
base implementation returns 'unimplemented' but doesn't close the
result channel, leaving the gRPC stream open until the client times
out).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper-cpp): expose new_segment_callback to the Go side

Adds set_new_segment_callback() and a C-side trampoline that whisper.cpp
invokes once per new text segment during whisper_full(). The trampoline
dispatches (idx_first, n_new, user_data) to a Go function pointer
registered via purego.NewCallback - text and timings are pulled by Go
through the existing get_segment_text/get_segment_t0/get_segment_t1
getters.

Wires the hook only when streaming is actually requested, to avoid a
per-segment function-pointer dispatch on the offline path.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(whisper-cpp): implement AudioTranscriptionStream

Wires whisper.cpp's new_segment_callback through purego back to Go so
the streaming transcription RPC produces real, time-correlated deltas
while whisper_full() is still decoding. Each segment becomes one
TranscriptStreamResponse{Delta}; whisper_full's return is the
TranscriptStreamResponse{FinalResult} carrying the full segment list,
language, and duration.

Per-call state is tracked in a sync.Map keyed by an atomic counter; the
Go callback registered via purego.NewCallback is a singleton, dispatched
through user_data. SingleThread today means only one entry is ever live,
but the map shape matches the sherpa-onnx TTS callback pattern.

The streaming path's final.Text is the literal concat of every emitted
delta (a strings.Builder accumulated by onNewSegment) so the e2e
invariant `final.Text == concat(deltas)` holds exactly. The first delta
has no leading space; subsequent deltas are space-prefixed. The offline
AudioTranscription path is unchanged.

Closes the gap with sherpa-onnx, vibevoice-cpp, llama-cpp, and tinygrad,
which already implement AudioTranscriptionStream.

Verified GREEN locally: make test-extra-backend-whisper-transcription
passes 4/4 specs (3 Passed initially under RED, +1 streaming spec now).

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* test(whisper-cpp): assert progressive multi-segment streaming

Drives AudioTranscriptionStream against a real long-audio fixture and
asserts len(deltas) >= 2. The generic e2e spec at
tests/e2e-backends/backend_test.go:644 only checks len(deltas) >= 1
which is satisfied by both real and faked streaming - this spec is the
guardrail that a future "fake" impl can't sneak past.

Skipped by default (env-gated, like the cancellation spec); set
WHISPER_LIBRARY, WHISPER_MODEL_PATH, and WHISPER_AUDIO_PATH to a 30+
second clip to run.

Verified locally with a 55s 5x-JFK concat against ggml-base.en.bin:
1 Passed in 7.3s, deltas >= 2, finalSegmentCount >= 2,
concat(deltas) == final.Text.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* ci(whisper-cpp): add transcription gRPC e2e job

Mirrors tests-sherpa-onnx-grpc-transcription /
tests-llama-cpp-grpc-transcription. Runs make
test-extra-backend-whisper-transcription whenever the whisper backend
or the run-all switch fires, so a pin-bump or refactor that breaks
streaming transcription gets caught before merge.

The whisper output on detect-changes is already emitted by
scripts/changed-backends.js (it iterates allBackendPaths); this PR
just exposes it as a workflow output and consumes it.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(whisper-cpp): silence errcheck on AudioTranscriptionStream defers

golangci-lint runs with new-from-merge-base=origin/master, so the
identical defer patterns in the existing offline AudioTranscription
path are grandfathered while the new ones in AudioTranscriptionStream
trip errcheck. Wrap both defers in `func() { _ = ... }()` to match what
errcheck wants without altering behavior. The errors from os.RemoveAll
and *os.File.Close are not actionable inside a defer here (we're
already returning), matching the offline path's contract.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
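The defer wrapping mentioned in this commit can be sketched as follows. The helper `writeScratch` and the file name are hypothetical; only the `func() { _ = ... }()` pattern around `os.RemoveAll` and `(*os.File).Close` comes from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeScratch is a hypothetical helper: it writes data to a file inside a
// temporary directory and returns the number of bytes written.
func writeScratch(data []byte) (int, error) {
	dir, err := os.MkdirTemp("", "scratch")
	if err != nil {
		return 0, err
	}
	// Assigning to the blank identifier makes the discarded error explicit,
	// which is what errcheck looks for. Behavior is unchanged.
	defer func() { _ = os.RemoveAll(dir) }()

	f, err := os.Create(filepath.Join(dir, "audio.wav"))
	if err != nil {
		return 0, err
	}
	// Deferred calls run last-in, first-out: the file is closed before the
	// directory is removed.
	defer func() { _ = f.Close() }()

	return f.Write(data)
}

func main() {
	n, err := writeScratch([]byte("RIFF"))
	fmt.Println(n, err)
}
```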

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-10 23:11:46 +02:00
.agents docs(ci-caching): list all paths that retrigger base-images.yml 2026-05-09 22:31:37 +00:00
.devcontainer fix: Add named volumes for Windows Docker compatibility (#8661) 2026-02-26 23:18:53 +01:00
.devcontainer-scripts feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
.docker ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) (#9738) 2026-05-10 00:03:52 +02:00
.github feat(whisper-cpp): implement streaming transcription (#9751) 2026-05-10 23:11:46 +02:00
.vscode feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
backend feat(whisper-cpp): implement streaming transcription (#9751) 2026-05-10 23:11:46 +02:00
cmd feat: Merge repeated log lines in the terminal (#9141) 2026-03-26 22:16:13 +01:00
configuration refactor: move remaining api packages to core (#1731) 2024-03-01 16:19:53 +01:00
core fix(gallery): keep auto-upgrade off non-dev backends when -development is installed (#9736) 2026-05-09 18:20:00 +02:00
custom-ca-certs feat(certificates): add support for custom CA certificates (#880) 2023-11-01 20:10:14 +01:00
docs feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686) 2026-05-07 17:27:29 +02:00
examples docs: make examples repository link more prominent (#8895) 2026-03-09 09:26:16 +01:00
gallery chore(model gallery): 🤖 add 1 new models via gallery agent (#9720) 2026-05-08 16:26:03 +02:00
internal feat: cleanups, small enhancements 2023-07-04 18:58:19 +02:00
pkg chore: Security hardening (#9719) 2026-05-08 16:25:45 +02:00
prompt-templates Requested Changes from GPT4ALL to Luna-AI-Llama2 (#1092) 2023-09-22 11:22:17 +02:00
scripts ci: split backend-jobs into single-arch and multi-arch matrices (#9746) 2026-05-10 18:15:53 +02:00
swagger feat(swagger): update swagger (#9723) 2026-05-08 23:44:55 +02:00
tests fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717) 2026-05-08 16:24:54 +02:00
.air.toml feat(ui): chat stats, small visual enhancements (#7223) 2025-11-10 18:12:07 +01:00
.dockerignore feat(whisper-cpp): Convert to Purego and add VAD (#6087) 2025-08-28 17:25:18 +02:00
.editorconfig feat(stores): Vector store backend (#1795) 2024-03-22 21:14:04 +01:00
.env feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504) 2026-02-11 22:58:19 +01:00
.gitattributes chore(linguist): add *.hpp files to linguist-vendored (#4154) 2024-11-14 14:12:16 +01:00
.gitignore fix(ui): Add tracing inline settings back and create UI tests (#9027) 2026-03-16 17:51:06 +01:00
.gitmodules feat: Add Kokoros backend (#9212) 2026-04-08 19:23:16 +02:00
.golangci.yml chore: add golangci-lint with new-from-merge-base baseline (#9603) 2026-04-28 22:07:44 +02:00
.goreleaser.yaml feat(ui): move to React for frontend (#8772) 2026-03-05 21:47:12 +01:00
.yamllint fix: yamlint warnings and errors (#2131) 2024-04-25 17:25:56 +00:00
AGENTS.md docs(agents): update CI caching docs after the GHA-free-tier migration (#9742) 2026-05-10 00:28:57 +02:00
CLAUDE.md fix(realtime): Add functions to conversation history (#8616) 2026-02-21 19:03:49 +01:00
CONTRIBUTING.md docs(agents): adopt kernel's AI coding assistants policy 2026-04-19 22:50:54 +00:00
docker-compose.distributed.yaml fix(distributed): worker container healthcheck always unhealthy 2026-04-27 13:51:57 +00:00
docker-compose.yaml fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545) 2026-04-24 22:02:23 +02:00
Dockerfile feat(ci): allow routing apt traffic through an alternate Ubuntu mirror (#9650) 2026-05-03 23:50:13 +02:00
Entitlements.plist Feat: OSX Local Codesigning (#1319) 2023-11-23 15:22:54 +01:00
entrypoint.sh feat: ⚠️ reduce images size and stop bundling sources (#5721) 2025-06-26 18:41:38 +02:00
go.mod chore: Security hardening (#9719) 2026-05-08 16:25:45 +02:00
go.sum chore: Security hardening (#9719) 2026-05-08 16:25:45 +02:00
LICENSE chore(docs): update license year 2025-02-15 18:17:15 +01:00
Makefile feat(whisper-cpp): implement streaming transcription (#9751) 2026-05-10 23:11:46 +02:00
README.md docs: credit the LocalAI maintainers team 2026-05-02 23:37:04 +00:00
renovate.json ci: manually update deps 2023-05-04 15:01:29 +02:00
SECURITY.md docs: clarify SECURITY.md version support table with specific ranges and EOL dates (#8861) 2026-03-08 17:58:19 +01:00
webui_static.yaml feat(ui): move to React for frontend (#8772) 2026-03-05 21:47:12 +01:00





LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

  • Drop-in API compatibility — OpenAI, Anthropic, ElevenLabs APIs
  • 36+ backends — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
  • Any hardware — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
  • Multi-user ready — API key auth, user quotas, role-based access
  • Built-in AI agents — autonomous agents with tool use, RAG, MCP, and skills
  • Privacy-first — your data never leaves your infrastructure

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18


User and auth

https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c

Agents

https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a

Usage metrics per user

https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f

Fine-tuning and Quantization

https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee

WebRTC

https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b

Quickstart

macOS

Download LocalAI for macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.
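Once a container is running and a model is loaded, the OpenAI-compatible API can be called from any HTTP client. The Go sketch below only builds a chat completion request; it assumes the server listens on `http://localhost:8080` (the port published in the container commands above) and that the model is addressable by its gallery name, which may differ in your setup. `newChatRequest` and the request types are hypothetical helpers written for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// message and chatRequest mirror the basic shape of an OpenAI-style chat
// completion payload. Only the fields used here are included.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// newChatRequest builds, but does not send, a chat completion request
// against an OpenAI-compatible server at baseURL.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The base URL and model name are assumptions; adjust them to your setup.
	req, err := newChatRequest("http://localhost:8080", "llama-3.2-1b-instruct:q4_k_m", "Say hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

If API key auth is enabled, the request would also need an `Authorization` header; that is left out of this sketch.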

Latest News

For older news and full release notes, see GitHub Releases and the News page.

Features

Supported Backends & Acceleration

LocalAI supports 36+ backends including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Resources

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you use this repository or its data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
}

Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors, who support this project by covering CI expenses. Our sponsor list:


Individual sponsors

A special thanks to our individual sponsors; a full list is on GitHub and buymeacoffee. A special shout-out to drikster80 for being generous. Thank you, everyone!

Star history

LocalAI Star history Chart

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

Contributors

This is a community project. A special thanks to our contributors!