* Debug
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop openai video endpoint (is not complete)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add download button
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(diffusers): add support to LTX-2
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to the gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: install only torch/torchvision from jetson index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: use pip for l4t-12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "fix: install only torch/torchvision from jetson index"
This reverts commit 2d2b020078
* chatterbox needs wheel
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backends): add moonshine backend for faster transcription
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backend to CI, update AGENTS.md from this exercise
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(mlx): add thread-safe LRU prompt cache
Port mlx-lm's LRUPromptCache to fix race condition where concurrent
requests corrupt shared KV cache state. The previous implementation
used a single prompt_cache instance shared across all requests.
Changes:
- Add backend/python/common/mlx_cache.py with ThreadSafeLRUPromptCache
- Modify backend.py to use per-request cache isolation via fetch/insert
- Add prefix matching for cache reuse across similar prompts
- Add LRU eviction (default 10 entries, configurable)
- Add concurrency and cache unit tests
The cache uses a trie-based structure for efficient prefix matching,
allowing prompts that share common prefixes to reuse cached KV states.
Thread safety is provided via threading.Lock.
New configuration options:
- max_cache_entries: Maximum LRU cache entries (default: 10)
- max_kv_size: Maximum KV cache size per entry (default: None)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
* feat(mlx): add min_p and top_k sampler support
Add MinP field to proto (field 52) following the precedent set by
other non-OpenAI sampling parameters like TopK, TailFreeSamplingZ,
TypicalP, and Mirostat.
Changes:
- backend.proto: Add float MinP field for min-p sampling
- backend.py: Extract and pass min_p and top_k to mlx_lm sampler
(top_k was in proto but not being passed)
- test.py: Fix test_sampling_params to use valid proto fields and
switch to MLX-compatible model (mlx-community/Llama-3.2-1B-Instruct)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
* refactor(mlx): move mlx_cache.py from common to mlx backend
The ThreadSafeLRUPromptCache is only used by the mlx backend. After
evaluating mlx-vlm, it was determined that the cache cannot be shared
because mlx-vlm's generate/stream_generate functions don't support
the prompt_cache parameter that mlx_lm provides.
- Move mlx_cache.py from backend/python/common/ to backend/python/mlx/
- Remove sys.path manipulation from backend.py and test.py
- Fix test assertion to expect "MLX model loaded successfully"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
* test(mlx): add comprehensive cache tests and document upstream behavior
Added comprehensive unit tests (test_mlx_cache.py) covering all cache
operation modes:
- Exact match
- Shorter prefix match
- Longer prefix match with trimming
- No match scenarios
- LRU eviction and access order
- Reference counting and deep copy behavior
- Multi-model namespacing
- Thread safety with data integrity verification
Documents upstream mlx_lm/server.py behavior: single-token prefixes are
deliberately not matched (uses > 0, not >= 0) to allow longer cached
sequences to be preferred for trimming. This is acceptable because real
prompts with chat templates are always many tokens.
Removed weak unit tests from test.py that only verified "no exception
thrown" rather than correctness.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
* chore(mlx): remove unused MinP proto field
The MinP field was added to PredictOptions but is not populated by the
Go frontend/API. The MLX backend uses getattr with a default value,
so it works without the proto field.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
---------
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat(vibevoice): add backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: add workflow and backend index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): add vibevoice
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted for intel builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Pin python version for l4t
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Initial plan
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add dynamic loader for diffusers pipelines and refactor backend.py
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix pipeline discovery error handling and test mock issue
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Address code review feedback: direct imports, better error handling, improved tests
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Address remaining code review feedback: specific exceptions, registry access, test imports
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add defensive fallback for DiffusionPipeline registry access
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Actually use dynamic pipeline loading for all pipelines in backend
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use dynamic loader consistently for all pipelines including AutoPipelineForText2Image
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dynamic loader tests into test.py for CI compatibility
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Extend dynamic loader to discover any diffusers class type, not just DiffusionPipeline
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add AutoPipeline classes to pipeline registry for default model loading
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(python): set pyvenv python home
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* do pyenv update during start
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Minor changes
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* chore(ci): add cuda13 jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add to pipelines and to capabilities. Start to work on the gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* gallery
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* capabilities: try to detect by looking at /usr/local
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* neutts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* backends.yaml
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add cuda13 l4t requirements.txt
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add cuda13 requirements.txt
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Pin vllm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Not all backends are compatible
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* add vllm to requirements
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* vllm is not pre-compiled for cuda 13
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(neutts): add backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): add images to CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(gallery): add Neutts
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make it work with quantized versions
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* Apply suggestion from @mudler
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(chatterbox): support multilingual
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add l4t support
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: switch to fork
Until https://github.com/resemble-ai/chatterbox/pull/295 is merged
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
There was apparently an oversight, this fixes the float/int detection
Fixes: https://github.com/mudler/LocalAI/issues/6312
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* feat(diffusers): add support for wan2.2
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): use ttl.sh for PRs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add ftfy deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Revert "chore(ci): use ttl.sh for PRs"
This reverts commit c9fc3ecf28.
* Simplify
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: do not pin torch/torchvision on cuda12
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(mlx-audio): Add mlx-audio backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* improve loading
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix: set C_INCLUDE_PATH to point to python install
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backends): bundle python
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test ci
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* vllm on self-hosted
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add clang
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Try to fix it for Mac
* Relocate links only when is portable
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make sure to call macosPortableEnv
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted for vllm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore: allow to install with pip
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* WIP
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Make the backend to build and actually work
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* List models from system only
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add script to build darwin python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Run protogen in libbackend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Detect if mps is available across python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI: try to build backend
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Index mlx-vlm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove mlx-vlm
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop CI test
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): add backend build tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(deps): bump torch and diffusers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): run diffusers/hipblas on self-hosted
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(ci): do not publish darwin if building from PRs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(README): Add device flags for Intel/XPU
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(diffusers/xpu): Set device to XPU and ignore CUDA request when on Intel
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix: add python symlink, use absolute python env path when running backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ci): do not push images when building PRs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: Add backend gallery
This PR add support to manage backends as similar to models. There is
now available a backend gallery which can be used to install and remove
extra backends.
The backend gallery can be configured similarly as a model gallery, and
API calls allows to install and remove new backends in runtime, and as
well during the startup phase of LocalAI.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add backends docs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* wip: Backend Dockerfile for python backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat: drop extras images, build python backends separately
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixup on all backends
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tweaks
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop old backends leftovers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixup CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move dockerfile upper
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix proto
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Feature dropped for consistency - we prefer model galleries
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add missing packages in the build image
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* exllama is ponly available on cublas
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* pin torch on chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fixups to index
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Debug CI
* Install accellerators deps
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add target arch
* Add cuda minor version
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use self-hosted runners
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* ci: use quay for test images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fixups for vllm and chatterbox
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Small fixups on CI
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chatterbox is only available for nvidia
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Simplify CI builds
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt test, use qwen3
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* chore(model gallery): add jina-reranker-v1-tiny-en-gguf
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(gguf-parser): recover from potential panics that can happen while reading ggufs with gguf-parser
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Use reranker from llama.cpp in AIO images
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Limit concurrent jobs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
* working to address missing items
referencing #3436, #2930 - if i could test it, this might show that the
output from the vllm backend is processed and returned to the user
Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
* adding in vllm tests to test-extras
Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
* adding in tests to pipeline for execution
Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
* removing todo block, test via pipeline
Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
---------
Signed-off-by: Wyatt Neal <wyatt.neal+git@gmail.com>
Use the options field in the model to override kwargs if needed.
This allows to specify from the model yaml config:
```yaml
options:
- foo:bar
```
And each option will be used directly when calling the diffusers
pipeline, e.g:
```python
pipe(
foo="bar",
)
```
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
The project (MeloTTS) has been quite since long, newer backends are much
performant and better quality overall.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
We support at this point more extensive backends that are SOTA and
support also voice cloning, and many other features. This backend is
superseded and also poses significant maintenance burden as there is an
open issue https://github.com/mudler/LocalAI/issues/3941 which is still
open as it deps are pinning old versions of grpc.
Closes https://github.com/mudler/LocalAI/issues/3941
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* merge sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add alias to silently redirect sentencetransformers to transformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Add alias also for transformers-musicgen
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Drop from makefile
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Move tests from sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove sentencetransformers
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Remove tests from CI (part of transformers)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Do not always try to load the tokenizer
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Adapt tests
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Fix typo
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* Tiny adjustments
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>