Commit graph

532 commits

Author SHA1 Message Date
Richard Palethorpe
b1aa707a92
fix(coqui,nemo,voxcpm): Add dependencies to allow CI to progress (#9142)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-03-26 18:03:56 +01:00
Ettore Di Giacinto
be25217955
chore(transformers): bump to >5.0 and generically load models (#9097)
* chore(transformers): bump to >5.0

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: refactor to use generic model loading

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:57:54 +01:00
Ettore Di Giacinto
f7e8d9e791
feat(quantization): add quantization backend (#9096)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-22 00:56:34 +01:00
Ettore Di Giacinto
d9c1db2b87
feat: add (experimental) fine-tuning support with TRL (#9088)
* feat: add fine-tuning endpoint

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(experimental): add fine-tuning endpoint and TRL support

This changeset defines new GRPC signatues for Fine tuning backends, and
add TRL backend as initial fine-tuning engine. This implementation also
supports exporting to GGUF and automatically importing it to LocalAI
after fine-tuning.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* commit TRL backend, stop by killing process

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* move fine-tune to generic features

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add evals, reorder menu

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-21 02:08:02 +01:00
Ettore Di Giacinto
7dc691c171
feat: add fish-speech backend (#8962)
* feat: add fish-speech backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* drop portaudio

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-03-12 07:48:23 +01:00
Attila Györffy
5a67b5d73c
Fix image upload processing and img2img pipeline in diffusers backend (#8879)
* fix: add missing bufio.Flush in processImageFile

The processImageFile function writes decoded image data (from base64
or URL download) through a bufio.NewWriter but never calls Flush()
before closing the underlying file. Since bufio's default buffer is
4096 bytes, small images produce 0-byte files and large images are
truncated — causing PIL to fail with "cannot identify image file".

This breaks all image input paths: file, files, and ref_images
parameters in /v1/images/generations, making img2img, inpainting,
and reference image features non-functional.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* fix: merge options into kwargs in diffusers GenerateImage

The GenerateImage method builds a local `options` dict containing the
source image (PIL), negative_prompt, and num_inference_steps, but
never merges it into `kwargs` before calling self.pipe(**kwargs).
This causes img2img to fail with "Input is in incorrect format"
because the pipeline never receives the image parameter.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: add unit test for processImageFile base64 decoding

Verifies that a base64-encoded PNG survives the write path
(encode → decode → bufio.Write → Flush → file on disk) with
byte-for-byte fidelity. The test image is small enough to fit
entirely in bufio's 4096-byte buffer, which is the exact scenario
where the missing Flush() produced a 0-byte file.

Also tests that invalid base64 input is handled gracefully.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: verify GenerateImage merges options into pipeline kwargs

Mocks the diffusers pipeline and calls GenerateImage with a source
image and negative prompt. Asserts that the pipeline receives the
image, negative_prompt, and num_inference_steps via kwargs — the
exact parameters that were silently dropped before the fix.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* fix: move kwargs.update(options) earlier in GenerateImage

Move the options merge right after self.options merge (L742) so that
image, negative_prompt, and num_inference_steps are available to all
downstream code paths including img2vid and txt2vid.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

* test: convert processImageFile tests to ginkgo

Replace standard testing with ginkgo/gomega to be consistent with
the rest of the test suites in the project.

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>

---------

Signed-off-by: Attila Györffy <attila+git@attilagyorffy.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-11 08:05:50 +01:00
Ettore Di Giacinto
a026277ab9
feat(mlx-distributed): add new MLX-distributed backend (#8801)
* feat(mlx-distributed): add new MLX-distributed backend

Add new MLX distributed backend with support for both TCP and RDMA for
model sharding.

This implementation ties in the discovery implementation already in
place, and re-uses the same P2P mechanism for the TCP MLX-distributed
inferencing.

The Auto-parallel implementation is inspired by Exo's
ones (who have been added to acknowledgement for the great work!)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* expose a CLI to facilitate backend starting

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat: make manual rank0 configurable via model configs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add missing features from mlx backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-09 17:29:32 +01:00
Weathercold
f347495de9
fix(qwen-tts): duplicate instruct argument in voice design mode (#8842)
Don't pass instruct because it is added to kwargs

Fixes the error `qwen_tts.inference.qwen3_tts_model.Qwen3TTSModel.generate_voice_design() got multiple values for keyword argument 'instruct'`

Signed-off-by: Weathercold <weathercold.scr@proton.me>
2026-03-08 08:48:22 +01:00
Andres
454d8adc76
feat(qwen-tts): Support using multiple voices (#8757)
* Add support for multiple voice clones in Qwen TTS

Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Add voice prompt caching and generation logs to see generation time

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-04 09:47:21 +01:00
Ettore Di Giacinto
1c8db3846d
chore(faster-qwen3-tts): Add anyio to requirements.txt
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 09:43:29 +01:00
LocalAI [bot]
2dd4e7cdc3
fix(qwen-tts): ensure all requirements files end with newline (#8724)
- Add trailing newline to all requirements*.txt files in qwen-tts backend
- This ensures proper file formatting and prevents potential issues with
  package installation tools that expect newline-terminated files
2026-03-02 13:56:11 +01:00
LocalAI [bot]
8b430c577b
feat: Add debug logging for pocket-tts voice issue #8244 (#8715)
Adding debug logging to help investigate the pocket-tts custom voice
finding issue (Issue #8244). This is a first step to understand how
voices are being loaded and where the failure occurs.

Signed-off-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-03-02 09:24:59 +01:00
LocalAI [bot]
dfc6efb88d
feat(backends): add faster-qwen3-tts (#8664)
* feat(backends): add faster-qwen3-tts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: this backend is CUDA only

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: add requirements-install.txt with setuptools for build isolation

The faster-qwen3-tts backend requires setuptools to build packages
like sox that have setuptools as a build dependency. This ensures
the build completes successfully in CI.

Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: LocalAI Bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-27 08:16:51 +01:00
dependabot[bot]
c4783a0a05
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm (#8635)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:17:32 +01:00
dependabot[bot]
c44f03b882
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers (#8636)
chore(deps): bump grpcio in /backend/python/rerankers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:57 +01:00
dependabot[bot]
eeec92af78
chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers (#8638)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.2 to 5.2.3.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.2...v5.2.3)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:16:41 +01:00
dependabot[bot]
842033b8b5
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers (#8640)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:55 +01:00
dependabot[bot]
a2941228a7
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template (#8641)
chore(deps): bump grpcio in /backend/python/common/template

Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:43 +01:00
dependabot[bot]
791e6b84ee
chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui (#8642)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.76.0 to 1.78.1.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Commits](https://github.com/grpc/grpc/compare/v1.76.0...v1.78.1)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.78.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-25 08:14:30 +01:00
LocalAI [bot]
e555057f8b
fix: multi-GPU support for Diffusers (Issue #8575) (#8605)
* chore: init

* feat: implement multi-GPU support for Diffusers backend (fixes #8575)

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-19 21:35:58 +01:00
Ettore Di Giacinto
dadc7158fb
fix(diffusers): sd_embed is not always available (#8602)
Seems sd_embed doesn't play well with MPS and L4T. Making it optional

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-19 10:45:17 +01:00
LocalAI [bot]
94df096fb9
fix: pin neutts-air to known working commit (#8566)
* chore: init

* fix: pin neutts-air to known working commit

---------

Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
2026-02-14 21:16:37 +01:00
Ettore Di Giacinto
820bd7dd01 fix(ci): try to fix deps for l4t13 on qwen-*
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-14 10:21:23 +01:00
Ettore Di Giacinto
2fb9940b8a
fix(voxcpm): pin setuptools (#8556)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-13 23:44:35 +01:00
Ettore Di Giacinto
2fd026e958
fix: update moonshine API, add setuptools to voxcpm requirements (#8541)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-12 23:22:37 +01:00
Austen
cff972094c
feat(diffusers): add experimental support for sd_embed-style prompt embedding (#8504)
* add experimental support for sd_embed-style prompt embedding

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* add doc equivalent to compel

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

* need to use flux1 embedding function for flux model

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>

---------

Signed-off-by: Austen Dicken <cvpcsm@gmail.com>
2026-02-11 22:58:19 +01:00
Ettore Di Giacinto
3370d807c2
feat(nemo): add Nemo (only asr for now) backend (#8436)
* feat(nemo): add Nemo (only asr for now) backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(nemo): add Nemo backend without Python version pins (#8438)

* Initial plan

* Remove Python version pins from nemo backend install.sh

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Pin pyarrow to 20.0.0 in nemo requirements

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-02-07 08:19:37 +01:00
Richard Palethorpe
15c12674b6
fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile (#8431)
Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-02-06 17:12:45 +01:00
Ettore Di Giacinto
218d0526cb fix(qwen-tts): add six dependency
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 18:05:31 +01:00
Ettore Di Giacinto
9bc5ab18fa fix(voxcpm): make sed call unix-compliant
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 17:15:58 +01:00
Ettore Di Giacinto
7aea2add44
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" (#8412)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/re…"

This reverts commit 55e43b3f92.
2026-02-05 14:17:33 +01:00
dependabot[bot]
55e43b3f92
chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory (#8407)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-05 12:37:52 +00:00
Ettore Di Giacinto
53276d28e7
feat(musicgen): add ace-step and UI interface (#8396)
* feat(musicgen): add ace-step and UI interface

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Correctly handle model dir

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop auto-download

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to models, fixup UIs icons

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* l4t13 is incompatbile

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* avoid pinning version for cuda12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop l4t12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-05 12:04:53 +01:00
Ettore Di Giacinto
9db4df22f3
chore: update torch and torchaudio version specifications for qwen-tts in MPS
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-04 16:55:42 +01:00
Ettore Di Giacinto
5201b58d3e
feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU (#8380)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 23:53:34 +01:00
Ettore Di Giacinto
3039ced287
chore(ci): enlarge sleep startup time
Even if suboptimal as we should poll to wait for the service to be available, this should at least alleviate tests for now

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-03 22:07:07 +01:00
Ettore Di Giacinto
e7fc604dbc
feat(metal): try to extend support to remaining backends (#8374)
* feat(metal): try to extend support to remaining backends

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* neutts doesn't work

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* split outetts out of transformers

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Remove torch pin to whisperx

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-03 21:57:50 +01:00
dependabot[bot]
5d14c2fe4d
chore(deps): bump sentence-transformers from 5.2.0 to 5.2.2 in /backend/python/transformers (#8358)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.2.0 to 5.2.2.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.2.0...v5.2.2)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:37:05 +01:00
Ettore Di Giacinto
d6409bd2eb
Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" (#8367)
Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vl…"

This reverts commit 4c0e70086d.
2026-02-03 08:34:54 +01:00
dependabot[bot]
98872791e5
chore(deps): bump protobuf from 6.33.4 to 6.33.5 in /backend/python/transformers (#8356)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.4 to 6.33.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 08:33:45 +01:00
dependabot[bot]
4c0e70086d
chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory (#8360)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/vllm directory: torch.


Updates `torch` from 2.7.0 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-03 03:07:02 +00:00
Ettore Di Giacinto
08b2b8d755 fix(libbackend): do not inject --index-strategy unsafe-best-match to all
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-02-02 17:13:45 +01:00
Dream
10a1e6c74d
feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299)
* feat(proto): add speaker field to TranscriptSegment for diarization

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx backend for transcription with diarization

Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): register whisperx backend in Makefile

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx meta and image entries to index.yaml

Signed-off-by: eureka928 <meobius123@gmail.com>

* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): unpin torch versions and use CPU index for cpu requirements

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch ROCm variant to fix CI build failure

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch CPU variant to fix uv resolution failure

Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure

uv's default first-match strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure

PEP 440 ==2.8.0 matches 2.8.0+cpu from the extra index, avoiding the
issue where uv cannot locate an explicit +cpu local version specifier.
This aligns with the pattern used by all other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

* revert: scope hipblas suffix fix to whisperx only

Reverts changes to non-whisperx hipblas requirements files per
maintainer review — other backends are building fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: eureka928 <meobius123@gmail.com>

---------

Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:33:12 +01:00
Andres
b6459ddd57
feat(api): Add transcribe response format request parameter & adjust STT backends (#8318)
* WIP response format implementation for audio transcriptions

(cherry picked from commit e271dd764bbc13846accf3beb8b6522153aa276f)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Rework transcript response_format and add more formats

(cherry picked from commit 6a93a8f63e2ee5726bca2980b0c9cf4ef8b7aeb8)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Add test and replace go-openai package with official openai go client

(cherry picked from commit f25d1a04e46526429c89db4c739e1e65942ca893)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

* Fix faster-whisper backend and refactor transcription formatting to also work on CLI

Signed-off-by: Andres Smith <andressmithdev@pm.me>
(cherry picked from commit 69a93977d5e113eb7172bd85a0f918592d3d2168)
Signed-off-by: Andres Smith <andressmithdev@pm.me>

---------

Signed-off-by: Andres Smith <andressmithdev@pm.me>
Co-authored-by: nanoandrew4 <nanoandrew4@gmail.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-02-01 17:33:17 +01:00
Ettore Di Giacinto
68dd9765a0
feat(tts): add support for streaming mode (#8291)
* feat(tts): add support for streaming mode

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Send first audio, make sure it's 16

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-30 11:58:01 +01:00
Ettore Di Giacinto
1e08e02598
feat(qwen-asr): add support to qwen-asr (#8281)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-29 21:50:35 +01:00
Ettore Di Giacinto
9b973b79f6
feat: add VoxCPM tts backend (#8109)
* feat: add VoxCPM tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Disable voxcpm on arm64 cpu

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-28 14:44:04 +01:00
Ettore Di Giacinto
ec1598868b
feat(vibevoice): add ASR support (#8222)
* feat(vibevoice): add ASR support

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add tests

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(tests): download voice files

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Small fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Try to run on bigger runner

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* CI can't hold vibevoice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-27 20:19:22 +01:00
Ettore Di Giacinto
26a374b717
chore: drop bark which is unmaintained (#8207)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-25 09:26:40 +01:00
Ettore Di Giacinto
b2a8a63899
feat(vllm-omni): add new backend (#8188)
* feat(vllm-omni: add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* default to py3.12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 22:23:30 +01:00
Ettore Di Giacinto
05904c77f5
chore(exllama): drop backend now almost deprecated (#8186)
exllama2 development has stalled and only old architectures are
supported. exllamav3 is still in development, meanwhile cleaning up
exllama2 from the gallery.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-24 08:57:37 +01:00
Ettore Di Giacinto
58bb6a29ed
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" (#8180)
Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/ba…"

This reverts commit 5881c82413.
2026-01-23 17:25:04 +01:00
dependabot[bot]
5881c82413
chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory (#8175)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/bark directory: torch.


Updates `torch` from 2.4.1 to 2.7.1+xpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+xpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-23 15:32:15 +00:00
Ettore Di Giacinto
923ebbb344
feat(qwen-tts): add Qwen-tts backend (#8163)
* feat(qwen-tts): add Qwen-tts backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update intel deps

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop flash-attn for cuda13

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-23 15:18:41 +01:00
Ettore Di Giacinto
0fa0ac4797
fix(videogen): drop incomplete endpoint, add GGUF support for LTX-2 (#8160)
* Debug

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Drop openai video endpoint (is not complete)

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add download button

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-22 14:09:20 +01:00
Ettore Di Giacinto
22c0eb5421
chore(diffusers): add 'av' to requirements.txt (#8155)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-21 22:35:00 +01:00
Ettore Di Giacinto
d16722ee13
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" (#8072)
Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/pyt…"

This reverts commit 1f10ab39a9.
2026-01-16 20:50:33 +01:00
dependabot[bot]
1f10ab39a9
chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory (#8066)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/rerankers directory: [torch](https://github.com/pytorch/pytorch).


Updates `torch` from 2.3.1+cxx11.abi to 2.8.0
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.8.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.8.0
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-16 19:38:12 +00:00
Ettore Di Giacinto
b19afc9e64
feat(diffusers): add support to LTX-2 (#8019)
* feat(diffusers): add support to LTX-2

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-14 09:07:30 +01:00
Ettore Di Giacinto
a6ff354c86
feat(tts): add pocket-tts backend (#8018)
* feat(pocket-tts): add new backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Update docs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-13 23:35:19 +01:00
dependabot[bot]
94eecc43a3
chore(deps): bump protobuf from 6.33.2 to 6.33.4 in /backend/python/transformers (#7993)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.2 to 6.33.4.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-12 23:46:32 +00:00
Ettore Di Giacinto
2de30440fe
fix(l4t-12): use pip to install python deps (#7967)
* fix: install only torch/torchvision from jetson index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix: use pip for l4t-12

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Revert "fix: install only torch/torchvision from jetson index"

This reverts commit 2d2b020078

* chatterbox needs wheel

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-11 00:21:32 +01:00
Ettore Di Giacinto
a4d224dd1b
Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t" (#7936)
Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)"

This reverts commit f5dee90962.
2026-01-08 23:31:51 +01:00
Ettore Di Giacinto
f5dee90962
chore(uv): add --index-strategy=unsafe-first-match to l4t (#7934)
This is because the main index might not contain all the dependencies
for torch

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 22:48:03 +01:00
Ettore Di Giacinto
383312b50e
chore(l4t-12): do not use python 3.12 (wheels are only for 3.10) (#7928)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-08 19:00:07 +01:00
Ettore Di Giacinto
b964b3d53e
feat(backends): add moonshine backend for faster transcription (#7833)
* feat(backends): add moonshine backend for faster transcription

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add backend to CI, update AGENTS.md from this exercise

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-01-07 21:44:35 +01:00
Copilot
fd53978a7b
feat: package GPU libraries inside backend containers for unified base image (#7891)
* Initial plan

* Add GPU library packaging for isolated backend environments

- Create scripts/build/package-gpu-libs.sh for packaging CUDA, ROCm, SYCL, and Vulkan libraries
- Update llama-cpp, whisper, stablediffusion-ggml package.sh to include GPU libraries
- Update Dockerfile.python to package GPU libraries into Python backends
- Update libbackend.sh to set LD_LIBRARY_PATH for GPU library loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Address code review feedback: fix variable consistency and quoting

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Fix code review issues: improve glob handling and remove redundant variable

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* Simplify main Dockerfile and workflow to use unified base image

- Remove GPU-specific driver installation from Dockerfile (CUDA, ROCm, Vulkan, Intel)
- Simplify image.yml workflow to build single unified base image for linux/amd64 and linux/arm64
- GPU libraries are now packaged in individual backend containers

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-01-07 15:48:51 +01:00
Richard Palethorpe
e6ba26c3e7
chore: Update to Ubuntu24.04 (cont #7423) (#7769)
* ci(workflows): bump GitHub Actions images to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): remove CUDA 11.x support from GitHub Actions (incompatible with ubuntu:24.04)

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): bump GitHub Actions CUDA support to 12.9

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): bump base image to ubuntu:24.04 and adjust Vulkan SDK/packages

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(backend): correct context paths for Python backends in workflows, Makefile and Dockerfile

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): disable parallel backend builds to avoid race conditions

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): export CUDA_MAJOR_VERSION and CUDA_MINOR_VERSION for override

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): update backend Dockerfiles to Ubuntu 24.04

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): add ROCm env vars and default AMDGPU_TARGETS for hipBLAS builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(chatterbox): bump ROCm PyTorch to 2.9.1+rocm6.4 and update index URL; align hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore: add local-ai-launcher to .gitignore

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix backends GitHub Actions workflows after rebase

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): use build-time UBUNTU_VERSION variable

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(docker): remove libquadmath0 from requirements-stage base image

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(make): add backends/vllm to .NOTPARALLEL to prevent parallel builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix(docker): correct CUDA installation steps in backend Dockerfiles

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* chore(backend): update ROCm to 6.4 and align Python hipblas requirements

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for CUDA on arm64 builds

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(docker): update base image and backend Dockerfiles for Ubuntu 24.04 compatibility on arm64

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* build(backend): increase timeout for uv installs behind slow networks on backend/Dockerfile.python

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): switch GitHub Actions runners to Ubuntu-24.04 for vibevoice backend

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* ci(workflows): fix failing GitHub Actions runners

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>

* fix: Allow FROM_SOURCE to be unset, use upstream Intel images etc.

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): rm all traces of CUDA 11

Signed-off-by: Richard Palethorpe <io@richiejp.com>

* chore(build): Add Ubuntu codename as an argument

Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
Signed-off-by: Richard Palethorpe <io@richiejp.com>
Co-authored-by: Alessandro Sturniolo <alessandro.sturniolo@gmail.com>
2026-01-06 15:26:42 +01:00
blightbow
67baf66555
feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling (#7556)
* feat(mlx): add thread-safe LRU prompt cache

Port mlx-lm's LRUPromptCache to fix race condition where concurrent
requests corrupt shared KV cache state. The previous implementation
used a single prompt_cache instance shared across all requests.

Changes:
- Add backend/python/common/mlx_cache.py with ThreadSafeLRUPromptCache
- Modify backend.py to use per-request cache isolation via fetch/insert
- Add prefix matching for cache reuse across similar prompts
- Add LRU eviction (default 10 entries, configurable)
- Add concurrency and cache unit tests

The cache uses a trie-based structure for efficient prefix matching,
allowing prompts that share common prefixes to reuse cached KV states.
Thread safety is provided via threading.Lock.

New configuration options:
- max_cache_entries: Maximum LRU cache entries (default: 10)
- max_kv_size: Maximum KV cache size per entry (default: None)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* feat(mlx): add min_p and top_k sampler support

Add MinP field to proto (field 52) following the precedent set by
other non-OpenAI sampling parameters like TopK, TailFreeSamplingZ,
TypicalP, and Mirostat.

Changes:
- backend.proto: Add float MinP field for min-p sampling
- backend.py: Extract and pass min_p and top_k to mlx_lm sampler
  (top_k was in proto but not being passed)
- test.py: Fix test_sampling_params to use valid proto fields and
  switch to MLX-compatible model (mlx-community/Llama-3.2-1B-Instruct)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* refactor(mlx): move mlx_cache.py from common to mlx backend

The ThreadSafeLRUPromptCache is only used by the mlx backend. After
evaluating mlx-vlm, it was determined that the cache cannot be shared
because mlx-vlm's generate/stream_generate functions don't support
the prompt_cache parameter that mlx_lm provides.

- Move mlx_cache.py from backend/python/common/ to backend/python/mlx/
- Remove sys.path manipulation from backend.py and test.py
- Fix test assertion to expect "MLX model loaded successfully"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* test(mlx): add comprehensive cache tests and document upstream behavior

Added comprehensive unit tests (test_mlx_cache.py) covering all cache
operation modes:
- Exact match
- Shorter prefix match
- Longer prefix match with trimming
- No match scenarios
- LRU eviction and access order
- Reference counting and deep copy behavior
- Multi-model namespacing
- Thread safety with data integrity verification

Documents upstream mlx_lm/server.py behavior: single-token prefixes are
deliberately not matched (uses > 0, not >= 0) to allow longer cached
sequences to be preferred for trimming. This is acceptable because real
prompts with chat templates are always many tokens.

Removed weak unit tests from test.py that only verified "no exception
thrown" rather than correctness.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

* chore(mlx): remove unused MinP proto field

The MinP field was added to PredictOptions but is not populated by the
Go frontend/API. The MLX backend uses getattr with a default value,
so it works without the proto field.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Blightbow <blightbow@users.noreply.github.com>

---------

Signed-off-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Blightbow <blightbow@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 11:27:46 +01:00
dependabot[bot]
dbd25885c3
chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers (#7594)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.1.0 to 5.2.0.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.1.0...v5.2.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-16 09:12:57 +01:00
Ettore Di Giacinto
7790a24682
Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory" (#7558)
Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend…"

This reverts commit 1b4aa6f1be.
2025-12-13 17:04:46 +01:00
dependabot[bot]
1b4aa6f1be
chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory (#7549)
chore(deps): bump torch

Bumps the pip group with 1 update in the /backend/python/diffusers directory: torch.


Updates `torch` from 2.5.1+cxx11.abi to 2.7.1+cpu

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1+cpu
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-13 13:12:18 +00:00
Ettore Di Giacinto
504d954aea
Add chardet to requirements-l4t13.txt
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-13 12:59:03 +01:00
Ettore Di Giacinto
6d2a535813
chore(l4t13): use pytorch index (#7546)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-13 10:04:57 +01:00
Ettore Di Giacinto
32dcb58e89
feat(vibevoice): add new backend (#7494)
* feat(vibevoice): add backend

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore: add workflow and backend index

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(gallery): add vibevoice

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use self-hosted for intel builds

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pin python version for l4t

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-10 21:14:21 +01:00
dependabot[bot]
bbce461f57
chore(deps): bump protobuf from 6.33.1 to 6.33.2 in /backend/python/transformers (#7481)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.1 to 6.33.2.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-08 22:13:18 +01:00
Copilot
1abbedd732
feat(diffusers): implement dynamic pipeline loader to remove per-pipeline conditionals (#7365)
* Initial plan

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add dynamic loader for diffusers pipelines and refactor backend.py

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fix pipeline discovery error handling and test mock issue

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Address code review feedback: direct imports, better error handling, improved tests

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Address remaining code review feedback: specific exceptions, registry access, test imports

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add defensive fallback for DiffusionPipeline registry access

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Actually use dynamic pipeline loading for all pipelines in backend

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Use dynamic loader consistently for all pipelines including AutoPipelineForText2Image

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Move dynamic loader tests into test.py for CI compatibility

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Extend dynamic loader to discover any diffusers class type, not just DiffusionPipeline

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add AutoPipeline classes to pipeline registry for default model loading

Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(python): set pyvenv python home

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* do pyenv update during start

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Minor changes

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-12-04 19:02:06 +01:00
Ettore Di Giacinto
cfd95745ed
feat: add cuda13 images (#7404)
* chore(ci): add cuda13 jobs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Add to pipelines and to capabilities. Start to work on the gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* gallery

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* capabilities: try to detect by looking at /usr/local

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* neutts

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* backends.yaml

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add cuda13 l4t requirements.txt

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add cuda13 requirements.txt

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Fixups

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Pin vllm

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* Not all backends are compatible

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* add vllm to requirements

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* vllm is not pre-compiled for cuda 13

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-12-02 14:24:35 +01:00
Ettore Di Giacinto
4b5977f535
chore: drop pinning of python 3.12 (#7389)
Update install.sh

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-28 11:02:56 +01:00
Ettore Di Giacinto
0d877b1e71
Revert "chore(l4t): Update extra index URL for requirements-l4t.txt" (#7388)
Revert "chore(l4t): Update extra index URL for requirements-l4t.txt (#7383)"

This reverts commit 0d781e6b7e.
2025-11-28 11:02:11 +01:00
Ettore Di Giacinto
e27f1370eb
chore(diffusers): Add PY_STANDALONE_TAG for l4t Python version (#7387)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-28 09:34:05 +01:00
Ettore Di Giacinto
e01d821314
chore: Add Python 3.12 support for l4t build profile (#7384)
Set Python version to 3.12 for l4t build profile.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 23:00:09 +01:00
Ettore Di Giacinto
0d781e6b7e
chore(l4t): Update extra index URL for requirements-l4t.txt (#7383)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 22:02:06 +01:00
Ettore Di Giacinto
7ccc383a8b
chore(l4t/diffusers): bump nvidia l4t index for pytorch 2.9 (#7379)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 17:42:01 +01:00
Ettore Di Giacinto
2f8a2b1297
chore(deps): update diffusers dependency to use GitHub repo for l4t (#7369)
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-27 16:02:48 +01:00
dependabot[bot]
7e01aa8faa
chore(deps): bump protobuf from 6.32.0 to 6.33.1 in /backend/python/transformers (#7340)
chore(deps): bump protobuf in /backend/python/transformers

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.32.0 to 6.33.1.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 20:12:17 +00:00
Ettore Di Giacinto
3a232446e0
Revert "chore(chatterbox): bump l4t index to support more recent pytorch" (#7333)
Revert "chore(chatterbox): bump l4t index to support more recent pytorch (#7332)"

This reverts commit 55607a5aac.
2025-11-22 10:10:27 +01:00
Ettore Di Giacinto
55607a5aac
chore(chatterbox): bump l4t index to support more recent pytorch (#7332)
This should add support for devices like the DGX Spark

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 22:24:46 +01:00
Ettore Di Giacinto
ec492a4c56
fix(typo): environment variable name for max jobs
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 18:37:22 +01:00
Ettore Di Giacinto
2defe98df8
fix(vllm): Update flash-attn to specific wheel URL
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 18:06:46 +01:00
Ettore Di Giacinto
6261c87b1b
Add NVCC_THREADS and MAX_JOB environment variables
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-21 16:14:13 +01:00
Ettore Di Giacinto
daf39e1efd
chore(vllm/ci): set maximum number of jobs
Also added comments to clarify CPU usage during build.

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2025-11-20 15:53:32 +01:00
Mikhail Khludnev
01cd58a739
fix(reranker): support omitting top_n (#7199)
* fix(reranker): support omitting top_n

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>

* fix(reranker): support omitting top_n

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>

* pass 0 explicitly 

Signed-off-by: Mikhail Khludnev <mkhludnev@users.noreply.github.com>

---------

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>
Signed-off-by: Mikhail Khludnev <mkhludnev@users.noreply.github.com>
2025-11-09 18:40:32 +01:00
Ettore Di Giacinto
2f2f9beee7
fix(chatterbox): pin numpy (#7198)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2025-11-08 16:52:22 +01:00
Mikhail Khludnev
122e4c7094
fix(reranker): reproduce ignoring top_n (#7025)
* fix(reranker): reproduce ignoring top_n

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>

* fix(reranker): ignoring top_n

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>

---------

Signed-off-by: Mikhail Khludnev <mkhl@apache.org>
2025-11-06 10:03:05 +00:00
Lukas Schaefer
d95d4992fe
feat: return complete audio for kokoro (#6842)
Signed-off-by: Lukas Schaefer <lukas@lschaefer.xyz>
2025-10-28 08:49:18 +01:00
dependabot[bot]
63e6721c2f
chore(deps): bump grpcio from 1.75.1 to 1.76.0 in /backend/python/diffusers (#6839)
chore(deps): bump grpcio in /backend/python/diffusers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.75.1 to 1.76.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.75.1...v1.76.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.76.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 21:45:42 +01:00
dependabot[bot]
be027b1ccd
chore(deps): bump grpcio from 1.75.1 to 1.76.0 in /backend/python/transformers (#6828)
chore(deps): bump grpcio in /backend/python/transformers

Bumps [grpcio](https://github.com/grpc/grpc) from 1.75.1 to 1.76.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.75.1...v1.76.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.76.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 21:32:31 +01:00
dependabot[bot]
3ecadeeb93
chore(deps): bump grpcio from 1.75.1 to 1.76.0 in /backend/python/exllama2 (#6836)
chore(deps): bump grpcio in /backend/python/exllama2

Bumps [grpcio](https://github.com/grpc/grpc) from 1.75.1 to 1.76.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.75.1...v1.76.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.76.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 21:32:13 +01:00
dependabot[bot]
4af3348f91
chore(deps): bump grpcio from 1.75.1 to 1.76.0 in /backend/python/vllm (#6827)
Bumps [grpcio](https://github.com/grpc/grpc) from 1.75.1 to 1.76.0.
- [Release notes](https://github.com/grpc/grpc/releases)
- [Changelog](https://github.com/grpc/grpc/blob/master/doc/grpc_release_schedule.md)
- [Commits](https://github.com/grpc/grpc/compare/v1.75.1...v1.76.0)

---
updated-dependencies:
- dependency-name: grpcio
  dependency-version: 1.76.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-27 21:31:47 +01:00