LocalAI/core/backend
Dream 10a1e6c74d
feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299)
* feat(proto): add speaker field to TranscriptSegment for diarization

Add speaker field to the gRPC TranscriptSegment message and map it
through the Go schema, enabling backends to return speaker labels.

Signed-off-by: eureka928 <meobius123@gmail.com>
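
As an illustration of what a per-segment speaker label makes possible, here is a minimal sketch that merges consecutive segments spoken by the same speaker. The `Segment` class and its `start`, `end`, and `text` fields are hypothetical stand-ins, not the actual gRPC message or Go schema; only the `speaker` field is named in this commit.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Segment:
    # Hypothetical stand-in for TranscriptSegment; only `speaker` comes from the commit.
    start: float
    end: float
    text: str
    speaker: str = ""  # assumed empty when no diarization was performed


def merge_by_speaker(segments: List[Segment]) -> List[Segment]:
    """Merge runs of consecutive segments that share a speaker label."""
    merged = []
    for speaker, run in groupby(segments, key=lambda s: s.speaker):
        run = list(run)
        merged.append(Segment(
            start=run[0].start,          # run starts at its first segment
            end=run[-1].end,             # and ends at its last segment
            text=" ".join(s.text.strip() for s in run),
            speaker=speaker,
        ))
    return merged
```

Segments without a label fall into the empty-string group, so output from a backend that does no diarization collapses into a single merged segment.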

* feat(whisperx): add whisperx backend for transcription with diarization

Add Python gRPC backend using WhisperX for speech-to-text with
word-level timestamps, forced alignment, and speaker diarization
via pyannote-audio when HF_TOKEN is provided.

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): register whisperx backend in Makefile

Signed-off-by: eureka928 <meobius123@gmail.com>

* feat(whisperx): add whisperx meta and image entries to index.yaml

Signed-off-by: eureka928 <meobius123@gmail.com>

* ci(whisperx): add build matrix entries for CPU, CUDA 12/13, and ROCm

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): unpin torch versions and use CPU index for cpu requirements

Address review feedback:
- Use --extra-index-url for CPU torch wheels to reduce size
- Remove torch version pins, let uv resolve compatible versions

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch ROCm variant to fix CI build failure

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): pin torch CPU variant to fix uv resolution failure

Pin torch==2.8.0+cpu so uv resolves the CPU wheel from the extra
index instead of picking torch==2.8.0+cu128 from PyPI, which pulls
unresolvable CUDA dependencies.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): use unsafe-best-match index strategy to fix uv resolution failure

uv's default first-index strategy finds torch on PyPI before checking
the extra index, causing it to pick torch==2.8.0+cu128 instead of the
CPU variant. This makes whisperx's transitive torch dependency
unresolvable. Using unsafe-best-match lets uv consider all indexes.

Signed-off-by: eureka928 <meobius123@gmail.com>

* fix(whisperx): drop +cpu local version suffix to fix uv resolution failure

Under PEP 440, the specifier ==2.8.0 also matches 2.8.0+cpu from the
extra index, which avoids the issue where uv cannot locate an explicit
+cpu local version specifier. This aligns with the pattern used by all
other CPU backends.

Signed-off-by: eureka928 <meobius123@gmail.com>
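
The PEP 440 rule relied on here can be sketched as follows: a `==` specifier without a local label ignores the candidate's local label, while a specifier with one requires an exact label match. This is a simplified illustration with hypothetical helper names, not uv's resolver; it skips version normalization, zero padding, and prefix matching such as `==2.8.*`.

```python
def split_local(version: str):
    """Split 'X.Y.Z+local' into (public, local); local is None if absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def equals_matches(spec_version: str, candidate: str) -> bool:
    """Simplified `==` check covering only the local-version rule."""
    spec_public, spec_local = split_local(spec_version)
    cand_public, cand_local = split_local(candidate)
    if spec_public != cand_public:
        return False
    if spec_local is None:
        # Public specifier: the candidate's local label is ignored.
        return True
    # Specifier with a local label: the labels must match exactly.
    return spec_local == cand_local


print(equals_matches("2.8.0", "2.8.0+cpu"))        # unpinned local label
print(equals_matches("2.8.0+cpu", "2.8.0+cu128"))  # explicit local label
```

The first call returns True and the second False, which is why dropping the `+cpu` suffix leaves the choice of variant to whichever index supplies the wheel.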

* fix(backends): drop +rocm local version suffixes from hipblas requirements to fix uv resolution

uv cannot resolve PEP 440 local version specifiers (e.g. +rocm6.4,
+rocm6.3) in pinned requirements. The --extra-index-url already points
to the correct ROCm wheel index and --index-strategy unsafe-best-match
(set in libbackend.sh) ensures the ROCm variant is preferred.

Applies the same fix as 7f5d72e8 (which resolved this for +cpu) across
all 14 hipblas requirements files.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* revert: scope hipblas suffix fix to whisperx only

Reverts changes to non-whisperx hipblas requirements files per
maintainer review, since other backends build fine with the +rocm
local version suffix.

Signed-off-by: eureka928 <meobius123@gmail.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Signed-off-by: eureka928 <meobius123@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:33:12 +01:00
backend_suite_test.go feat: extract output with regexes from LLMs (#3491) 2024-09-13 13:27:36 +02:00
detection.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
embeddings.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
image.go feat(UI): image generation improvements (#7804) 2025-12-31 21:59:46 +01:00
llm.go feat: detect thinking support from backend automatically if not explicitly set (#8167) 2026-01-23 00:38:28 +01:00
llm_test.go feat(backends): add system backend, refactor (#6059) 2025-08-14 19:38:26 +02:00
options.go fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path (#7832) 2026-01-02 20:17:30 +01:00
rerank.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
soundgeneration.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
stores.go feat: refactor build process, drop embedded backends (#5875) 2025-07-22 16:31:04 +02:00
token_metrics.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
tokenize.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
transcript.go feat(whisperx): add whisperx backend for transcription with speaker diarization (#8299) 2026-02-02 16:33:12 +01:00
tts.go feat(tts): add support for streaming mode (#8291) 2026-01-30 11:58:01 +01:00
vad.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00
video.go feat(loader): enhance single active backend to support LRU eviction (#7535) 2025-12-12 12:28:38 +01:00