mirror of
https://github.com/mudler/LocalAI
synced 2026-05-24 09:28:23 +00:00
* fix(docs): correct broken Hugo relrefs
The Hugo build has been failing on master since the relevant pages
landed:
- text-generation.md:720 referenced `/docs/features/distributed-mode`,
but Hugo `relref` paths are relative to the content root, not the
rendered URL. Drop the `/docs/` prefix so the lookup matches the
existing `features/...` form used elsewhere in the file.
- audio-transform.md:144 referenced `tts.md`; the actual page is
`text-to-audio.md`.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(kokoros): stub Diarize and AudioTransform Backend trait methods
The recent backend.proto additions (Diarize, AudioTransform,
AudioTransformStream) extended the gRPC Backend trait, breaking
kokoros-grpc compilation with E0046 because the Rust implementation
hadn't picked up the new methods. Add Unimplemented stubs matching the
existing pattern for non-applicable RPCs in this TTS-only backend.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(vibevoice-cpp): track upstream ABI + wire 1.5B voice cloning
Two recent commits in mudler/vibevoice.cpp reshaped the vv_capi_tts
signature without a corresponding bump on the LocalAI side:
3bd759c "1.5b: unify into a single tts entry point" inserted a
ref_audio_path parameter between voice_path and dst_wav_path.
ad856bd "1.5b: multi-speaker dialog support" promoted that to a
(const char* const* ref_audio_paths, int n_ref_audio_paths)
pair for per-speaker conditioning.
Because purego resolves symbols by name and not by signature, the
build kept linking; at runtime the misaligned arguments turned the
TTS->ASR closed-loop test into a SIGSEGV inside cgo. Track HEAD
explicitly and bring the bridge in line with it:
* Update the CppTTS purego binding to the 9-arg form. purego
marshals []*byte as a **char by handing the C side the underlying
array address; nil/empty maps to NULL, which matches the C
contract for "no reference audio" on the realtime-0.5B path.
* Add a `ref_audio` gallery option (comma-separated, repeatable)
that the 1.5B path consumes for runtime voice cloning. Multiple
entries are interpreted as one WAV per speaker (Speaker 0..n-1).
* TTSRequest.Voice now routes by extension/shape: `.wav` or a
comma-separated list goes to ref_audio_paths; anything else stays
on voice_path (realtime-0.5B's pre-baked voice gguf).
* Pin VIBEVOICE_CPP_VERSION to ad856bd and wire the Makefile into
the existing bump_deps matrix so future upstream rolls land as
reviewable PRs instead of a silent CI break.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(vibevoice-cpp): use ModelOptions.AudioPath for 1.5B ref audio
Use the existing audio_path field from ModelOptions (already plumbed
through config_file's `audio_path:` YAML and consumed by other audio
backends like kokoros) instead of inventing a custom `ref_audio:`
Options[] string. Multi-speaker setups stay on a single comma-
separated value.
No behavior change beyond the gallery key name; per-call routing via
TTSRequest.Voice is unchanged.
Assisted-by: Claude:claude-opus-4-7[1m]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|---|---|---|
| .. | ||
| _index.en.md | ||
| agents.md | ||
| api-discovery.md | ||
| audio-diarization.md | ||
| audio-to-text.md | ||
| audio-transform.md | ||
| authentication.md | ||
| backend-monitor.md | ||
| backends.md | ||
| constrained_grammars.md | ||
| distributed-mode.md | ||
| distributed_inferencing.md | ||
| distribution.md | ||
| embeddings.md | ||
| face-recognition.md | ||
| fine-tuning.md | ||
| gpt-vision.md | ||
| GPU-acceleration.md | ||
| image-generation.md | ||
| localai-assistant.md | ||
| mcp.md | ||
| mlx-distributed.md | ||
| model-gallery.md | ||
| object-detection.md | ||
| openai-functions.md | ||
| openai-realtime.md | ||
| p2p.md | ||
| quantization.md | ||
| reranker.md | ||
| runtime-settings.md | ||
| sound-generation.md | ||
| stores.md | ||
| text-generation.md | ||
| text-to-audio.md | ||
| video-generation.md | ||
| voice-activity-detection.md | ||
| voice-recognition.md | ||