LocalAI/core/backend
LocalAI [bot] c500461c69
feat(config): default prompt_cache_all to true (#9951)
Upstream llama.cpp defaults `cache_prompt = true` (common/common.h),
but `parse_options` in the grpc-server backend unconditionally forwards
the proto `PromptCacheAll` field, so any model that didn't set
`prompt_cache_all: true` in its YAML was getting `cache_prompt=false`,
silently overriding llama.cpp's own default. With `kv_unified` and
`cache_idle_slots` already on by default, this was the last piece
preventing the per-request prompt cache from being usable out of the
box.

Make `PromptCacheAll` tristate (`*bool`), default it to `true` in
`SetDefaults`, and dereference at the proto boundary. Users can still
opt out with an explicit `prompt_cache_all: false`. Same pattern as
`MMap`, `MMlock`, `Reranking`, etc.
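
A minimal sketch of the tristate pattern described above, not the actual
LocalAI code. The `ModelConfig` and `PredictOptions` types and the `toProto`
helper are hypothetical stand-ins for the YAML config struct and the proto
message; only the names `PromptCacheAll` and `SetDefaults` come from the
commit message. The nil fallback in `toProto` is an assumption added for
safety, not something the commit states.

```go
package main

import "fmt"

// ModelConfig is a hypothetical stand-in for the YAML-backed model config.
// A nil pointer means "the user did not set this option".
type ModelConfig struct {
	PromptCacheAll *bool
}

// PredictOptions is a hypothetical stand-in for the proto message. A plain
// bool cannot distinguish "unset" from "false", which is what caused the
// silent override.
type PredictOptions struct {
	PromptCacheAll bool
}

// SetDefaults fills in unset fields only; an explicit true or false is kept.
func (c *ModelConfig) SetDefaults() {
	if c.PromptCacheAll == nil {
		enabled := true
		c.PromptCacheAll = &enabled
	}
}

// toProto dereferences the pointer at the proto boundary. Treating nil as
// true is an assumption of this sketch, so that a config which skipped
// SetDefaults still gets the default.
func toProto(c ModelConfig) PredictOptions {
	return PredictOptions{
		PromptCacheAll: c.PromptCacheAll == nil || *c.PromptCacheAll,
	}
}

func main() {
	off := false
	unset := ModelConfig{}                        // no prompt_cache_all in YAML
	optedOut := ModelConfig{PromptCacheAll: &off} // prompt_cache_all: false

	unset.SetDefaults()
	optedOut.SetDefaults()

	// prints: true false
	fmt.Println(toProto(unset).PromptCacheAll, toProto(optedOut).PromptCacheAll)
}
```

The pointer keeps three states apart (unset, explicit true, explicit false),
so the default applies only when the user said nothing.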

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 22:06:22 +02:00
audio_transform.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
backend_suite_test.go feat: extract output with regexes from LLMs (#3491) 2024-09-13 13:27:36 +02:00
detection.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
diarization.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
diarization_test.go feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654) 2026-05-05 15:10:13 +02:00
embeddings.go feat(ui): Per model backend logs and various fixes (#9028) 2026-03-18 08:31:26 +01:00
face_analyze.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
face_embed.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
face_verify.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
image.go feat(ui): Per model backend logs and various fixes (#9028) 2026-03-18 08:31:26 +01:00
llm.go feat(gallery): verify backend OCI images with keyless cosign (#9823) 2026-05-18 08:02:20 +02:00
llm_probe_test.go Respect explicit reasoning config during GGUF thinking probe (#9463) 2026-04-21 21:53:10 +02:00
llm_test.go feat(autoparser): prefer chat deltas from backends when emitted (#9224) 2026-04-04 12:12:08 +02:00
options.go feat(config): default prompt_cache_all to true (#9951) 2026-05-22 22:06:22 +02:00
options_internal_test.go feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563) 2026-04-29 00:49:28 +02:00
rerank.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
soundgeneration.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
stores.go feat: add biometrics UI (#9524) 2026-04-24 08:50:34 +02:00
token_metrics.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
tokenize.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
transcript.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
tts.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
vad.go feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
video.go feat(ui): Per model backend logs and various fixes (#9028) 2026-03-18 08:31:26 +01:00
voice_analyze.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
voice_embed.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00
voice_verify.go feat(whisper): honor client cancellation via ggml abort_callback (#9710) 2026-05-08 01:44:47 +02:00