LocalAI/docs/content/reference
Ettore Di Giacinto cd6079b2f3 feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)
spiritbuun/buun-llama-cpp is a fork of TheTom/llama-cpp-turboquant that adds
two independent features on top: DFlash block-diffusion speculative decoding
(via a dedicated DFlashDraftModel GGUF arch) and two extra TCQ KV-cache
variants (turbo2_tcq, turbo3_tcq) on top of TurboQuant's turbo2/turbo3/turbo4.

Follows the turboquant thin-wrapper pattern: reuses the backend/cpp/llama-cpp
grpc-server sources verbatim and patches only the build copy to extend the KV
allow-list and wire up the buun-exclusive tree_budget / draft_topk options.
DraftModel is already wired end-to-end (proto field 39 → params.speculative),
so DFlash activation only needs the existing options passthrough
(spec_type:dflash) plus the drafter path in draft_model.

CacheTypeOptions now surfaces the five turbo* values so the React UI dropdown
shows them. This benefits turboquant too (previously users had to type them
into the YAML manually).

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-04-24 12:52:53 +00:00
_index.en.md feat: docs revamp (#7313) 2025-11-19 22:21:20 +01:00
_index.md docs(agents): adopt kernel's AI coding assistants policy 2026-04-19 22:50:54 +00:00
ai-coding-assistants.md docs(agents): adopt kernel's AI coding assistants policy 2026-04-19 22:50:54 +00:00
api-errors.md feat: add users and authentication support (#9061) 2026-03-19 21:40:51 +01:00
architecture.md feat: docs revamp (#7313) 2025-11-19 22:21:20 +01:00
binaries.md chore: drop bark which is unmaintained (#8207) 2026-01-25 09:26:40 +01:00
cli-reference.md feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler (#9186) 2026-03-31 08:28:56 +02:00
compatibility-table.md feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache) 2026-04-24 12:52:53 +00:00
nvidia-l4t.md chore(docs): update docs with cuda 13 instructions and the new vibevoice backend 2025-12-25 10:00:07 +01:00
shell-completion.md Update shell completion documentation URL 2026-03-08 09:33:36 +01:00
system-info.md feat: Add documentation for undocumented API endpoints (#8852) 2026-03-08 17:59:33 +01:00