LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto cd6079b2f3 feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00

spiritbuun/buun-llama-cpp is a fork of TheTom/llama-cpp-turboquant that adds two independent features on top: DFlash block-diffusion speculative decoding (via a dedicated DFlashDraftModel GGUF arch) and two extra TCQ KV-cache variants (turbo2_tcq, turbo3_tcq) on top of TurboQuant's turbo2/turbo3/turbo4.

Follows the turboquant thin-wrapper pattern: reuses backend/cpp/llama-cpp grpc-server sources verbatim, patches only the build copy to extend the KV allow-list and wire up buun-exclusive tree_budget / draft_topk options.

DraftModel is already wired end-to-end (proto field 39 → params.speculative), so DFlash activation only needs the existing options passthrough (spec_type:dflash) plus the drafter path in draft_model.

CacheTypeOptions now surfaces the five turbo* values so the React UI dropdown shows them; this benefits turboquant too (previously users had to type them in YAML manually).

Assisted-by: Claude:Opus-4.7 [Read] [Edit] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

gen_inference_defaults	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
meta	feat(backend): add buun-llama-cpp fork (DFlash + TCQ KV-cache)	2026-04-24 12:52:53 +00:00
application_config.go	feat(ux): backend management enhancement (#9325)	2026-04-12 00:35:22 +02:00
application_config_test.go	feat: backend versioning, upgrade detection and auto-upgrade (#9315)	2026-04-11 22:31:15 +02:00
backend_hooks.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
config_suite_test.go	dependencies(grpcio): bump to fix CI issues (#2362)	2024-05-21 14:33:47 +02:00
distributed_config.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
gallery.go	refactor: gallery inconsistencies (#2647)	2024-06-24 17:32:12 +02:00
gguf.go	Respect explicit reasoning config during GGUF thinking probe (#9463)	2026-04-21 21:53:10 +02:00
gguf_reasoning_test.go	Respect explicit reasoning config during GGUF thinking probe (#9463)	2026-04-21 21:53:10 +02:00
hooks_llamacpp.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
hooks_test.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
hooks_vllm.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
inference_defaults.go	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
inference_defaults.json	chore: bump inference defaults from unsloth (#9396)	2026-04-17 09:05:55 +02:00
inference_defaults_test.go	feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)	2026-03-22 00:57:15 +01:00
model_config.go	feat: Add Sherpa ONNX backend for ASR and TTS (#8523)	2026-04-24 14:40:06 +02:00
model_config_filter.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
model_config_loader.go	feat: improve CLI error messages with actionable guidance (#8880)	2026-04-21 11:53:26 +02:00
model_config_test.go	fix(realtime): Use user provided voice and allow pipeline models to have no backend (#8415)	2026-02-11 14:18:05 +01:00
model_test.go	fix(config): ignore yaml backup files in model loader (#9443)	2026-04-20 23:41:39 +02:00
parser_defaults.json	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
runtime_settings.go	feat(ux): backend management enhancement (#9325)	2026-04-12 00:35:22 +02:00