LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00


LocalAI [bot] 6e1dbae256 feat(llama-cpp): expose 12 missing common_params via options[] (#9814)	2026-05-14 08:53:34 +02:00

The llama.cpp backend already accepts a free-form options: array in the model config that maps to common_params fields, but a coverage audit against upstream pin 7f3f843c flagged 12 user-visible knobs that were neither set via the typed proto fields nor reachable via options:.

Wire them up under the existing if/else chain in params_parse, before the speculative section. Each new option follows the file's prevailing patterns (try/catch around numeric parses, the same true/1/yes/on bool form used elsewhere, hardware_concurrency() fallback for thread counts, mirror of draft_override_tensor for override_tensor).

Top-level / batching / IO:
- n_ubatch (alias ubatch) -- physical batch size; was previously force-aliased to n_batch at line 482, blocking embedding/rerank workloads that need independent control
- threads_batch (alias n_threads_batch) -- main-model batch threads; mirrors the existing draft_threads_batch
- direct_io (alias use_direct_io) -- O_DIRECT model loads
- verbosity -- llama.cpp log threshold (line 479 had this commented out)
- override_tensor (alias tensor_buft_overrides) -- per-tensor buffer overrides for the main model; mirrors draft_override_tensor

Embedding / multimodal:
- pooling_type (alias pooling) -- mean/cls/last/rank/none; previously only auto-flipped to RANK for rerankers
- embd_normalize (alias embedding_normalize) -- and the embedding handler now reads params_base.embd_normalize instead of a hardcoded 2 at the previous embd_normalize literal in Embedding()
- mmproj_use_gpu (alias mmproj_offload) -- mmproj on CPU vs GPU
- image_min_tokens / image_max_tokens -- per-image vision token budget

Reasoning surface (the audit-focus three; LocalAI's existing ReasoningConfig.DisableReasoning only feeds the per-request chat_template_kwargs.enable_thinking and does not touch any of these):
- reasoning_format -- none/auto/deepseek/deepseek-legacy parser
- enable_reasoning (alias reasoning_budget) -- -1/0/>0 thinking budget
- prefill_assistant -- trailing-assistant-message prefill toggle

All 14 referenced fields exist on both the upstream pin and the turboquant fork's common.h, so no LOCALAI_LEGACY_LLAMA_CPP_SPEC guard is needed.

Docs: extend model-configuration.md with new "Reasoning Models", "Multimodal Backend Options", "Embedding & Reranking Backend Options", and "Other Backend Tuning Options" subsections; also refresh the Speculative Type Values table to show the new dash-separated canonical names alongside the underscore aliases LocalAI still accepts.

Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
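
The parsing patterns named in the commit message (an if/else chain over option names, try/catch around numeric parses, the true/1/yes/on bool form, and a hardware_concurrency() fallback for thread counts) can be illustrated with a minimal C++ sketch. This is not the code from the commit: sketch_params, parse_bool, and apply_option are hypothetical names, the "name:value" option string shape is an assumption, and falling back to the hardware thread count for non-positive values is an assumed reading of the fallback.

```cpp
#include <algorithm>
#include <exception>
#include <string>
#include <thread>

// Hypothetical stand-in for the few common_params fields used in this sketch.
struct sketch_params {
    int  n_ubatch        = 512;
    int  n_threads_batch = -1;
    bool use_direct_io   = false;
};

// Accepts the true/1/yes/on spellings mentioned in the commit message.
static bool parse_bool(const std::string & v) {
    return v == "true" || v == "1" || v == "yes" || v == "on";
}

// Applies one option string, assumed to look like "name:value".
// Returns false when the option name is not recognised.
static bool apply_option(sketch_params & p, const std::string & opt) {
    const auto sep = opt.find(':');
    const std::string name = opt.substr(0, sep);
    const std::string val  = sep == std::string::npos ? "" : opt.substr(sep + 1);

    if (name == "n_ubatch" || name == "ubatch") {
        // Numeric parse guarded by try/catch: a bad value keeps the current setting.
        try { p.n_ubatch = std::stoi(val); } catch (const std::exception &) {}
    } else if (name == "threads_batch" || name == "n_threads_batch") {
        int n = 0;
        try { n = std::stoi(val); } catch (const std::exception &) {}
        if (n <= 0) {
            // Assumed fallback: use the hardware thread count, at least 1
            // (hardware_concurrency() may return 0 when it cannot be determined).
            n = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
        }
        p.n_threads_batch = n;
    } else if (name == "direct_io" || name == "use_direct_io") {
        p.use_direct_io = parse_bool(val);
    } else {
        return false;
    }
    return true;
}
```

Under these assumptions, calling apply_option(p, "ubatch:128") sets p.n_ubatch to 128, while a malformed value such as "n_ubatch:abc" leaves the previous setting in place instead of aborting the model load.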
advanced	feat(llama-cpp): expose 12 missing common_params via options[] (#9814)	2026-05-14 08:53:34 +02:00
features	feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)	2026-05-07 17:27:29 +02:00
getting-started	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
installation	feat(ui): Interactive model config editor with autocomplete (#9149)	2026-04-07 14:42:23 +02:00
reference	fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)	2026-04-24 22:02:23 +02:00
_index.md	chore(docs): center video	2025-12-08 16:59:11 +01:00
faq.md	feat: docs revamp (#7313)	2025-11-19 22:21:20 +01:00
integrations.md	fix(anthropic): do not emit empty tokens and fix SSE tool calls (#9258)	2026-04-07 00:38:21 +02:00
overview.md	docs: credit the LocalAI maintainers team	2026-05-02 23:37:04 +00:00
whats-new.md	feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654)	2026-05-05 15:10:13 +02:00