LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 53deeb1107 fix(reasoning): suppress partial tag tokens during autoparser warm-up	2026-04-04 20:45:57 +00:00

The C++ PEG parser needs a few tokens to identify the reasoning format (e.g. "<|channel>thought\n" for Gemma 4). During this warm-up, the gRPC layer was sending raw partial tag tokens to Go, which leaked into the reasoning field.

- Clear reply.message in gRPC when autoparser is active but has no diffs yet, matching llama.cpp server behavior of only emitting classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming paths, falling back to Go-side extraction for backends without autoparser (e.g. vLLM)
- Override non-streaming no-tools result with chat delta content when available
- Guard PrependThinkingTokenIfNeeded against partial tag prefixes during streaming accumulation
- Reorder default thinking tokens so <|channel>thought is checked before <|think|> (Gemma 4 templates contain both)
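
The partial-prefix guard and the token reordering described in the commit message could look roughly like the Go sketch below. It is an illustration under stated assumptions, not the code from this commit: `thinkingTokens`, `isPartialTagPrefix`, and `detectThinkingToken` are hypothetical names, and the token list contains only the two tags the message mentions.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingTokens is a hypothetical, ordered list of reasoning start tags.
// "<|channel>thought" comes first because, per the commit message, Gemma 4
// templates contain both tags and the channel tag must win.
var thinkingTokens = []string{"<|channel>thought", "<|think|>"}

// isPartialTagPrefix reports whether the accumulated text is a non-empty,
// proper prefix of any known tag. While this is true, a streaming caller
// would hold the text back instead of emitting it as reasoning.
func isPartialTagPrefix(accumulated string, tokens []string) bool {
	s := strings.TrimSpace(accumulated)
	if s == "" {
		return false
	}
	for _, tok := range tokens {
		if len(s) < len(tok) && strings.HasPrefix(tok, s) {
			return true
		}
	}
	return false
}

// detectThinkingToken returns the first tag, in list order, that appears
// in the text. List order, not position in the text, decides the result.
func detectThinkingToken(text string, tokens []string) (string, bool) {
	for _, tok := range tokens {
		if strings.Contains(text, tok) {
			return tok, true
		}
	}
	return "", false
}

func main() {
	// Simulated stream: the tag arrives split across several chunks.
	chunks := []string{"<|", "chan", "nel>thought", "\nLet me check."}
	acc := ""
	for _, c := range chunks {
		acc += c
		fmt.Printf("%q partial=%v\n", acc, isPartialTagPrefix(acc, thinkingTokens))
	}

	// A template that contains both tags resolves to the channel tag.
	tok, ok := detectThinkingToken("<|think|> ... <|channel>thought", thinkingTokens)
	fmt.Println(tok, ok)
}
```

Checking tags in list order keeps the choice deterministic when a template contains more than one of them, which is the situation the commit message describes for Gemma 4.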
CMakeLists.txt	fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) (#7864)	2026-01-06 00:13:48 +00:00
grpc-server.cpp	fix(reasoning): suppress partial tag tokens during autoparser warm-up	2026-04-04 20:45:57 +00:00
Makefile	fix(reasoning): warm-up	2026-04-04 20:25:24 +00:00
package.sh	fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)	2026-03-22 00:58:14 +01:00
prepare.sh	chore: ⬆️ Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` (#7402)	2025-12-01 07:50:40 +01:00
run.sh	fix(llama-cpp/darwin): make sure to bundle `libutf8` libs (#6060)	2025-08-14 17:56:35 +02:00