LocalAI/pkg
Ettore Di Giacinto 53deeb1107 fix(reasoning): suppress partial tag tokens during autoparser warm-up
The C++ PEG parser needs a few tokens to identify the reasoning format
(e.g. "<|channel>thought\n" for Gemma 4). During this warm-up, the gRPC
layer was sending raw partial tag tokens to Go, which leaked into the
reasoning field.
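
The warm-up problem above can be sketched in Go. This is a minimal sketch that assumes reasoning tags are matched as plain string prefixes of the accumulated output. While that text is still a proper prefix of a known tag, the format cannot be classified yet, so forwarding it would leak fragments such as "<|" into the reasoning field. `classify`, `State` and `reasoningTags` are hypothetical names, not LocalAI's API. The `<think>` entry is an assumed extra tag; the other two come from this commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// reasoningTags is a hypothetical list of reasoning start tags. The first two
// come from the commit message; "<think>" is an assumed extra entry.
var reasoningTags = []string{"<|channel>thought\n", "<|think|>", "<think>"}

// State describes how far format detection has progressed.
type State int

const (
	WarmingUp State = iota // text may still grow into a tag: emit nothing yet
	Matched                // a full tag was found at the start of the text
	NoTag                  // text cannot start with any known tag
)

// classify inspects the output accumulated so far and reports which tag it
// starts with, or whether more tokens are needed before deciding.
func classify(buf string) (State, string) {
	for _, tag := range reasoningTags {
		if strings.HasPrefix(buf, tag) {
			return Matched, tag
		}
	}
	for _, tag := range reasoningTags {
		// A proper prefix of a tag is a partial tag: hold it back.
		if len(buf) < len(tag) && strings.HasPrefix(tag, buf) {
			return WarmingUp, ""
		}
	}
	return NoTag, ""
}

func main() {
	for _, buf := range []string{"<|", "<|channel>", "<|channel>thought\nPlan:", "Hello"} {
		state, tag := classify(buf)
		fmt.Printf("%q -> state=%d tag=%q\n", buf, state, tag)
	}
}
```

Returning an explicit `WarmingUp` state, rather than a boolean, lets a streaming caller tell "not a tag" (safe to emit as content) apart from "not decided yet" (suppress for now).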

- Clear reply.message in gRPC when the autoparser is active but has no diffs
  yet, matching the llama.cpp server's behavior of emitting only classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming
  paths, falling back to Go-side extraction for backends without autoparser
  (e.g. vLLM)
- In the non-streaming no-tools path, override the result with chat delta content when available
- Guard PrependThinkingTokenIfNeeded against partial tag prefixes during
  streaming accumulation
- Reorder default thinking tokens so <|channel>thought is checked before
  <|think|> (Gemma 4 templates contain both)
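
The selection rules in the first two bullets and the token ordering in the last bullet can be sketched as follows. This is a hypothetical illustration under stated assumptions, not LocalAI's implementation. `ChatDelta`, `resolveChunk`, `detectThinkingToken` and `naiveExtract` are made-up names. Matching thinking tokens by substring search over the template is an assumption. `naiveExtract` is a toy stand-in for the real Go-side extraction.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatDelta is a hypothetical stand-in for one classified chunk produced by
// the C++ autoparser: text already split into reasoning and content.
type ChatDelta struct {
	Reasoning string
	Content   string
}

// resolveChunk decides what to forward for one streamed chunk:
//   - autoparser deltas available: use them as-is;
//   - autoparser active but no deltas yet (warm-up): emit nothing;
//   - no autoparser (e.g. vLLM): fall back to Go-side extraction of raw text.
func resolveChunk(autoparser bool, deltas []ChatDelta, raw string,
	extract func(string) (string, string)) (string, string) {
	if len(deltas) > 0 {
		var r, c strings.Builder
		for _, d := range deltas {
			r.WriteString(d.Reasoning)
			c.WriteString(d.Content)
		}
		return r.String(), c.String()
	}
	if autoparser {
		return "", "" // suppress raw partial tag tokens during warm-up
	}
	return extract(raw)
}

// thinkingTokens is an ordered, illustrative list: "<|channel>thought" is
// checked before "<|think|>" because Gemma 4 templates contain both.
var thinkingTokens = []string{"<|channel>thought", "<|think|>"}

// detectThinkingToken returns the first listed token found in a template.
func detectThinkingToken(template string) string {
	for _, tok := range thinkingTokens {
		if strings.Contains(template, tok) {
			return tok
		}
	}
	return ""
}

// naiveExtract is a toy extractor for "<think>...</think>" text, used only
// to exercise the fallback path.
func naiveExtract(raw string) (string, string) {
	const openTag, closeTag = "<think>", "</think>"
	if strings.HasPrefix(raw, openTag) {
		if i := strings.Index(raw, closeTag); i >= 0 {
			return raw[len(openTag):i], raw[i+len(closeTag):]
		}
	}
	return "", raw
}

func main() {
	r, c := resolveChunk(true, nil, "<|chan", naiveExtract)
	fmt.Printf("warm-up: %q %q\n", r, c)
	r, c = resolveChunk(true, []ChatDelta{{Reasoning: "plan"}, {Content: "answer"}}, "", naiveExtract)
	fmt.Printf("deltas: %q %q\n", r, c)
	r, c = resolveChunk(false, nil, "<think>plan</think>answer", naiveExtract)
	fmt.Printf("fallback: %q %q\n", r, c)
	fmt.Println(detectThinkingToken("<|think|> ... <|channel>thought"))
}
```

The ordering matters only because detection stops at the first hit. With "<|think|>" listed first, a template containing both markers would resolve to the wrong token.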
2026-04-04 20:45:57 +00:00
audio feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
concurrency feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
downloader feat(distributed): Avoid resending models to backend nodes (#9193) 2026-03-31 16:28:13 +02:00
functions feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
grpc chore(refactor): use interface (#9226) 2026-04-04 17:29:37 +02:00
huggingface-api feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
model chore(refactor): use interface (#9226) 2026-04-04 17:29:37 +02:00
oci feat(ui): allow to cancel ops (#7264) 2025-11-13 18:41:47 +01:00
reasoning fix(reasoning): suppress partial tag tokens during autoparser warm-up 2026-04-04 20:45:57 +00:00
sanitize feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
signals feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
sound feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
store chore: fix go.mod module (#2635) 2024-06-23 08:24:36 +00:00
system fix: gate CUDA directory checks on GPU vendor to prevent false CUDA detection (#8942) 2026-03-12 07:53:39 +01:00
utils feat: add distributed mode (#9124) 2026-03-30 00:47:27 +02:00
vram feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084) 2026-04-04 15:14:35 +02:00
xio feat(ui): allow to cancel ops (#7264) 2025-11-13 18:41:47 +01:00
xsync chore: fix go.mod module (#2635) 2024-06-23 08:24:36 +00:00
xsysinfo feat(gpu): add jetson/tegra detection 2026-03-31 15:45:07 +00:00