Mirror of https://github.com/mudler/LocalAI (synced 2026-04-21 13:27:21 +00:00)
The C++ PEG parser needs a few tokens to identify the reasoning format (e.g. `<|channel>thought\n` for Gemma 4). During this warm-up, the gRPC layer was sending raw partial tag tokens to Go, which leaked into the reasoning field.

- Clear `reply.message` in gRPC when the autoparser is active but has no diffs yet, matching llama.cpp server behavior of only emitting classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming paths, falling back to Go-side extraction for backends without an autoparser (e.g. vLLM)
- Override the non-streaming no-tools result with chat delta content when available
- Guard `PrependThinkingTokenIfNeeded` against partial tag prefixes during streaming accumulation
- Reorder the default thinking tokens so `<|channel>thought` is checked before `<|think|>` (Gemma 4 templates contain both)