Mirror of https://github.com/mudler/LocalAI, synced 2026-04-21 21:37:21 +00:00
The C++ PEG parser needs a few tokens before it can identify the reasoning format (e.g. `<|channel>thought\n` for Gemma 4). During this warm-up, the gRPC layer was sending raw partial tag tokens to Go, which leaked into the reasoning field.

- Clear `reply.message` in gRPC when the autoparser is active but has produced no diffs yet, matching the llama.cpp server behavior of only emitting classified output
- Prefer C++ autoparser chat deltas for reasoning/content in all streaming paths, falling back to Go-side extraction for backends without an autoparser (e.g. vLLM)
- Override the non-streaming no-tools result with chat delta content when available
- Guard `PrependThinkingTokenIfNeeded` against partial tag prefixes during streaming accumulation
- Reorder the default thinking tokens so `<|channel>thought` is checked before `<|think|>` (Gemma 4 templates contain both)
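The partial-tag guard described above can be sketched as a small prefix check: while the accumulated stream could still grow into a known thinking tag, the handler holds the text back instead of emitting it. This is a minimal illustration, not the actual LocalAI implementation; the `thinkingTags` contents and the `isPartialTagPrefix` helper name are assumptions, with the two tags and their ordering taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Known reasoning-start tags, ordered so that <|channel>thought is checked
// before <|think|>, since Gemma 4 templates contain both (per the commit).
// The exact tag set here is an assumption for illustration.
var thinkingTags = []string{"<|channel>thought\n", "<|think|>"}

// isPartialTagPrefix reports whether s is a proper prefix of some thinking
// tag, i.e. the stream could still grow into a full tag. A streaming handler
// can use this to buffer the text rather than leak a partial tag into the
// reasoning or content field.
func isPartialTagPrefix(s string) bool {
	if s == "" {
		return false
	}
	for _, tag := range thinkingTags {
		if len(s) < len(tag) && strings.HasPrefix(tag, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isPartialTagPrefix("<|chan"))    // true: could become <|channel>thought
	fmt.Println(isPartialTagPrefix("<|think|>")) // false: a complete tag, not a partial one
	fmt.Println(isPartialTagPrefix("Hello"))     // false: plain content, emit immediately
}
```

A complete tag (or any text that can no longer match a tag prefix) falls through the guard and is processed normally, which is what keeps ordinary content flowing without extra latency.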
Directory listing:

- auth
- endpoints
- middleware
- react-ui
- routes
- static
- views
- app.go
- app_test.go
- explorer.go
- http_suite_test.go
- openresponses_test.go
- render.go