Mirror of https://github.com/mudler/LocalAI (synced 2026-05-24 09:28:23 +00:00)
The llama.cpp C++-side chat autoparser clears `Reply.Message` and delivers parsed content, reasoning, and tool calls via `Reply.chat_deltas`. `chat.go` already handles this (its non-SSE path uses `ToolCallsFromChatDeltas`, `ContentFromChatDeltas`, and `ReasoningFromChatDeltas`), but `realtime.go` only read `pred.Response`. As a result, any model routed through the autoparser (Qwen2.5/3 and similar) produced a silent reply: the backend emitted N tokens, but the session surface saw zero.

Mirror the non-SSE chat path in realtime's `triggerResponse`: when the deltas carry tool calls or content, use them directly; otherwise, fall back to the existing raw-text parsing.

Assisted-by: claude-opus-4-7-1M [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
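The fallback rule described in the commit message can be sketched as follows. This is a hypothetical, self-contained illustration, not LocalAI's actual code: the types `ChatDelta`, `Prediction`, and `Result` and the helpers `contentFromDeltas`, `reasoningFromDeltas`, `toolCallsFromDeltas`, and `parseRawText` are invented stand-ins, and their shapes are assumptions rather than the real signatures of `ToolCallsFromChatDeltas` and friends.

```go
package main

import (
	"fmt"
	"strings"
)

// ToolCall is a hypothetical, simplified tool-call record.
type ToolCall struct {
	Name      string
	Arguments string
}

// ChatDelta is a hypothetical stand-in for one entry of Reply.chat_deltas.
type ChatDelta struct {
	Content          string
	ReasoningContent string
	ToolCalls        []ToolCall
}

// Prediction is a hypothetical stand-in for the backend result.
// Response holds raw text (empty when the autoparser ran); Deltas holds parsed chunks.
type Prediction struct {
	Response string
	Deltas   []ChatDelta
}

// Result is what the session surface would receive (assumed shape).
type Result struct {
	Content   string
	Reasoning string
	ToolCalls []ToolCall
}

// contentFromDeltas concatenates the content of every delta.
func contentFromDeltas(ds []ChatDelta) string {
	var b strings.Builder
	for _, d := range ds {
		b.WriteString(d.Content)
	}
	return b.String()
}

// reasoningFromDeltas concatenates the reasoning text of every delta.
func reasoningFromDeltas(ds []ChatDelta) string {
	var b strings.Builder
	for _, d := range ds {
		b.WriteString(d.ReasoningContent)
	}
	return b.String()
}

// toolCallsFromDeltas collects the tool calls from every delta, in order.
func toolCallsFromDeltas(ds []ChatDelta) []ToolCall {
	var calls []ToolCall
	for _, d := range ds {
		calls = append(calls, d.ToolCalls...)
	}
	return calls
}

// parseRawText is a hypothetical placeholder for the existing raw-text parsing path.
func parseRawText(raw string) Result {
	return Result{Content: strings.TrimSpace(raw)}
}

// resolveReply prefers the parsed deltas when they carry tool calls or content,
// and otherwise falls back to parsing the raw response text.
func resolveReply(pred Prediction) Result {
	calls := toolCallsFromDeltas(pred.Deltas)
	content := contentFromDeltas(pred.Deltas)
	if len(calls) > 0 || content != "" {
		return Result{Content: content, Reasoning: reasoningFromDeltas(pred.Deltas), ToolCalls: calls}
	}
	return parseRawText(pred.Response)
}

func main() {
	// Autoparser case: Response is empty, the text arrives only in the deltas.
	autoparsed := Prediction{Deltas: []ChatDelta{{ReasoningContent: "think"}, {Content: "Hel"}, {Content: "lo"}}}
	fmt.Println(resolveReply(autoparsed).Content) // Hello

	// Legacy case: no deltas, so the raw text is parsed as before.
	legacy := Prediction{Response: "  Hi there  "}
	fmt.Println(resolveReply(legacy).Content) // Hi there
}
```

Reading only `pred.Response` in the autoparser case would yield an empty string, which is the silent-reply symptom the commit describes; checking the deltas first and keeping the raw-text path as a fallback preserves behavior for backends that do not populate deltas.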
| Name |
|---|
| auth |
| endpoints |
| middleware |
| react-ui |
| routes |
| static |
| views |
| app.go |
| app_test.go |
| explorer.go |
| http_suite_test.go |
| openresponses_test.go |
| render.go |