Mirror of https://github.com/mudler/LocalAI, synced 2026-05-24 09:28:23 +00:00
When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients.

Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached.

Reasoning models are unaffected because their first token goes into reasoning_content, not content.
| Name |
|---|
| auth |
| endpoints |
| middleware |
| react-ui |
| routes |
| static |
| views |
| app.go |
| app_test.go |
| explorer.go |
| http_suite_test.go |
| openresponses_test.go |
| render.go |