LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 9748a1cbc6 fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299)	2026-04-10 08:45:47 +02:00

When TASK_RESPONSE_TYPE_OAI_CHAT is used, the first streaming token produces a JSON array with two elements: a role-init chunk and the actual content chunk. The grpc-server loop called attach_chat_deltas for both elements with the same raw_result pointer, stamping the first token's ChatDelta.Content on both replies. The Go side accumulated both, emitting the first content token twice to SSE clients.

Fix: in the array iteration loops in PredictStream, detect role-init elements (delta has "role" key) and skip attach_chat_deltas for them. Only content/reasoning elements get chat deltas attached. Reasoning models are unaffected because their first token goes into reasoning_content, not content.
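The duplication and the fix described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from grpc-server.cpp: the types Element and Reply and the functions make_element, is_role_init, build_replies, and accumulate are hypothetical stand-ins, and the real server works on JSON objects and gRPC replies rather than on these simplified structs. The sketch only assumes what the message states: every element of the array is paired with the same raw token, and a role-init element is one whose delta has a "role" key.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one element of the streamed JSON array.
// Only the keys of its "delta" object matter for this illustration.
struct Element {
    std::map<std::string, std::string> delta;
};

// Hypothetical stand-in for one reply sent to the Go side.
struct Reply {
    std::string chat_delta_content;  // empty when no chat delta is attached
};

// Helper to build an element whose delta holds a single key/value pair.
Element make_element(const std::string& key, const std::string& value) {
    Element e;
    e.delta[key] = value;
    return e;
}

// A role-init element is recognized by a "role" key in its delta.
bool is_role_init(const Element& e) {
    return e.delta.count("role") > 0;
}

// Build one reply per array element. Every element shares the same raw
// token; with skip_role_init set, role-init elements get no chat delta.
std::vector<Reply> build_replies(const std::vector<Element>& elements,
                                 const std::string& raw_token,
                                 bool skip_role_init) {
    std::vector<Reply> replies;
    for (const Element& e : elements) {
        Reply r;
        if (!(skip_role_init && is_role_init(e))) {
            r.chat_delta_content = raw_token;
        }
        replies.push_back(r);
    }
    return replies;
}

// Mimics the receiver: concatenates the chat delta content of all replies.
std::string accumulate(const std::vector<Reply>& replies) {
    std::string out;
    for (const Reply& r : replies) {
        out += r.chat_delta_content;
    }
    return out;
}
```

For a two-element array (a role-init element followed by a content element) and the token "Hello", accumulating the replies built with skip_role_init set to false gives "HelloHello", which mirrors the reported duplication. With skip_role_init set to true, the result is "Hello".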
CMakeLists.txt	fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) (#7864)	2026-01-06 00:13:48 +00:00
grpc-server.cpp	fix(streaming): skip chat deltas for role-init elements to prevent first token duplication (#9299)	2026-04-10 08:45:47 +02:00
Makefile	chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' (#9282)	2026-04-09 08:12:05 +02:00
package.sh	fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)	2026-03-22 00:58:14 +01:00
prepare.sh	chore: ⬆️ Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` (#7402)	2025-12-01 07:50:40 +01:00
run.sh	fix(llama-cpp/darwin): make sure to bundle `libutf8` libs (#6060)	2025-08-14 17:56:35 +02:00