LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00


LocalAI [bot] 8af963bdd9 fix(streaming): comply with OpenAI usage / stream_options spec (#9815)

* fix(streaming): comply with OpenAI usage / stream_options spec (#8546)

LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed chunk because `OpenAIResponse.Usage` was a value type without `omitempty`. The official OpenAI Node SDK and its consumers (continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue) filter on a truthy `result.usage` to detect the trailing usage chunk. Because LocalAI's usage was zero-valued but non-null on every intermediate chunk, that filter swallowed every content chunk and surfaced an empty chat response, even though the server log looked successful.

Changes:
- `core/schema/openai.go`: change the field to `` Usage *OpenAIUsage `json:"usage,omitempty"` `` so intermediate chunks no longer carry a `usage` key. Add `OpenAIRequest.StreamOptions` with `include_usage` to mirror OpenAI's request field.
- `core/http/endpoints/openai/chat.go` and `completion.go`: keep using the `Usage` struct field as an in-process channel for the running cumulative count, but strip it before JSON marshalling. When the request sets `stream_options.include_usage: true`, emit a dedicated trailing chunk with `"choices": []` and the populated usage (matching the OpenAI spec and llama.cpp's server behavior).
- `chat_emit.go`: add a new `streamUsageTrailerJSON` helper; drop the `usage` parameter from `buildNoActionFinalChunks`, since chunks no longer carry usage.
- Update `image.go`, `inpainting.go`, and `edit.go` to wrap their Usage values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React (`useChat.js`) and legacy (`static/chat.js`) chat clients so the token-count badge keeps populating now that the server is spec-compliant.

Tests:
- New `chat_stream_usage_test.go` pins the spec invariants: intermediate chunks have no `usage` key, the trailer JSON has `"choices":[]` and a populated `usage`, and `OpenAIRequest` parses `stream_options.include_usage`.
- Update `chat_emit_test.go` to reflect that final chunks no longer embed usage.

Verified against the live LocalAI instance: before the fix, Continue's filter logic swallowed 16/16 token chunks; with the new shape it yields 4/5 and routes usage through the dedicated trailer chunk.

Fixes #8546

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(streaming): silence errcheck on usage trailer Fprintf

The new spec-compliant `stream_options.include_usage` trailer writes were flagged by errcheck because they are new code (golangci-lint runs with new-from-merge-base on master); the surrounding `fmt.Fprintf` `data:` writes are grandfathered. Discard the return values explicitly to satisfy the linter without adding a nolint shim.

Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>		2026-05-14 08:53:46 +02:00
agent_jobs.go	feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)	2026-04-04 15:14:35 +02:00
anthropic.go	fix(anthropic): show null index when not present, default to 0 (#9225)	2026-04-04 15:13:17 +02:00
anthropic_test.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
audio_transform.go	feat: add LocalVQE backend and audio transformations UI (#9640)	2026-05-04 22:07:11 +02:00
backend.go	feat: Add backend gallery (#5607)	2025-06-15 14:56:52 +02:00
diarization.go	feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654)	2026-05-05 15:10:13 +02:00
elevenlabs.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
finetune.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
gallery-model.schema.json	[gallery] add JSON schema for gallery model specification (#7890)	2026-01-06 22:10:43 +01:00
jina.go	fix(reranker): tests and top_n check fix #7212 (#7284)	2025-11-16 17:53:23 +01:00
localai.go	feat(insightface): add antispoofing (liveness) detection (#9515)	2026-04-23 18:28:15 +02:00
message.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
message_test.go	feat(vllm): parity with llama.cpp backend (#9328)	2026-04-13 11:00:29 +02:00
ollama.go	fix(ollama): accept `prompt` alias on /api/embed for Ollama parity (#9780)	2026-05-12 17:21:20 +02:00
ollama_test.go	fix(ollama): accept `prompt` alias on /api/embed for Ollama parity (#9780)	2026-05-12 17:21:20 +02:00
openai.go	fix(streaming): comply with OpenAI usage / stream_options spec (#9815)	2026-05-14 08:53:46 +02:00
openresponses.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
prediction.go	fix: implement encoding_format=base64 for embeddings endpoint (#9135)	2026-03-25 17:38:07 +01:00
quantization.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
request.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
schema_suite_test.go	feat(llama.cpp): consolidate options and respect tokenizer template when enabled (#7120)	2025-11-07 21:23:50 +01:00
tokenize.go	feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)	2026-04-04 15:14:35 +02:00
transcription.go	feat: support word-level timestamps for faster-whisper (#9621)	2026-05-06 00:32:52 +02:00
transcription_format.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00