LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

LocalAI [bot] 0b2ae3c6ca	fix(openai): stream usage non-zero when tools are enabled (#9941)	2026-05-22 10:13:41 +02:00

* chore: ignore local .worktrees directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(openai): stream usage non-zero when tools are enabled

The streaming chat-completions worker for tool-bearing requests (processTools in core/http/endpoints/openai/chat.go) never forwarded the cumulative TokenUsage from ComputeChoices to the chunks it placed on the responses channel. The outer streaming loop's running usage tracker therefore stayed at the zero value, and the include_usage trailer reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever the request carried a `tools` array. Without tools, the alternative `process` path stamps Usage on every chunk, so that path was unaffected.

Forward the final TokenUsage via a usage-only sentinel chunk (empty Choices, populated Usage) emitted right before close(responses). The outer loop's per-chunk Usage capture moves above the empty-Choices skip so the sentinel updates the tracker without ever reaching the wire, keeping the existing OpenAI spec contract (intermediate chunks carry no `usage` field, and the deferred-final-chunk helpers remain Usage-free per the regression test for issue #8546).

Adds streamUsageFromTokenUsage, usageSentinelChunk, and applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level test that mirrors the outer-loop sequence.

Fixes #9927

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

* refactor(openai): return final TokenUsage from stream workers

Replace the usage-only sentinel SSE chunk introduced in the previous commit with a plain return value. The streaming workers process and processTools (now extracted as package-level processStream and processStreamWithTools) return (backend.TokenUsage, error); the outer ChatEndpoint loop reads the cumulative counts off the existing `ended` channel (now carrying streamWorkerResult{usage, err}) and builds the include_usage trailer from a normal Go value after the LOOP exits.

This drops the empty-Choices "skip but capture Usage" rule from the outer loop and removes the usageSentinelChunk / applyChunkToUsage helpers entirely. The SSE responses channel is back to a single purpose: wire chunks only.

processStream and processStreamWithTools move into chat_stream_workers.go so they can be exercised directly from tests. The chat_stream_usage_test.go suite now drives the workers with a mocked backend.ModelInferenceFunc and asserts on the returned TokenUsage. The regression coverage for issue #9927 is therefore behavioral: reverting the fix (discarding ComputeChoices' usage return) makes the assertions fail with concrete count mismatches.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
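
The pattern described in the refactor commit (workers return their cumulative usage, and the outer loop reads it from the `ended` channel instead of from a sentinel chunk) can be sketched roughly as follows. This is a minimal illustration and not the code from chat_stream_workers.go: `TokenUsage` and its fields, `fakeWorker`, and `runStream` are hypothetical stand-ins, the fields of `streamWorkerResult` are assumed from the commit message, and the real outer loop most likely selects over several channels rather than ranging over one.

```go
package main

// TokenUsage is a hypothetical stand-in for backend.TokenUsage;
// the real type's fields may differ.
type TokenUsage struct {
	Prompt     int
	Completion int
}

// streamWorkerResult takes its name from the commit message;
// the field layout here is an assumption.
type streamWorkerResult struct {
	usage TokenUsage
	err   error
}

// fakeWorker is a hypothetical stand-in for processStream or
// processStreamWithTools. It puts wire chunks on responses and
// returns the cumulative usage as an ordinary Go value.
func fakeWorker(tokens []string, responses chan<- string) (TokenUsage, error) {
	usage := TokenUsage{Prompt: 7} // pretend the prompt cost 7 tokens
	for _, t := range tokens {
		responses <- t // wire chunk only, no usage attached
		usage.Completion++
	}
	return usage, nil
}

// runStream plays the role of the outer loop: it drains the wire
// chunks, then reads the worker's final usage from ended.
func runStream(tokens []string) ([]string, TokenUsage, error) {
	responses := make(chan string)
	// Buffered so the worker can publish its result before the
	// outer loop is ready to receive it.
	ended := make(chan streamWorkerResult, 1)

	go func() {
		usage, err := fakeWorker(tokens, responses)
		ended <- streamWorkerResult{usage: usage, err: err}
		close(responses)
	}()

	var wire []string
	for chunk := range responses {
		wire = append(wire, chunk)
	}

	// The usage trailer would be built from this value after the loop.
	res := <-ended
	return wire, res.usage, res.err
}
```

Calling `runStream([]string{"a", "b", "c"})` from a `main` function or a test yields three wire chunks and a usage value whose `Completion` count is 3. Because the usage travels as a return value, the responses channel carries only chunks meant for the client, which is the single-purpose property the commit message calls out.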
..
auth	feat(usage): attribute Sources rows to user accounts in admin view (#9935)	2026-05-21 23:23:06 +02:00
endpoints	fix(openai): stream usage non-zero when tools are enabled (#9941)	2026-05-22 10:13:41 +02:00
middleware	feat(usage): track and visualise usage per API key (#9920)	2026-05-21 16:34:02 +02:00
react-ui	feat(usage): attribute Sources rows to user accounts in admin view (#9935)	2026-05-21 23:23:06 +02:00
routes	fix(nodes): make per-node backend install async via gallery job queue (#9928)	2026-05-21 22:25:53 +02:00
static	fix(streaming): comply with OpenAI usage / stream_options spec (#9815)	2026-05-14 08:53:46 +02:00
views	feat(realtime): WebRTC support (#8790)	2026-03-13 21:37:15 +01:00
app.go	fix(nodes): make per-node backend install async via gallery job queue (#9928)	2026-05-21 22:25:53 +02:00
app_test.go	fix(http): honor X-Forwarded-Prefix when proxy strips the prefix (#9614)	2026-05-13 21:59:33 +02:00
csrf_multipart_test.go	chore: Security hardening (#9719)	2026-05-08 16:25:45 +02:00
explorer.go	chore(refactor): move logging to common package based on slog (#7668)	2025-12-21 19:33:13 +01:00
http_suite_test.go	refactor(tests): split app_test.go, move real-backend coverage to e2e-backends	2026-04-27 23:09:20 +00:00
openresponses_test.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
render.go	feat: add distributed mode (#9124)	2026-03-30 00:47:27 +02:00
route_coverage_test.go	chore: Security hardening (#9719)	2026-05-08 16:25:45 +02:00