mirror of
https://github.com/mudler/LocalAI
synced 2026-05-24 09:28:23 +00:00
|
Some checks are pending
build backend container images / generate-matrix (push) Waiting to run
build backend container images / backend-jobs-multiarch (push) Blocked by required conditions
build backend container images / backend-jobs-singlearch (push) Blocked by required conditions
build backend container images / backend-merge-jobs-multiarch (push) Blocked by required conditions
build backend container images / backend-merge-jobs-singlearch (push) Blocked by required conditions
build backend container images / backend-jobs-darwin (push) Blocked by required conditions
Build test / build-test (push) Waiting to run
Build test / launcher-build-darwin (push) Waiting to run
Build test / launcher-build-linux (push) Waiting to run
Explorer deployment / build-linux (push) Waiting to run
GPU tests / ubuntu-latest (1.21.x) (push) Waiting to run
generate and publish intel docker caches / generate_caches (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, linux/amd64, arc-runner-set) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / core-image-merge (push) Blocked by required conditions
build container images / gpu-vulkan-image-merge (push) Blocked by required conditions
build container images / gpu-nvidia-cuda-12-image-merge (push) Blocked by required conditions
build container images / gpu-nvidia-cuda-13-image-merge (push) Blocked by required conditions
build container images / gpu-intel-image-merge (push) Blocked by required conditions
build container images / gpu-hipblas-image-merge (push) Blocked by required conditions
build container images / nvidia-l4t-arm64-image-merge (push) Blocked by required conditions
build container images / nvidia-l4t-arm64-cuda-13-image-merge (push) Blocked by required conditions
build container images / core-image-build (intel/oneapi-basekit:2025.3.2-0-devel-ubuntu24.04, intel, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-intel, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:22.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-13, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, , --jobs=4 --output-sync=target, arm64, linux/arm64, ubuntu-24.04-arm, false, auto, , noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, cublas, 12, 8, --jobs=4 --output-sync=target, linux/amd64, ubuntu-latest, false, auto, -gpu-nvidia-cuda-12, noble, 2404) (push) Waiting to run
build container images / core-image-build (ubuntu:24.04, vulkan, --jobs=4 --output-sync=target, amd64, linux/amd64, ubuntu-latest, false, auto, -gpu-vulkan, noble, 2404) (push) Waiting to run
build container images / hipblas-jobs (rocm/dev-ubuntu-24.04:7.2.1, hipblas, --jobs=3 --output-sync=target, linux/amd64, ubuntu-latest, auto, -gpu-hipblas, noble, 2404) (push) Waiting to run
build container images / gh-runner (nvcr.io/nvidia/l4t-jetpack:r36.4.0, cublas, 12, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, true, auto, -nvidia-l4t-arm64, jammy, 2204) (push) Waiting to run
build container images / gh-runner (ubuntu:24.04, cublas, 13, 0, --jobs=4 --output-sync=target, linux/arm64, ubuntu-24.04-arm, false, auto, -nvidia-l4t-arm64-cuda-13, noble, 2404) (push) Waiting to run
lint / golangci-lint (push) Waiting to run
Security Scan / tests (push) Waiting to run
Tests extras backends / detect-changes (push) Waiting to run
Tests extras backends / tests-ik-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-turboquant-grpc (push) Blocked by required conditions
Tests extras backends / tests-whisper-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-transformers (push) Blocked by required conditions
Tests extras backends / tests-rerankers (push) Blocked by required conditions
Tests extras backends / tests-diffusers (push) Blocked by required conditions
Tests extras backends / tests-coqui (push) Blocked by required conditions
Tests extras backends / tests-moonshine (push) Blocked by required conditions
Tests extras backends / tests-pocket-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-tts (push) Blocked by required conditions
Tests extras backends / tests-qwen-asr (push) Blocked by required conditions
Tests extras backends / tests-nemo (push) Blocked by required conditions
Tests extras backends / tests-voxcpm (push) Blocked by required conditions
Tests extras backends / tests-liquid-audio (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-quantization (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-llama-cpp-smoke (push) Waiting to run
Tests extras backends / tests-sherpa-onnx-realtime (push) Blocked by required conditions
Tests extras backends / tests-sherpa-onnx-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-acestep-cpp (push) Blocked by required conditions
Tests extras backends / tests-qwen3-tts-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-tts (push) Blocked by required conditions
Tests extras backends / tests-vibevoice-cpp-grpc-transcription (push) Blocked by required conditions
Tests extras backends / tests-localvqe-grpc-transform (push) Blocked by required conditions
Tests extras backends / tests-voxtral (push) Blocked by required conditions
Tests extras backends / tests-kokoros (push) Blocked by required conditions
Tests extras backends / tests-insightface-grpc (push) Blocked by required conditions
Tests extras backends / tests-speaker-recognition-grpc (push) Blocked by required conditions
tests / tests-linux (1.26.x) (push) Waiting to run
tests / tests-apple (1.26.x) (push) Waiting to run
tests-aio / tests-aio (push) Waiting to run
E2E Backend Tests / tests-e2e-backend (1.25.x) (push) Waiting to run
UI E2E Tests / tests-ui-e2e (1.26.x) (push) Waiting to run
* chore: ignore local .worktrees directory
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(openai): stream usage non-zero when tools are enabled
The streaming chat-completions worker for tool-bearing requests
(processTools in core/http/endpoints/openai/chat.go) never forwarded the
cumulative TokenUsage from ComputeChoices to the chunks it placed on the
responses channel. The outer streaming loop's running usage tracker
therefore stayed at the zero value, and the include_usage trailer
reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever
the request carried a `tools` array. Without tools, the alternative
`process` path stamps Usage on every chunk, so that path was unaffected.
Forward the final TokenUsage via a usage-only sentinel chunk (empty
Choices, populated Usage) emitted right before close(responses). The
outer loop's per-chunk Usage capture moves above the empty-Choices skip
so the sentinel updates the tracker without ever reaching the wire,
keeping the existing OpenAI spec contract (intermediate chunks carry no
`usage` field, and the deferred-final-chunk helpers remain Usage-free
per the regression test for issue #8546).
Adds streamUsageFromTokenUsage, usageSentinelChunk, and
applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level
test that mirrors the outer-loop sequence.
Fixes #9927
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
* refactor(openai): return final TokenUsage from stream workers
Replace the usage-only sentinel SSE chunk introduced in the previous
commit with a plain return value. The streaming workers process and
processTools (now extracted as package-level processStream and
processStreamWithTools) return (backend.TokenUsage, error); the outer
ChatEndpoint loop reads the cumulative counts off the existing `ended`
channel (now carrying streamWorkerResult{usage, err}) and builds the
include_usage trailer from a normal Go value after the LOOP exits.
This drops the empty-Choices "skip but capture Usage" rule from the
outer loop and removes the usageSentinelChunk / applyChunkToUsage
helpers entirely. The SSE responses channel is back to a single
purpose: wire chunks only.
processStream and processStreamWithTools move into chat_stream_workers.go
so they can be exercised directly from tests. The chat_stream_usage_test.go
suite now drives the workers with a mocked backend.ModelInferenceFunc
and asserts on the returned TokenUsage. The regression coverage for
issue #9927 is therefore behavioral: reverting the fix (discarding
ComputeChoices' usage return) makes the assertions fail with concrete
count mismatches.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
|
||
|---|---|---|
| .. | ||
| auth | ||
| endpoints | ||
| middleware | ||
| react-ui | ||
| routes | ||
| static | ||
| views | ||
| app.go | ||
| app_test.go | ||
| csrf_multipart_test.go | ||
| explorer.go | ||
| http_suite_test.go | ||
| openresponses_test.go | ||
| render.go | ||
| route_coverage_test.go | ||