LocalAI/pkg
LocalAI [bot] 6e5a58ca70
feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)
* fix: Add VRAM cleanup when stopping models

- Add Free() method to AIModel interface for proper GPU resource cleanup
- Implement Free() in llama backend to release llama.cpp model resources
- Add Free() stub implementations in base and SingleThread backends
- Modify deleteProcess() to call Free() before stopping the process
  to ensure VRAM is properly released when models are unloaded

Fixes an issue where VRAM was not freed when models were stopped,
which could lead to memory exhaustion when running multiple models
sequentially.
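
The ordering described above (call Free() first, then stop the process) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the AIModel interface is trimmed to a single method, and Base, fakeLlama, process, and the body of deleteProcess are hypothetical stand-ins whose real signatures in LocalAI may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// AIModel is a trimmed-down, hypothetical stand-in for the interface named
// in the commit message; the real interface has many more methods.
type AIModel interface {
	Free() error
}

// Base illustrates a stub Free() for backends that hold no GPU resources.
type Base struct{}

// Free is a no-op for the stub backend.
func (Base) Free() error { return nil }

// fakeLlama is a hypothetical backend that tracks whether its (pretend)
// model resources are still held.
type fakeLlama struct {
	loaded bool
}

// Free releases the pretend resources; calling it again is harmless.
func (l *fakeLlama) Free() error {
	l.loaded = false
	return nil
}

// process is a hypothetical handle for a running backend process.
type process struct {
	model   AIModel
	running bool
	events  []string // records the order of cleanup steps
}

// stop marks the process as stopped, or fails if it was not running.
func (p *process) stop() error {
	if !p.running {
		return errors.New("process not running")
	}
	p.running = false
	p.events = append(p.events, "stop")
	return nil
}

// deleteProcess frees model resources before stopping the process.
// A Free() failure is reported but does not prevent the stop.
func deleteProcess(p *process) error {
	var freeErr error
	if p.model != nil {
		freeErr = p.model.Free()
		p.events = append(p.events, "free")
	}
	stopErr := p.stop()
	return errors.Join(freeErr, stopErr) // nil when both steps succeed
}

func main() {
	m := &fakeLlama{loaded: true}
	p := &process{model: m, running: true}
	if err := deleteProcess(p); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.events, m.loaded)
}
```

In this sketch a failing Free() does not block the stop, so a misbehaving backend cannot keep its process alive; whether the actual change makes the same choice is an assumption.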

* feat: Add Free RPC to backend.proto for VRAM cleanup

- Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
- This RPC is required to properly expose the Free() method
  through the gRPC interface for VRAM resource cleanup

Refs: PR #8739

* Apply suggestion from @mudler

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
2026-03-03 12:39:06 +01:00
audio feat(audio): set audio content type (#8416) 2026-02-05 19:14:12 +01:00
concurrency chore: update jobresult_test.go (#4124) 2024-11-12 08:52:18 +01:00
downloader feat(ui): add model size estimation (#8684) 2026-02-28 23:03:47 +01:00
format feat(api): Add transcribe response format request parameter & adjust STT backends (#8318) 2026-02-01 17:33:17 +01:00
functions fix(toolcall): consider also literal \n between tags 2026-03-01 11:20:46 +01:00
grpc feat: Add Free RPC to backend.proto for VRAM cleanup (#8751) 2026-03-03 12:39:06 +01:00
huggingface-api feat(hf-api): return files in nested directories (#7396) 2025-11-30 09:06:54 +01:00
langchain feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232) 2024-05-04 17:56:12 +02:00
model feat: Add Free RPC to backend.proto for VRAM cleanup (#8751) 2026-03-03 12:39:06 +01:00
oci feat(ui): allow to cancel ops (#7264) 2025-11-13 18:41:47 +01:00
reasoning feat(openresponses): Support reasoning blocks (#8133) 2026-01-21 00:11:45 +01:00
signals chore: update cogito and simplify MCP logics (#6413) 2025-10-09 12:36:45 +02:00
sound feat: Realtime API support reboot (#5392) 2025-05-25 22:25:05 +02:00
store chore: fix go.mod module (#2635) 2024-06-23 08:24:36 +00:00
system fix: whisper breaking on cuda-13 (use absolute path for CUDA directory detection) (#8678) 2026-02-28 09:10:40 +01:00
utils Add sample_rate support to TTS API via post-processing resampling (#8650) 2026-02-25 16:36:27 +01:00
vram feat(ui): add model size estimation (#8684) 2026-02-28 23:03:47 +01:00
xio feat(ui): allow to cancel ops (#7264) 2025-11-13 18:41:47 +01:00
xsync chore: fix go.mod module (#2635) 2024-06-23 08:24:36 +00:00
xsysinfo fix: drop gguf VRAM estimation (now redundant) (#8325) 2026-02-01 17:33:28 +01:00