LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-04-21 13:27:21 +00:00

LocalAI [bot] 6e5a58ca70 feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)	2026-03-03 12:39:06 +01:00

* fix: Add VRAM cleanup when stopping models
  - Add Free() method to AIModel interface for proper GPU resource cleanup
  - Implement Free() in llama backend to release llama.cpp model resources
  - Add Free() stub implementations in base and SingleThread backends
  - Modify deleteProcess() to call Free() before stopping the process to ensure VRAM is properly released when models are unloaded
  Fixes issue where VRAM was not freed when stopping models, which could lead to memory exhaustion when running multiple models sequentially.
* feat: Add Free RPC to backend.proto for VRAM cleanup
  - Add rpc Free(HealthMessage) returns (Result) {} to backend.proto
  - This RPC is required to properly expose the Free() method through the gRPC interface for VRAM resource cleanup
  Refs: PR #8739
* Apply suggestion from @mudler
  Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Co-authored-by: localai-bot <localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
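
The commit message above describes calling Free() on a model before its backend process is stopped, so that GPU memory is released on unload. The ordering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-method `AIModel` interface, the `fakeBackend` and `process` types, and this `deleteProcess` signature are hypothetical stand-ins, not the code in the repository. The choice to stop the process even when Free() fails is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// AIModel is a hypothetical, trimmed-down stand-in for the backend
// interface named in the commit message. Only Free() is shown.
type AIModel interface {
	Free() error
}

// fakeBackend is an illustrative backend that records whether Free()
// ran. A real backend would release llama.cpp or GPU resources here.
type fakeBackend struct {
	freed    bool
	failFree bool
}

func (b *fakeBackend) Free() error {
	if b.failFree {
		return errors.New("free failed")
	}
	b.freed = true
	return nil
}

// process is a hypothetical handle for a running backend process.
type process struct {
	model   AIModel
	running bool
}

// Stop marks the process as no longer running.
func (p *process) Stop() { p.running = false }

// deleteProcess frees the model's resources first, then stops the
// process. Assumption: the process is stopped even if Free() fails,
// and the Free() error is returned to the caller.
func deleteProcess(p *process) error {
	var freeErr error
	if p.model != nil {
		if err := p.model.Free(); err != nil {
			freeErr = fmt.Errorf("freeing model resources: %w", err)
		}
	}
	p.Stop()
	return freeErr
}
```

In this sketch, calling `deleteProcess(&process{model: &fakeBackend{}, running: true})` sets `freed` on the backend and leaves `running` false. Over gRPC, the same call would travel through the `Free(HealthMessage) returns (Result)` RPC that the commit adds to backend.proto.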
audio	feat(audio): set audio content type (#8416)	2026-02-05 19:14:12 +01:00
concurrency	chore: update jobresult_test.go (#4124)	2024-11-12 08:52:18 +01:00
downloader	feat(ui): add model size estimation (#8684)	2026-02-28 23:03:47 +01:00
format	feat(api): Add transcribe response format request parameter & adjust STT backends (#8318)	2026-02-01 17:33:17 +01:00
functions	fix(toolcall): consider also literal \n between tags	2026-03-01 11:20:46 +01:00
grpc	feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)	2026-03-03 12:39:06 +01:00
huggingface-api	feat(hf-api): return files in nested directories (#7396)	2025-11-30 09:06:54 +01:00
langchain	feat(llama.cpp): do not specify backends to autoload and add llama.cpp variants (#2232)	2024-05-04 17:56:12 +02:00
model	feat: Add Free RPC to backend.proto for VRAM cleanup (#8751)	2026-03-03 12:39:06 +01:00
oci	feat(ui): allow to cancel ops (#7264)	2025-11-13 18:41:47 +01:00
reasoning	feat(openresponses): Support reasoning blocks (#8133)	2026-01-21 00:11:45 +01:00
signals	chore: update cogito and simplify MCP logics (#6413)	2025-10-09 12:36:45 +02:00
sound	feat: Realtime API support reboot (#5392)	2025-05-25 22:25:05 +02:00
store	chore: fix go.mod module (#2635)	2024-06-23 08:24:36 +00:00
system	fix: whisper breaking on cuda-13 (use absolute path for CUDA directory detection) (#8678)	2026-02-28 09:10:40 +01:00
utils	Add `sample_rate` support to TTS API via post-processing resampling (#8650)	2026-02-25 16:36:27 +01:00
vram	feat(ui): add model size estimation (#8684)	2026-02-28 23:03:47 +01:00
xio	feat(ui): allow to cancel ops (#7264)	2025-11-13 18:41:47 +01:00
xsync	chore: fix go.mod module (#2635)	2024-06-23 08:24:36 +00:00
xsysinfo	fix: drop gguf VRAM estimation (now redundant) (#8325)	2026-02-01 17:33:28 +01:00