LocalAI

mirror of https://github.com/mudler/LocalAI synced 2026-05-24 09:28:23 +00:00

Ettore Di Giacinto 551ebdb57a fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)

Workers on NVIDIA unified-memory hardware (DGX Spark / GB10, Jetson AGX Thor, Jetson Orin/Xavier/Nano) were reporting `available_vram=0` back to the frontend, so the Nodes UI showed the node as fully used even when most of the unified memory was actually free.

Three causes addressed:

* `isTegraDevice` only matched `/sys/devices/soc0/family == "Tegra"`. DGX Spark (SBSA) reports JEDEC codes there instead (`jep106:0426` for the NVIDIA manufacturer), so the Tegra/unified-memory fallback never ran. Renamed to `isNVIDIAIntegratedGPU` and extended to also match `jep106:0426[:]` via `/sys/devices/soc0/soc_id`.
* The unified-iGPU code defaulted the device name to `"NVIDIA Jetson"` when `/proc/device-tree/model` was missing. That's what happens for Thor inside a docker container, and always on DGX Spark. New `nvidiaIntegratedGPUName` resolves via dt-model → `/sys/devices/soc0/machine` → `soc_id` lookup (`jep106:0426:8901` → `"NVIDIA GB10"`) so the Nodes UI labels the box correctly.
* Worker heartbeat sent `available_vram=0` (or total-as-available) when VRAM usage was momentarily unknown, e.g. when `nvidia-smi` intermittently failed with `waitid: no child processes` under containers without `--init`. Each such heartbeat overwrote the DB and made the UI flip to "fully used". `heartbeatBody` now omits `available_vram` in that case so the DB keeps its last good value.

Also updates the commented GPU blocks in both compose files with `NVIDIA_DRIVER_CAPABILITIES=compute,utility`, `capabilities: [gpu, utility]`, and `init: true`, and documents the requirement in the distributed-mode and nvidia-l4t pages. Without `utility`, NVML/`nvidia-smi` are absent inside the container, which is what put the DGX Spark worker into the buggy fallback in the first place.

Detection verified on live hardware (dgx.casa / GB10 and 192.168.68.23 / Thor) by running a cross-compiled probe of the new helpers on both host and inside the worker container.

Assisted-by: Claude:opus-4.7 [Claude Code]

2026-04-24 22:02:23 +02:00
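
The detection and heartbeat fixes described in the commit message above can be illustrated with a small Go sketch. This is not the code from the commit: `looksLikeNVIDIAIntegratedGPU`, `heartbeat`, `heartbeatJSON`, and the `node_id` field are hypothetical names chosen for illustration. The sketch assumes that `jep106:0426[:]` means "exactly `jep106:0426`, or that prefix followed by a colon", and that the sysfs values are passed in as strings rather than read from disk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nvidiaJEP106 is the JEDEC manufacturer code the commit message cites for NVIDIA.
const nvidiaJEP106 = "jep106:0426"

// looksLikeNVIDIAIntegratedGPU is a hypothetical, pure-function variant of the
// check the commit describes: either the SoC family is "Tegra", or the soc_id
// is the NVIDIA JEDEC code, optionally followed by ":<part>".
func looksLikeNVIDIAIntegratedGPU(family, socID string) bool {
	if strings.TrimSpace(family) == "Tegra" {
		return true
	}
	id := strings.TrimSpace(socID)
	return id == nvidiaJEP106 || strings.HasPrefix(id, nvidiaJEP106+":")
}

// heartbeat is a hypothetical payload. A nil pointer combined with omitempty
// drops the field entirely, while a pointer to 0 is still serialized, so a
// genuine "0 bytes free" stays distinguishable from "unknown".
type heartbeat struct {
	NodeID        string  `json:"node_id"`
	AvailableVRAM *uint64 `json:"available_vram,omitempty"`
}

// heartbeatJSON serializes a heartbeat, leaving available_vram out when the
// current usage could not be determined.
func heartbeatJSON(nodeID string, availableVRAM uint64, known bool) (string, error) {
	hb := heartbeat{NodeID: nodeID}
	if known {
		hb.AvailableVRAM = &availableVRAM
	}
	b, err := json.Marshal(hb)
	return string(b), err
}

func main() {
	// soc_id value taken from the commit message (GB10).
	fmt.Println(looksLikeNVIDIAIntegratedGPU("jep106:0426", "jep106:0426:8901"))

	unknown, err := heartbeatJSON("worker-1", 0, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(unknown) // field omitted, so a receiver can keep its last good value

	known, err := heartbeatJSON("worker-1", 1024, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(known)
}
```

Using a pointer field rather than a plain integer is one way to leave a value out of the JSON body; whether the actual `heartbeatBody` does it this way is not stated in the commit message.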
_index.en.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
agents.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
api-discovery.md	feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)	2026-04-04 15:14:35 +02:00
audio-to-text.md	chore(docs): update transcription endpoint	2026-04-14 14:14:54 +00:00
authentication.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
backend-monitor.md	fix(backend-monitor): accept model as a query parameter (#9411)	2026-04-21 22:06:35 +02:00
backends.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
constrained_grammars.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
distributed-mode.md	fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts (#9545)	2026-04-24 22:02:23 +02:00
distributed_inferencing.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
distribution.md	fix(docs): commit distribution.md	2026-04-03 10:14:13 +02:00
embeddings.md	feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)	2026-04-22 21:55:41 +02:00
face-recognition.md	feat(insightface): add antispoofing (liveness) detection (#9515)	2026-04-23 18:28:15 +02:00
fine-tuning.md	fix(docs): Use notice instead of alert (#9134)	2026-03-25 13:55:48 +01:00
gpt-vision.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
GPU-acceleration.md	feat(rocm): bump to 7.x (#9323)	2026-04-12 08:51:30 +02:00
image-generation.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
mcp.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
mlx-distributed.md	feat(mlx-distributed): add new MLX-distributed backend (#8801)	2026-03-09 17:29:32 +01:00
model-gallery.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
object-detection.md	feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)	2026-04-22 21:55:41 +02:00
openai-functions.md	docs: document tool calling on vLLM and MLX backends	2026-04-13 16:58:55 +00:00
openai-realtime.md	Remove header from OpenAI Realtime API documentation	2026-04-09 09:00:28 +02:00
p2p.md	feat: Add documentation for undocumented API endpoints (#8852)	2026-03-08 17:59:33 +01:00
quantization.md	fix(docs): Use notice instead of alert (#9134)	2026-03-25 13:55:48 +01:00
reranker.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
runtime-settings.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
sound-generation.md	feat: Add documentation for undocumented API endpoints (#8852)	2026-03-08 17:59:33 +01:00
stores.md	feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)	2026-04-22 21:55:41 +02:00
text-generation.md	feat(backend): add turboquant llama.cpp-fork backend (#9355)	2026-04-15 01:25:04 +02:00
text-to-audio.md	fix(docs): fix broken references to distributed mode	2026-04-03 09:46:06 +02:00
video-generation.md	feat: Add documentation for undocumented API endpoints (#8852)	2026-03-08 17:59:33 +01:00
voice-activity-detection.md	feat: Add documentation for undocumented API endpoints (#8852)	2026-03-08 17:59:33 +01:00
voice-recognition.md	feat: voice recognition (#9500)	2026-04-23 12:07:14 +02:00