diff --git a/website/docs/podman/gpu.md b/website/docs/podman/gpu.md
index eabd4774fb7..241fd6f03de 100644
--- a/website/docs/podman/gpu.md
+++ b/website/docs/podman/gpu.md
@@ -43,12 +43,18 @@ Run the following commands **on the Podman Machine, not the host system**:
 
 ```sh
 $ curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
-  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo && \
-  sudo yum install -y nvidia-container-toolkit && \
-  sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml && \
+  tee /etc/yum.repos.d/nvidia-container-toolkit.repo && \
+  yum install -y nvidia-container-toolkit && \
+  nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml && \
   nvidia-ctk cdi list
 ```
 
+:::info
+
+A configuration change might occur when you create or remove Multi-Instance GPU (MIG) devices, or upgrade the Compute Unified Device Architecture (CUDA) driver. In such cases, you must generate a new Container Device Interface (CDI) specification.
+
+:::
+
 #### Verification
 
 To verify that containers created can access the GPU, you can use `nvidia-smi` from within a container with NVIDIA drivers installed.
@@ -85,6 +91,31 @@ Fri Aug 16 18:58:14 2024
 +---------------------------------------------------------------------------------------+
 ```
 
+#### Troubleshooting
+
+#### Version mismatch
+
+You might encounter the following error inside the containers:
+
+```
+# nvidia-smi
+Failed to initialize NVML: N/A
+```
+
+This problem is related to a mismatch between the Container Device Interface (CDI) and the installed version.
+
+To fix this problem, generate a new CDI specification by running the following inside the Podman machine:
+
+```
+nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
+```
+
+:::info
+
+You might need to restart your Podman machine.
+
+:::
+
 #### Additional resources
 
 - [NVIDIA Container Toolkit Installation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf)