
How do I get rootless podman to work with an NVIDIA GPU after reboot?

1 vote
0 answers
334 views
I have a RHEL9 system with an NVIDIA L40S, Driver Version: 570.124.06, CUDA Version: 12.8. I installed it as described here by (basically) running:

# dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
# dnf module install nvidia-driver:latest
# nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

To allow non-root users access to the GPU via podman, I changed the SELinux type on the /dev/nvidia* device objects like so:
# semanage fcontext -a -t container_file_t '/dev/nvidia(.*)?'
# restorecon -rv /dev/nvidia*
Relabeled /dev/nvidia0 from system_u:object_r:xserver_misc_device_t:s0 to system_u:object_r:container_file_t:s0
Relabeled /dev/nvidia-caps from unconfined_u:object_r:device_t:s0 to unconfined_u:object_r:container_file_t:s0
Relabeled /dev/nvidia-caps/nvidia-cap2 from unconfined_u:object_r:xserver_misc_device_t:s0 to unconfined_u:object_r:container_file_t:s0
Relabeled /dev/nvidia-caps/nvidia-cap1 from unconfined_u:object_r:xserver_misc_device_t:s0 to unconfined_u:object_r:container_file_t:s0
Relabeled /dev/nvidiactl from system_u:object_r:xserver_misc_device_t:s0 to system_u:object_r:container_file_t:s0
Relabeled /dev/nvidia-uvm from unconfined_u:object_r:xserver_misc_device_t:s0 to unconfined_u:object_r:container_file_t:s0
Relabeled /dev/nvidia-uvm-tools from unconfined_u:object_r:xserver_misc_device_t:s0 to unconfined_u:object_r:container_file_t:s0
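For what it's worth, the labels can be double-checked at any point with a plain ls (output omitted here):

$ ls -lZ /dev/nvidia*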
After that, a non-root user could run the following successfully:

$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.8.1-base-ubi9 nvidia-smi

Everything was done and looking good, I figured, until dnf-automatic rebooted the machine. When the machine comes up after the reboot, all the device files are naturally re-created with the old SELinux labels, forcing root to re-run restorecon before the devices are available to the user. To make things worse, not all NVIDIA devices are created on boot; specifically, the user runs into:
$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.8.1-base-ubi9 nvidia-smi
Error: setting up CDI devices: failed to inject devices: failed to stat CDI host device "/dev/nvidia-uvm": no such file or directory
If I run something like nvidia-smi, the driver "wakes up" and creates the UVM device. After that I can run restorecon, and after that the user can do their thing. Am I holding it wrong? Should I really need to create a oneshot unit (see the sketch below) to massage these things into place? Or is there some way to teach SELinux and the NVIDIA driver to wake up in the desired state?
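For concreteness, the kind of oneshot unit I have in mind is something like this. It is an untested sketch: the unit name and binary paths are mine, and it leans on the side effect that running nvidia-smi makes the driver create the missing device nodes:

# /etc/systemd/system/nvidia-dev-labels.service (hypothetical name)
[Unit]
Description=Create NVIDIA device nodes and restore SELinux labels for rootless podman

[Service]
Type=oneshot
RemainAfterExit=yes
# Poking the driver creates /dev/nvidia-uvm and friends
ExecStart=/usr/bin/nvidia-smi
# Re-apply the container_file_t labels configured via semanage fcontext
ExecStart=/bin/sh -c 'restorecon -rv /dev/nvidia*'

[Install]
WantedBy=multi-user.target

Enabled with systemctl enable --now nvidia-dev-labels.service it would paper over the problem, but it feels like working around the driver rather than fixing the actual boot-time state.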
Asked by azzid (1010 rep)
Mar 21, 2025, 07:33 AM