The strange power consumption behaviour of a Quadro card when `vfio-pci` has been removed and `nvidia` reattached
I have built a system with a GeForce GTX 960 and a Quadro M4000 graphics card. The Quadro is usually passed through to a virtual machine, while the GTX 960 is only used by the host.
Normally, the Quadro card is not available to the host, because it is bound to the `vfio-pci` kernel driver, which prevents the host from using it. However, when I am not using it in the virtual machine, I would like to access it from the host, e.g. to do some computation.
But there is some very strange behaviour in power consumption and fan speed...
How can I reduce the power consumption and fan speed without having to keep `nvidia-settings` open all the time?
From my notes:
## Reuse a Passthrough-Ready Device on the Host
Suppose a secondary graphics card that has been prepared for passthrough to a guest should be used on the host instead.
The device is normally not usable on the host, because the wrong driver is bound to it.
Here, the Quadro M4000 is bound to the `vfio-pci` driver, but the `nvidia` driver should be used instead.
```shell
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller : NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
#   Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
#   Kernel driver in use: nvidia
#   Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller : NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
#   Subsystem: Hewlett-Packard Company Device [103c:1153]
#   Kernel driver in use: vfio-pci
#   Kernel modules: nouveau, nvidia_drm, nvidia
```
Unload the `vfio-pci` driver and check the device status again.
No kernel driver should be in use any more, so the `Kernel driver in use: ...` line is gone.
```shell
sudo modprobe -r vfio-pci
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller : NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
#   Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
#   Kernel driver in use: nvidia
#   Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller : NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
#   Subsystem: Hewlett-Packard Company Device [103c:1153]
#   Kernel modules: nouveau, nvidia_drm, nvidia
# 0c:00.1 Audio device : NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
```
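
The driver check can also be done without parsing `lspci`: the kernel exposes the driver bound to a PCI function as a `driver` symlink in that function's sysfs directory. The following is a minimal sketch under that assumption; `bound_driver` and the `SYSFS_ROOT` override are hypothetical names introduced here (the override only exists so the function can be tried against a fake directory tree).

```shell
#!/bin/sh
# Sketch: report which kernel driver (if any) is bound to a PCI function by
# reading its sysfs "driver" symlink instead of grepping `lspci -nnk`.
# SYSFS_ROOT is a hypothetical override for testing; on a real system it is /sys.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# bound_driver ADDRESS  (e.g. 0000:0c:00.0)
# Prints the driver name, or "none" if no driver is bound.
bound_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The link points at .../bus/pci/drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Going by the `lspci` output above, `bound_driver 0000:0c:00.0` would be expected to print `vfio-pci` before `modprobe -r vfio-pci` and `none` afterwards.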
Also check the output of the NVIDIA driver tool `nvidia-smi`.
It should list only one graphics card (the GTX 960, which is not passed through).
```shell
sudo nvidia-smi
# Tue Sep 28 18:19:36 2021
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |                               |                      |               MIG M. |
# |===============================+======================+======================|
# |   0  NVIDIA GeForce ...  Off  | 00000000:0B:00.0  On |                  N/A |
# |  0%   51C    P8    19W / 160W |    477MiB /  4040MiB |      0%      Default |
# |                               |                      |                  N/A |
# +-------------------------------+----------------------+----------------------+
# ...
```
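
To check for one specific card rather than reading the whole table, `nvidia-smi` can be asked for just the PCI bus IDs of the GPUs it manages (`--query-gpu=pci.bus_id --format=csv,noheader`). A minimal sketch; `gpu_listed` and the `NVIDIA_SMI` override are hypothetical names, and the match is case-insensitive because `nvidia-smi` prints the hex digits in upper case.

```shell
#!/bin/sh
# Sketch: check whether the NVIDIA driver currently manages a GPU by looking
# for its PCI bus ID in nvidia-smi's query output.
# NVIDIA_SMI is a hypothetical override so the command can be stubbed.

NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# gpu_listed BUS_ID  (e.g. 00000000:0C:00.0)
# Exit status 0 if nvidia-smi lists that bus ID, non-zero otherwise.
gpu_listed() {
    # -F: fixed string, -x: whole line, -i: ignore hex-digit case.
    "$NVIDIA_SMI" --query-gpu=pci.bus_id --format=csv,noheader |
        grep -qixF -- "$1"
}
```

With the state shown above, `gpu_listed 00000000:0C:00.0` should fail (the Quadro is not listed), while `gpu_listed 00000000:0B:00.0` should succeed for the GTX 960.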
Remove all associated PCI devices from the system; in this case, those are `0c:00.0` (the VGA function) and `0c:00.1` (the audio function).
Then check that they are actually gone.
```shell
echo 1 | sudo tee /sys/bus/pci/devices/0000\:0c\:00.0/remove
echo 1 | sudo tee /sys/bus/pci/devices/0000\:0c\:00.1/remove
sudo ls /sys/bus/pci/devices/ | grep 0c:00.
# nothing...
```
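
The two `remove` writes can be generalised to every function of the slot (here `.0` and `.1`), so that a card with more functions is not left half-attached. A minimal sketch, assuming the usual `/sys/bus/pci/devices/<domain:bus:dev.fn>/remove` attribute; `remove_slot` and `SYSFS_ROOT` are hypothetical names, and on a real system the function has to run as root.

```shell
#!/bin/sh
# Sketch: detach every function of one PCI slot by writing 1 to each
# function's sysfs "remove" attribute. Requires root on a real system.
# SYSFS_ROOT is a hypothetical override used only for testing.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# remove_slot SLOT  (e.g. 0000:0c:00, without the function number)
# Returns non-zero if no function of that slot was found.
remove_slot() {
    found=0
    # The glob expands once, before the loop, so removing .0 does not
    # affect the iteration over .1.
    for dev in "$SYSFS_ROOT/bus/pci/devices/$1".*; do
        [ -e "$dev/remove" ] || continue
        echo 1 > "$dev/remove"
        found=1
    done
    [ "$found" -eq 1 ]
}
```

For example, from a root shell that has sourced the function: `remove_slot 0000:0c:00`.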
Then trigger a PCI rescan and check that the devices are back and enabled.
Also check which kernel driver is in use and what `nvidia-smi` reports.
```shell
echo 1 | sudo tee /sys/bus/pci/rescan
sudo ls /sys/bus/pci/devices/ | grep 0c:00.
sudo cat /sys/bus/pci/devices/0000\:0c\:00.?/enable
# 1
# 1
sudo lspci -nnk | grep -E -A3 "VGA|Display|3D"
# 0b:00.0 VGA compatible controller : NVIDIA Corporation GM206 [GeForce GTX 960] [10de:1401] (rev a1)
#   Subsystem: Gigabyte Technology Co., Ltd Device [1458:36ac]
#   Kernel driver in use: nvidia
#   Kernel modules: nouveau, nvidia_drm, nvidia
# --
# 0c:00.0 VGA compatible controller : NVIDIA Corporation GM204GL [Quadro M4000] [10de:13f1] (rev a1)
#   Subsystem: Hewlett-Packard Company Device [103c:1153]
#   Kernel driver in use: nvidia # <-- here!
#   Kernel modules: nouveau, nvidia_drm, nvidia
sudo nvidia-smi
# Tue Sep 28 18:26:16 2021
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |                               |                      |               MIG M. |
# |===============================+======================+======================|
# |   0  NVIDIA GeForce ...  Off  | 00000000:0B:00.0  On |                  N/A |
# |  0%   47C    P8    19W / 160W |    479MiB /  4040MiB |      0%      Default |
# |                               |                      |                  N/A |
# +-------------------------------+----------------------+----------------------+
# |   1  Quadro M4000        Off  | 00000000:0C:00.0 Off |                  N/A |
# | 45%   37C    P0    42W / 120W |      0MiB /  8127MiB |      2%      Default |
# |                               |                      |                  N/A |
# +-------------------------------+----------------------+----------------------+
# ...
```
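
The rescan and the `enable` check can be combined into one step. A minimal sketch, assuming the standard `/sys/bus/pci/rescan` and per-device `enable` attributes used above; `rescan_and_check` and `SYSFS_ROOT` are hypothetical names. It fails if no function of the slot reappears or if any of them reports something other than `1`.

```shell
#!/bin/sh
# Sketch: trigger a PCI bus rescan, then check that the functions of one slot
# are present again and report enable=1. Requires root on a real /sys.
# SYSFS_ROOT is a hypothetical override used only for testing.

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# rescan_and_check SLOT  (e.g. 0000:0c:00)
# Prints "<address> enable=<value>" per function found.
rescan_and_check() {
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
    rc=1   # stays 1 (failure) unless at least one function is found
    for dev in "$SYSFS_ROOT/bus/pci/devices/$1".*; do
        [ -e "$dev/enable" ] || continue
        state=$(cat "$dev/enable")
        printf '%s enable=%s\n' "${dev##*/}" "$state"
        if [ "$state" != 1 ]; then return 1; fi
        rc=0
    done
    return "$rc"
}
```

Whether `nvidia` actually binds to the card after the rescan still has to be confirmed with `lspci -nnk` or `nvidia-smi`, as above.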
Funnily enough, the Quadro M4000 draws about 42 watts under absolutely no load.
I guess this is due to a driver problem...
**However**, as soon as the graphical `nvidia-settings` program is started, the power draw **drops** to about **12 watts**.
```shell
# Terminal A
watch -d -n 1 sudo nvidia-smi
# Terminal B
nvidia-settings
```
Watch `nvidia-smi` and listen to the fan noise when the magic happens...
```shell
watch -d -n 1 sudo nvidia-smi
# ...
# +-------------------------------+----------------------+----------------------+
# |   1  Quadro M4000        Off  | 00000000:0C:00.0 Off |                  N/A |
# | 46%   38C    P0    10W / 120W |      0MiB /  8127MiB |      0%      Default |
# |                               |                      |                  N/A |
# +-------------------------------+----------------------+----------------------+
# ...
```
Best of all: `nvidia-settings` does not even list my Quadro card...
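
Instead of diffing the whole table with `watch -d -n 1 sudo nvidia-smi`, the power draw of just the Quadro can be logged with the query mode of `nvidia-smi` (`--id`, `--query-gpu`, `--format=csv` and `--loop`), which makes it easier to record a before/after comparison, e.g. with and without `nvidia-settings` running. A minimal sketch; `log_power`, `over_watts`, `GPU_BUSID` and `NVIDIA_SMI` are hypothetical names, and the bus ID default is taken from the `nvidia-smi` output above.

```shell
#!/bin/sh
# Sketch: print P-state, power draw (W) and fan speed (%) of a single GPU as
# CSV instead of diffing the whole nvidia-smi table with `watch -d`.
# GPU_BUSID defaults to the Quadro's bus ID from the output above; NVIDIA_SMI
# is a hypothetical override so the command can be replaced by a stub.

GPU_BUSID="${GPU_BUSID:-00000000:0C:00.0}"
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# log_power [EXTRA_ARGS...]: one sample per line; add --loop=1 to repeat.
log_power() {
    "$NVIDIA_SMI" --id="$GPU_BUSID" \
        --query-gpu=pstate,power.draw,fan.speed \
        --format=csv,noheader,nounits "$@"
}

# over_watts LIMIT: pass through only the CSV samples (read from stdin)
# whose power draw, the second field, is above LIMIT watts.
over_watts() {
    awk -F', *' -v lim="$1" '$2 + 0 > lim + 0'
}
```

For example, `log_power --loop=1` prints one CSV line per second until interrupted, and `log_power --loop=1 | over_watts 20` keeps only the samples in which the card draws more than 20 W.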

Asked by dani (102 rep), Sep 28, 2021, 05:21 PM