Multiple NVIDIA RTX GPUs for CUDA (Arch Linux) with an eGPU
I'm running **Arch Linux** on a laptop (**ThinkPad P14s Gen 4**) with two GPUs, plus a new RTX 3090 connected over Thunderbolt 4 in a Cooler Master EG200 GPU enclosure:
```
❯ lspci -k | grep -A 2 -E "(VGA|3D)"
00:02.0 VGA compatible controller: Intel Corporation Raptor Lake-P [Iris Xe Graphics] (rev 04)
    Subsystem: Lenovo Raptor Lake-P [Iris Xe Graphics]
    Kernel driver in use: i915
--
03:00.0 3D controller: NVIDIA Corporation GA107GLM [RTX A500 Laptop GPU] (rev a1)
    Subsystem: Lenovo GA107GLM [RTX A500 Laptop GPU]
    Kernel driver in use: nvidia
--
22:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
    Subsystem: Gigabyte Technology Co., Ltd GA102 [GeForce RTX 3090]
    Kernel driver in use: nvidia
```
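To compare this listing with what CUDA reports, it can help to pull the GPU entries out of `lspci -k` programmatically. The following is a minimal sketch, assuming the output format shown above (a header line starting with a `bus:device.function` address, followed by `Subsystem:` and `Kernel driver in use:` lines); it is not a complete lspci parser, and the names `PciGpu` and `parse_lspci_gpus` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Header lines look like:
# "22:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)"
HEADER = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ([^:]+): (.+)$")


@dataclass
class PciGpu:
    address: str
    device_class: str
    name: str
    driver: Optional[str] = None


def parse_lspci_gpus(text: str) -> list[PciGpu]:
    """Parse `lspci -k | grep -A 2 -E "(VGA|3D)"` output into GPU records."""
    gpus: list[PciGpu] = []
    for raw in text.splitlines():
        line = raw.strip()  # lspci indents the detail lines; tolerate either form
        match = HEADER.match(line)
        if match:
            gpus.append(PciGpu(*match.groups()))
        elif line.startswith("Kernel driver in use:") and gpus:
            # Attach the bound driver to the most recent device header.
            gpus[-1].driver = line.split(":", 1)[1].strip()
        # "Subsystem:" lines and grep's "--" separators are ignored.
    return gpus
```

Fed the output above, this should yield three entries, with `nvidia` as the driver for both `03:00.0` and `22:00.0`. In other words, the kernel driver is bound to both NVIDIA cards even though only one of them shows up in the tools below.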
The Thunderbolt connection to the RTX 3090 enclosure is authorized, as `boltctl` shows:
```
❯ sudo boltctl info c4010000-0070-740e-0362-00168691c921
[sudo] password for aemonge:
 ● Cooler Master Technology,Inc MasterCase EG200
   ├─ type: peripheral
   ├─ name: MasterCase EG200
   ├─ vendor: Cooler Master Technology,Inc
   ├─ uuid: c4010000-0070-740e-0362-00168691c921
   ├─ dbus path: /org/freedesktop/bolt/devices/c4010000_0070_740e_0362_00168691c921
   ├─ generation: Thunderbolt 3
   ├─ status: authorized
   │  ├─ domain: 69078780-60ab-fe2a-ffff-ffffffffffff
   │  ├─ parent: 69078780-60ab-fe2a-ffff-ffffffffffff
   │  ├─ syspath: /sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1
   │  ├─ rx speed: 40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed: 40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags: boot
   ├─ authorized: Wed 24 Jan 2024 06:49:10 AM UTC
   ├─ connected: Wed 24 Jan 2024 06:49:10 AM UTC
   └─ stored: Tue 23 Jan 2024 03:50:50 PM UTC
      ├─ policy: iommu
      └─ key: no
```
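The same authorization state is also visible in sysfs, which is handy for checking it without going through `boltctl`. The sketch below assumes the kernel's Thunderbolt sysfs layout, with per-device `authorized`, `vendor_name` and `device_name` attributes under `/sys/bus/thunderbolt/devices`; the helper names (`read_attr`, `thunderbolt_devices`, `is_authorized`) are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: standard Linux sysfs location for Thunderbolt devices.
TB_ROOT = Path("/sys/bus/thunderbolt/devices")


def read_attr(path: Path) -> Optional[str]:
    """Return a stripped sysfs attribute, or None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def thunderbolt_devices(root: Path = TB_ROOT) -> dict[str, dict[str, Optional[str]]]:
    """Map each Thunderbolt device entry to its vendor, name and `authorized` value."""
    devices: dict[str, dict[str, Optional[str]]] = {}
    if not root.is_dir():
        return devices  # no Thunderbolt bus (or not running on Linux)
    for entry in sorted(root.iterdir()):
        authorized = read_attr(entry / "authorized")
        if authorized is None:
            continue  # entries without an `authorized` file (e.g. the domain) are skipped
        devices[entry.name] = {
            "vendor": read_attr(entry / "vendor_name"),
            "device": read_attr(entry / "device_name"),
            "authorized": authorized,
        }
    return devices


def is_authorized(value: Optional[str]) -> bool:
    # "0" means not authorized; "1" (or "2" when a key is used) means authorized.
    return value in ("1", "2")
```

On the machine above, the `0-1` entry (matching the `syspath` that `boltctl` reports) would be the one to inspect.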
I don't care about graphics, and I don't need the RTX 3090 to be used by Xorg or the graphical session. I only want it for compute workloads, and I have followed this Arch Wiki page thoroughly: https://wiki.archlinux.org/title/External_GPU
Even so, `nvidia-smi` can't seem to find the RTX 3090:
```
❯ nvidia-smi -L
GPU 0: NVIDIA RTX A500 Laptop GPU (UUID: GPU-762410c2-1c0d-ef4a-89ac-91afd926381b)
```
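A quick way to make the discrepancy explicit is to diff the GPU names from `lspci` against the `nvidia-smi -L` listing. This sketch assumes `nvidia-smi -L` prints one `GPU <index>: <name> (UUID: ...)` line per device, as above; matching the bracketed `lspci` names against the `nvidia-smi` names by substring is a heuristic, and `parse_nvidia_smi_list` and `missing_gpus` are hypothetical helpers.

```python
import re
from typing import Iterable

# Matches lines such as "GPU 0: NVIDIA RTX A500 Laptop GPU (UUID: GPU-...)"
SMI_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)")


def parse_nvidia_smi_list(text: str) -> list[tuple[int, str, str]]:
    """Return (index, name, uuid) tuples from `nvidia-smi -L` output."""
    result = []
    for line in text.splitlines():
        match = SMI_LINE.match(line.strip())
        if match:
            index, name, uuid = match.groups()
            result.append((int(index), name, uuid))
    return result


def missing_gpus(expected: Iterable[str], smi_output: str) -> list[str]:
    """Names (e.g. the bracketed lspci names) that nvidia-smi does not list."""
    seen = [name for _, name, _ in parse_nvidia_smi_list(smi_output)]
    return [want for want in expected if not any(want in name for name in seen)]
```

With `expected = ["RTX A500 Laptop GPU", "GeForce RTX 3090"]` and the single-line output above, only `"GeForce RTX 3090"` is reported as missing.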
A simple PyTorch script, **cuda-devices.py**, doesn't see it either:
```
❯ cat cuda-devices.py
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available.")
    # Get the number of CUDA devices
    num_devices = torch.cuda.device_count()
    print(f"Number of CUDA devices: {num_devices}")
    # Get the name of each CUDA device
    for i in range(num_devices):
        print(f"Device {i} name: {torch.cuda.get_device_name(i)}")
else:
    print("CUDA is not available.")

❯ python cuda-devices.py
CUDA is available.
Number of CUDA devices: 1
Device 0 name: NVIDIA RTX A500 Laptop GPU

❯ CUDA_VISIBLE_DEVICES="0,1,2" python cuda-devices.py
CUDA is available.
Number of CUDA devices: 1
Device 0 name: NVIDIA RTX A500 Laptop GPU
```
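Setting `CUDA_VISIBLE_DEVICES="0,1,2"` cannot help here, because the variable only selects and reorders devices that the driver already enumerates. The CUDA documentation states that if one of the listed indices is invalid, only the devices before it are visible (e.g. `0,2,-1,1` exposes devices 0 and 2). The sketch below models that rule for plain integer indices only; UUID and MIG identifiers are out of scope, and `visible_devices` is a hypothetical name.

```python
from typing import Optional


def visible_devices(env_value: Optional[str], physical_count: int) -> list[int]:
    """Model how CUDA_VISIBLE_DEVICES maps onto the devices the driver exposes.

    Simplified: only integer indices are handled. Enumeration stops at the
    first invalid entry, as described in the CUDA documentation.
    """
    if env_value is None:
        return list(range(physical_count))  # unset: every device is visible
    visible: list[int] = []
    for token in env_value.split(","):
        token = token.strip()
        try:
            index = int(token)
        except ValueError:
            break  # this sketch treats non-integer tokens (and "") as invalid
        if not 0 <= index < physical_count:
            break  # an invalid index hides itself and everything after it
        visible.append(index)
    return visible
```

With the driver exposing a single device, `visible_devices("0,1,2", 1)` collapses to `[0]`, which matches the one-device result in the session above.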
I have also tried these three tools, https://github.com/ewagner12/all-ways-egpu, https://github.com/karli-sjoberg/gswitch and https://github.com/hertg/egpu-switcher, to disable the internal GPUs (the A500 and the Iris Xe), but doing so leaves me with a black screen.
---
## Solved
Solved on the NVIDIA Developer Forums (https://forums.developer.nvidia.com/t/multiple-nvidia-rtx-gpu-for-cuda-arch-linux-with-egpu/280031/7), by:
> **generic** (Top Contributor):
> Please check for a bios update. If none is available, please use Software & Updates to switch to the “-open” driver version and set kernel parameter nvidia.NVreg_OpenRmEnableUnsupportedGpus=1
On Arch, this meant installing the open kernel modules and adding the kernel parameter to the boot entry:
```
sudo pacman -S nvidia-open
```
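Because `nvidia-open` replaces the proprietary kernel module, it can be worth confirming after a reboot which flavor actually got loaded. A minimal sketch that inspects `/proc/driver/nvidia/version`, assuming (not stated in the thread) that the open modules include the phrase "Open Kernel Module" in the `NVRM version:` line; `nvidia_module_flavor` and `loaded_flavor` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

VERSION_FILE = Path("/proc/driver/nvidia/version")


def nvidia_module_flavor(version_text: str) -> str:
    """Classify the NVRM line of /proc/driver/nvidia/version.

    Assumption: the open kernel modules identify themselves with the phrase
    "Open Kernel Module"; any other NVRM line is treated as proprietary.
    """
    first_line = version_text.splitlines()[0] if version_text else ""
    if not first_line.startswith("NVRM version:"):
        return "unknown"
    return "open" if "Open Kernel Module" in first_line else "proprietary"


def loaded_flavor(path: Path = VERSION_FILE) -> Optional[str]:
    """Return the loaded module's flavor, or None if the file is unavailable."""
    try:
        return nvidia_module_flavor(path.read_text())
    except OSError:
        return None  # module not loaded, or not running on Linux
```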
**/boot/loader/entries/*_linux.conf**
```
# Created by: archinstall
# Created on: ***********
title Arch Linux (linux)
linux /vmlinuz-linux
initrd /intel-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=####-####-####### zswap.enabled=0 rw nvidia.NVreg_OpenRmEnableUnsupportedGpus=1 rootfstype=ext4
```
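To confirm that the running kernel actually picked up the parameter from the `options` line, one can inspect `/proc/cmdline`. The sketch below splits the command line on whitespace, assuming no quoted values that contain spaces (the kernel's own quoting rules are not fully modeled); `parse_cmdline`, `unsupported_gpus_enabled` and `read_running_cmdline` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional

PARAM = "nvidia.NVreg_OpenRmEnableUnsupportedGpus"


def parse_cmdline(cmdline: str) -> dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Split only on the first "=", so "root=PARTUUID=..." keeps its value intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def unsupported_gpus_enabled(cmdline: str) -> bool:
    """True if the parameter from the boot entry is set to 1."""
    return parse_cmdline(cmdline).get(PARAM) == "1"


def read_running_cmdline() -> str:
    # /proc/cmdline holds the parameters the running kernel was booted with.
    return Path("/proc/cmdline").read_text()
```

After rebooting, `unsupported_gpus_enabled(read_running_cmdline())` should return `True` if the boot entry above was used.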
Asked by aemonge
(101 rep)
Jan 24, 2024, 11:48 AM
Last activity: Jan 25, 2024, 02:04 PM