Linux freezes after a cold start: "NVRM: GPU has fallen off the bus", Xid 79
Here is my configuration:
* AMD Ryzen 9 7950X 16-Core
* Gigabyte X670E Aorus Master
* DDR5 Corsair Vengeance 5200 MHz 16 GB
* PNY Nvidia GeForce RTX 4080
I have a dual boot with Windows 11 and Ubuntu 23.04.
Windows runs fine.
On Linux, *every* time I turn on the PC after a power cycle (i.e. a "cold boot"), the system hangs within a few minutes. By "hangs" I mean the screen freezes on whatever I'm doing and nothing responds anymore, not even the keyboard. I have to hard-reset the machine. Sometimes it reboots itself after several minutes.
Once it has rebooted, I can work for the whole day without any other issue.
I also tried turning on the PC and rebooting right after login. No luck: *it still freezes anyway*.
Other things I've already tried:
- I had two DDR5 modules, but one was defective so I removed it. Anyway, the problems with the faulty one were different and happened on both Windows and Linux.
- tried moving the RAM module to the other slot (i.e. from A2 to B2)
- ran memtest86+ several times
- removed the proprietary drivers for the graphics card. Currently I'm using the default open-source xserver-xorg-video-nouveau (no GPU acceleration)
- tried switching between Xorg and Wayland
- inspected some system logs (dmesg, syslog, Xorg), but didn't find anything relevant (at least to me!)
- updated to the latest package versions
- reinstalled Ubuntu from scratch
- updated BIOS to the latest version
- added the `pcie_aspm=off` kernel option
Here I read that it seems to be an NVIDIA bug because, as for the user there: 1. it happens no matter what I'm doing, even with no activity at all (hence no thermal or PSU cause); 2. after the reboot it works fine the whole day; 3. on Windows there are no issues at all.
Do I have to live with it, or is there a way to fix it?
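Regarding the `pcie_aspm=off` kernel option: it only has an effect if the running kernel was actually booted with it. On Ubuntu the option is typically added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot. A minimal sketch (Python 3.9+, Linux only) that checks the running kernel's command line in `/proc/cmdline`; the helper names are hypothetical:

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` is one of the whitespace-separated tokens of `cmdline`.

    A whole token must match, so "pcie_aspm=off" does not match
    "pcie_aspm=force" or a longer token that merely contains it.
    """
    return param in cmdline.split()


def running_kernel_has(param: str) -> bool:
    """Check the command line the running kernel was booted with (Linux only)."""
    return has_kernel_param(Path("/proc/cmdline").read_text(), param)
```

After rebooting, `running_kernel_has("pcie_aspm=off")` should return `True` if the option was picked up. Exact token matching is used so that a different value such as `pcie_aspm=force` is not mistaken for the intended setting.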
Can this description put me on the right track?
What else can I do to find the cause of the hangs? Where, and for what, should I look in the log files?
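Since a hard reset is needed after a freeze, the interesting messages belong to the *previous* boot. A minimal sketch (Python 3.9+), assuming a persistent systemd journal so that `journalctl -k -b -1` still has the kernel log of the boot that froze; reading it may require `sudo` or membership in the `adm`/`systemd-journal` group. The list of patterns is an illustrative assumption, not an exhaustive catalogue of GPU/PCIe errors:

```python
import re
import subprocess

# Illustrative patterns for NVIDIA driver and PCIe error messages (assumption,
# not exhaustive): proprietary driver (NVRM/Xid), nouveau, and PCIe AER reports.
SUSPICIOUS = re.compile(r"NVRM|Xid|fallen off the bus|nouveau|\bAER\b|PCIe Bus Error")


def suspicious_lines(log_text: str) -> list[str]:
    """Return the kernel-log lines that match any of the suspicious patterns."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


def previous_boot_kernel_log() -> str:
    """Fetch kernel messages from the previous boot via journalctl.

    Only works if the journal is persistent; otherwise the messages of the
    frozen boot are gone after the hard reset.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,  # raise if journalctl fails (e.g. no previous boot recorded)
    )
    return result.stdout
```

For example, `for line in suspicious_lines(previous_boot_kernel_log()): print(line)` prints only the matching lines, which is much easier to read than the full log.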
UPDATE
--
Thanks to user Artem S. Tashkinov, I discovered that during those hangs the machine is still alive and accepts SSH connections.
`dmesg` clearly says the GPU is the culprit:

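Because the machine still accepts SSH connections during a hang, the kernel log can be captured remotely (for example with `dmesg`, which on recent Ubuntu releases may need `sudo`) and scanned for NVIDIA Xid events; according to NVIDIA's Xid documentation, code 79 means the GPU has fallen off the bus. A minimal sketch (Python 3.9+) that extracts the PCI address and Xid code from such lines; the `NVRM: Xid (PCI:...): <code>, ...` format it parses is an assumption about the proprietary driver's output and may differ between driver versions:

```python
import re
from collections import Counter

# Assumed format of an NVIDIA Xid line, e.g.
#   NVRM: Xid (PCI:0000:01:00): 79, pid=1234, name=Xorg, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+)")


def xid_events(log_text: str) -> list[tuple[str, int]]:
    """Return (PCI address, Xid code) pairs found in a kernel log."""
    return [
        (m.group("pci"), int(m.group("code")))
        for m in XID_RE.finditer(log_text)
    ]


def xid_summary(log_text: str) -> Counter:
    """Count how often each Xid code occurs in a kernel log."""
    return Counter(code for _, code in xid_events(log_text))
```

Counting codes rather than printing raw lines makes it easy to see whether every cold-boot freeze produces the same Xid (here, 79) or whether other codes show up as well.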
Asked by Mark (815 rep), May 25, 2023, 09:28 AM
Last activity: May 26, 2023, 05:02 PM