Linux freezes after a cold start: "NVRM: GPU has fallen off the bus", Xid 79

0 votes
1 answer
914 views
Here is my configuration:

* AMD Ryzen 9 7950X 16-Core
* Gigabyte X670E Aorus Master
* DDR5 Corsair Vengeance 5200 MHz 16 GB
* PNY Nvidia GeForce RTX 4080

I have a dual boot with Windows 11 and Ubuntu 23.04. Windows runs fine. Linux hangs within a few minutes *every* time I turn on the PC after a power cycle (i.e. a "cold boot"). By "hang" I mean that the screen freezes on whatever I'm doing and nothing responds anymore, not even the keyboard. I have to hardware-reset the machine. Sometimes it reboots by itself after several minutes. Once it has rebooted, I can work for the whole day without any other issue.

I tried turning on the PC and rebooting right after login. No luck: *it has to freeze anyway*.

Other things I've already tried or inspected:

- I had two DDR5 modules, but one was defective so I removed it. In any case, the problems with the faulty one were different and happened on both Windows and Linux.
- moved the RAM module to the other slot (i.e. from A2 to B2)
- ran memtest86+ several times
- removed the proprietary drivers for the graphics card. Currently I'm using the default open-source xserver-xorg-video-nouveau (no GPU acceleration)
- switched between Xorg and Wayland
- inspected some system logs (dmesg, syslog, xorg) but didn't find anything relevant (at least to me!)
- updated to the latest package versions
- reinstalled Ubuntu from scratch
- updated the BIOS to the latest version
- added the pcie_aspm=off kernel option

Could this description help put me on the right track? What else can I do to find the cause of the hangs? Where and what should I look for in the log files?

UPDATE -- Thanks to user Artem S. Tashkinov, I discovered that during those hangs the machine is still alive and accepts SSH connections. dmesg clearly says the GPU is the culprit:

(screenshot of the dmesg output)

From another report I read, this seems to be an NVIDIA bug, since, as for the user in that report:

1. it happens no matter what I'm doing, even with no activity at all (hence no thermal or power-supply cause)
2. after the reboot it works fine the whole day
3. in Windows there are no issues at all.

Do I have to live with it, or is there a way to fix it?
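
The update notes that the machine still accepts SSH connections during a hang and that dmesg points at the GPU. A minimal sketch of how that log check could be scripted is below. It rests on assumptions: Xid messages are assumed to look like `NVRM: Xid (PCI:...): <code>, ...`, the journal is assumed to be persistent so that `journalctl -k -b -1` can show the previous boot, and the helper names `find_gpu_events` and `previous_boot_kernel_log` are made up for illustration.

```python
import re
import subprocess

# Assumed shape of an NVIDIA Xid kernel message (not taken from the
# original post, whose log was a screenshot):
#   "NVRM: Xid (PCI:0000:01:00): 79, ..."
XID_RE = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def find_gpu_events(lines):
    """Return (xid_code_or_None, line) for every line that mentions NVRM."""
    events = []
    for line in lines:
        if "NVRM:" not in line:
            continue
        match = XID_RE.search(line)
        code = int(match.group("code")) if match else None
        events.append((code, line.rstrip()))
    return events


def previous_boot_kernel_log():
    """Read the kernel log of the previous boot via journalctl.

    Returns an empty list if journalctl is not installed.
    """
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []
    return result.stdout.splitlines()


if __name__ == "__main__":
    for code, line in find_gpu_events(previous_boot_kernel_log()):
        label = f"Xid {code}" if code is not None else "NVRM"
        print(f"[{label}] {line}")
```

Run after the forced reset, this would list any NVRM lines logged during the boot that froze. While the machine is hung, the same filter could be applied over SSH to the output of `dmesg` instead.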
Asked by Mark (815 rep)
May 25, 2023, 09:28 AM
Last activity: May 26, 2023, 05:02 PM