
Dell 3620 only boots with nolapic kernel parameter since upgrade to 6.8.12

I have a 3-node Proxmox cluster running at home on Dell Precision 3620 computers, all interconnected through this NIC:
05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
        Subsystem: Hewlett-Packard Company Ethernet 10G 2-port 546SFP+ Adapter
During the upgrade to the most recent version of Proxmox, the Linux kernel was upgraded from 6.5.13-6-pve to 6.8.12-1-pve. The 6.8.12-1-pve kernel hangs after initializing the USB devices. The last line printed during boot is:
[    1.948124] hid-generic 0003:046A:0001.0002: input,hidraw1: USB HID v1.11 Keyboard [Cherry GmbH] on usb-0000:00:14.0-8/input0
It only continues booting when I add "nolapic" as a kernel command-line parameter, which of course disables SMP, so I can only use one CPU core. Interestingly, the next step (which is never reached) would have been the initialization of the Mellanox NIC:
[    7.901026] mlx4_core 0000:05:00.0: DMFS high rate steer mode is: disabled performance optimized steering
[    7.901402] mlx4_core 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:1c.4 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
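As a side note on the bandwidth figures in that line: they can be reproduced with a few lines of arithmetic. This is only a sketch under the assumption that the kernel takes the raw 8.0 GT/s rate, applies the 128b/130b encoding overhead with integer division, and multiplies by the lane count; the function name is made up for illustration.

```python
# Sketch: reproduce the "available PCIe bandwidth" numbers from the log.
# Assumption: 8.0 GT/s links use 128b/130b encoding and the per-lane rate
# is truncated to whole Mb/s before multiplying by the lane count.

def pcie_bandwidth_mbps(raw_mtps: int, lanes: int) -> int:
    """Usable bandwidth in Mb/s for a link with the given raw rate and width."""
    per_lane = raw_mtps * 128 // 130  # 128b/130b encoding overhead
    return per_lane * lanes

print(pcie_bandwidth_mbps(8000, 4) / 1000)  # x4 link the NIC currently sits on
print(pcie_bandwidth_mbps(8000, 8) / 1000)  # x8 link the NIC would be capable of
```

Under that assumption the two values come out as 31.504 and 63.008 Gb/s, matching the log, so the message only says that the slot is x4 and is probably unrelated to the hang.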
Please find the whole dmesg (of the "nolapic" boot) here: https://pastebin.com/raw/uJb15njQ I also tried upgrading the BIOS (version 2.8.1 is installed, but 2.32.0 is the most recent). Upgrading the BIOS makes the new kernel bootable without the nolapic command-line parameter, but leaves the Mellanox NIC unusable (independent of which kernel I use; all show the same error):
[    0.324965] pci 0000:05:00.0: [15b3:1007] type 00 class 0x020000 PCIe Endpoint
[    0.325432] pci 0000:05:00.0: BAR 0 [mem 0xef100000-0xef1fffff 64bit]
[    0.325748] pci 0000:05:00.0: BAR 2 [mem 0xe0000000-0xe1ffffff 64bit pref]
[    0.326284] pci 0000:05:00.0: ROM [mem 0xef000000-0xef0fffff pref]
[    0.328610] pci 0000:05:00.0: VF BAR 2 [mem 0x00000000-0x01ffffff 64bit pref]
[    0.328693] pci 0000:05:00.0: VF BAR 2 [mem 0x00000000-0x1fffffff 64bit pref]: contains BAR 2 for 16 VFs
[    0.331768] pci 0000:05:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:00:1c.4 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[    0.388844] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: can't assign; no space
[    0.388942] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: failed to assign
[    0.390239] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: can't assign; no space
[    0.390336] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: failed to assign
[    0.390432] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: can't assign; no space
[    0.390530] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: failed to assign
[    0.392703] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: can't assign; no space
[    0.392800] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: failed to assign
[    0.392897] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: can't assign; no space
[    0.392995] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: failed to assign
[    0.393092] pci 0000:05:00.0: BAR 0 [mem 0xe0100000-0xe01fffff 64bit]: assigned
[    0.393345] pci 0000:05:00.0: ROM [mem 0xe0200000-0xe02fffff pref]: assigned
[    0.393429] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: can't assign; no space
[    0.393526] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: failed to assign
[    0.394199] pci 0000:05:00.0: BAR 0 [mem 0xe0100000-0xe01fffff 64bit]: assigned
[    0.394447] pci 0000:05:00.0: ROM [mem 0xe0200000-0xe02fffff pref]: assigned
[    0.394530] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: can't assign; no space
[    0.394628] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: failed to assign
[    0.400419] pci 0000:05:00.0: Adding to iommu group 13
[    0.405601] DMAR: [DMA Read NO_PASID] Request device [05:00.0] fault addr 0xc4da3000 [fault reason 0x06] PTE Read access is not set
[    0.406462] DMAR: [DMA Read NO_PASID] Request device [05:00.0] fault addr 0xc4d21000 [fault reason 0x06] PTE Read access is not set
[    0.991623] mlx4_core: Initializing 0000:05:00.0
[    0.991831] mlx4_core 0000:05:00.0: Missing UAR, aborting
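To put the failed assignments into numbers, here is a minimal sketch that pulls the "failed to assign" lines out of a dmesg excerpt and converts the requested sizes to MiB. The sample lines are copied from the log above; the regular expression and the function name are my own and only cover the message format shown here.

```python
import re

# Three lines copied from the dmesg excerpt above.
SAMPLE = """\
[    0.328693] pci 0000:05:00.0: VF BAR 2 [mem 0x00000000-0x1fffffff 64bit pref]: contains BAR 2 for 16 VFs
[    0.388942] pci 0000:05:00.0: VF BAR 2 [mem size 0x20000000 64bit pref]: failed to assign
[    0.390336] pci 0000:05:00.0: BAR 2 [mem size 0x02000000 64bit pref]: failed to assign
"""

# Matches only the "failed to assign" message format seen in this log.
FAILED = re.compile(
    r"pci (\S+): ((?:VF )?BAR \d+) \[mem size (0x[0-9a-f]+)[^\]]*\]: failed to assign"
)

def failed_bars(log: str) -> dict:
    """Map (device, BAR name) to the requested size in MiB for each failure."""
    result = {}
    for match in FAILED.finditer(log):
        device, bar, size_hex = match.groups()
        result[(device, bar)] = int(size_hex, 16) // (1024 * 1024)
    return result

for (device, bar), mib in failed_bars(SAMPLE).items():
    print(f"{device} {bar}: {mib} MiB could not be placed")
```

For the excerpt this gives 512 MiB for VF BAR 2 and 32 MiB for BAR 2, which is consistent with the "contains BAR 2 for 16 VFs" line (16 x 32 MiB). So with the new BIOS the kernel finds no free window for either the 32 MiB BAR or the 512 MiB SR-IOV BAR.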
I spent hours trying different kernel parameters (especially ones related to APIC / ACPI), first with the most recent BIOS to make the NIC usable again, then with the older BIOS to make the system boot without "nolapic", but without any success. For example (one after the other):
- pci=nomsi
- pci=noioapicquirk
- pci=ioapicreroute
- pci=noioapicreroute
- pci=noacpi
- pci=use_crs
- pci=nocrs
- pci=routeirq
- pci=realloc=off
- pci=noari
- pci=noats
- clocksource=hpet
- irqpoll
- noacpi
- noapic
- irqfixup
- iommu=off
The firmware on the Mellanox NIC is the most recent one. I'm thankful for any more ideas / comments.
Thanks, Freddy
Asked by Freddy (1 rep)
Sep 15, 2024, 10:23 PM