Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

1 votes

0 answers

56 views

What happens if preempt_enable() is called inside an NMI?

https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-8/-/blob/ccf40dfacd314ab0fea16cfc6f4eded1a08e2710/arch/x86/kernel/cpu/mce/core.c#L1331 calls `preempt_enable()` before `nmi_exit()`. Is this code OK?

        if ((m.cs & 3) == 3) {
            /* If this triggers there is no way to recover. Die hard. */
            BUG_ON(!on_thread_stack() || !user_mode(regs));
            local_irq_enable();
            preempt_enable();

            current->task_struct_rh->mce_ripv = !!(m.mcgstatus & MCG_STATUS_RIPV);
            current->task_struct_rh->mce_whole_page = whole_page(&m);

            if (kill_it || do_memory_failure(&m))
                force_sig(SIGBUS, current);
            preempt_disable();
            local_irq_disable();
        } else {
            if (!fixup_exception(regs, X86_TRAP_MC))
                mce_panic("Failed kernel mode recovery", &m, NULL);
        }

    out_ist:
        nmi_exit();
    }

As I understand it, the kernel can't preempt while preempt_count > 0. This code runs inside an NMI, so preempt_count > 0. It is sending SIGBUS:

    force_sig(SIGBUS, current);

Calling local_irq_enable() and preempt_enable() seems unnecessary here. Is it?
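
The reasoning in the question can be modeled in user space. This is a minimal sketch, not kernel code: it assumes a `preempt_count` layout of 8 preempt bits, 8 softirq bits, 4 hardirq bits and 1 NMI bit, and it assumes that `nmi_enter()` adds `NMI_OFFSET + HARDIRQ_OFFSET` to the counter. The `model_*` helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of preempt_count (illustrative, not read from the linked tree). */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define HARDIRQ_BITS   4

#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT  (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT      (HARDIRQ_SHIFT + HARDIRQ_BITS)

#define PREEMPT_OFFSET (UINT32_C(1) << PREEMPT_SHIFT)
#define HARDIRQ_OFFSET (UINT32_C(1) << HARDIRQ_SHIFT)
#define NMI_OFFSET     (UINT32_C(1) << NMI_SHIFT)
#define NMI_MASK       NMI_OFFSET

/* Hypothetical user-space stand-ins: each takes a counter value and returns the new one. */
static inline uint32_t model_nmi_enter(uint32_t count)
{
    return count + NMI_OFFSET + HARDIRQ_OFFSET;
}

static inline uint32_t model_nmi_exit(uint32_t count)
{
    return count - (NMI_OFFSET + HARDIRQ_OFFSET);
}

static inline uint32_t model_preempt_disable(uint32_t count)
{
    return count + PREEMPT_OFFSET;
}

static inline uint32_t model_preempt_enable(uint32_t count)
{
    return count - PREEMPT_OFFSET;
}

/* True while the NMI bit is set. */
static inline bool model_in_nmi(uint32_t count)
{
    return (count & NMI_MASK) != 0;
}

/* Simplified: only checks the counter and ignores the interrupt-flag half of the real check. */
static inline bool model_preemptible(uint32_t count)
{
    return count == 0;
}
```

In this model a balanced `model_preempt_disable()`/`model_preempt_enable()` pair leaves the NMI and hardirq bits set, so the counter stays non-zero until the matching exit. Whether the linked kernel code relies on that is what the question asks; the sketch does not settle it.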

Mark K (955 rep)

Sep 5, 2022, 06:22 AM • Last activity: Sep 5, 2022, 06:29 AM

11 votes

5 answers

5939 views

Unknown NMI reason 20 and 30 on a VM

linux kvm libvirt nmi

I pulled up the console on a virtual machine I manage today and was greeted with some kernel messages:

    [5912557.130943] Uhhuh. NMI received for unknown reason 20 on CPU 0.
    [5912557.131115] Do you have a strange power saving mode enabled?
    [5912557.131287] Dazed and confused, but trying to continue
    [6064281.393568] Uhhuh. NMI received for unknown reason 30 on CPU 1.
    [6064281.393888] Do you have a strange power saving mode enabled?
    [6064281.394235] Dazed and confused, but trying to continue

That's just a few of them; both 20 and 30 occur on CPU 0 and 1.

- VM is Debian Jessie, BIOS boot ("QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014"; kernel 3.16.0-4-amd64)
- Hypervisor is libvirt/KVM running on Debian testing (currently Debian's 4.7.0-1-amd64; qemu 1:2.7+dfsg-3).
- Hardware is an Opteron 6344 on a [Supermicro H8SGL-F](http://www.supermicro.com/Aplus/motherboard/Opteron6000/SR56x0/H8SGL-F.cfm) with ECC RAM with scrub enabled.

I don't see any NMI or EDAC error/warning messages on the host.

Any idea what is causing these NMI messages on the guest? Are they anything to worry about?

(May be related to https://unix.stackexchange.com/questions/216925/nmi-received-for-unknown-reason-20-do-you-have-a-strange-power-saving-mode-ena but that appears to be bare metal).

                                

derobert (112979 rep)

Nov 30, 2016, 10:32 PM • Last activity: Jul 18, 2022, 12:14 PM

1 votes

1 answers

1542 views

Can't disable NMI watchdog on Debian Buster in vmware context - couldn't write to kernel, unknown error 524

debian boot virtual-machine nmi

Following [these instructions](https://www.pcsuggest.com/disable-nmi-watchdog-linux/), I want to disable the NMI watchdog on boot.

    sudo sh -c "echo '0' > /proc/sys/kernel/nmi_watchdog"

However:

    Couldn't write '0' to 'kernel/nmi_watchdog': Unknown error 524

How to proceed?
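
To narrow down a failure like this, it can help to see the raw errno of the write instead of the formatted message. The following is a minimal sketch with a hypothetical helper, `write_value`, that writes a string to a file and returns the errno of the failing call. As far as I know, 524 is not one of the userspace errno values that `strerror()` can name, which would explain the "Unknown error 524" text; in the kernel sources that number is the internal code `ENOTSUPP`.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `value` to `path`. Returns 0 on success, otherwise the errno of
 * the call that failed. Hypothetical helper, not part of any library.
 */
static inline int write_value(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len);
    int err = 0;

    if (written < 0)
        err = errno;
    else if ((size_t)written != len)
        err = EIO; /* short write: reported as an I/O error in this sketch */

    close(fd);
    return err;
}
```

Run as root, `write_value("/proc/sys/kernel/nmi_watchdog", "0")` would return the same numeric code that the shell redirection reports.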

J. Doe (177 rep)

Aug 20, 2019, 12:41 PM • Last activity: Jul 13, 2020, 09:42 AM

1 votes

0 answers

600 views

Custom interrupt handler for the NMI hardware button

linux-kernel drivers interrupt irq nmi

I'm trying to create a custom interrupt handler for the NMI hardware button on my motherboard.

To test this functionality I've created this simple module:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <asm/nmi.h>
    
    static int nmi_custom_handler(unsigned int val, struct pt_regs* regs)
    {
        pr_info("My custom NMI: 0x%x\n", val);
        return NMI_HANDLED;
    }
    
    static int __init nmi_handler_init(void) {
        pr_info("nmi_handler_init\n");
        register_nmi_handler(NMI_UNKNOWN, nmi_custom_handler, 0, "my_custom_nmi");
        return 0;
    }
    
    static void __exit nmi_handler_exit(void) {
        pr_info("nmi_handler_exit\n");
        unregister_nmi_handler(NMI_UNKNOWN, "my_custom_nmi");
    }
    
    module_init(nmi_handler_init);
    module_exit(nmi_handler_exit);
    
    MODULE_AUTHOR("Konstantin Aladyshev ");
    MODULE_LICENSE("GPL");

If I load this module and press the NMI button once, a "My custom NMI" message is logged for every CPU core in my system. The same shows up in /proc/interrupts: the NMI count increases from 0 to 1 for every CPU.
But for some reason this works only once. Subsequent button presses are not logged by my module or reflected in /proc/interrupts.

Why? What should I change to be able to use the NMI hardware interrupt again?


                                

kostr22 (216 rep)

Jan 14, 2020, 06:28 PM

0 votes

1 answers

1623 views

How to have the kernel print a stacktrace when sending a Hardware NMI

linux hardware nmi

I have Qemu VMs running FreeBSD, Windows and Linux, and I can send them a hardware NMI via the Qemu monitor.

    qm monitor 100
    Entering Qemu Monitor for VM 100 - type 'help' for help
    qm> help nmi
    nmi  -- inject an NMI

When injecting the NMI into a Windows VM, I get a message that it is saving a crash dump, and the VM then reboots. On Linux I get this message:

    [26731.911302] Uhhuh. NMI received for unknown reason 31 on CPU 0.
    [26731.911303] Do you have a strange power saving mode enabled?
    [26731.911304] Dazed and confused, but trying to continue

How can I get the kernel to print a stack trace on the console instead of only this message? I need this to debug VMs hanging because of very slow I/O.

Manu (576 rep)

Oct 23, 2017, 02:25 PM • Last activity: Oct 23, 2017, 02:28 PM

6 votes

1 answers

2096 views

What does ACPI NMI LINT mean, and why does it change across kernel versions?

linux kernel linux-kernel acpi nmi

I'd like to understand what the following lines mean:

    [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x0])
    [    0.000000] ACPI: NMI not connected to LINT 1!
    [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x0])
    [    0.000000] ACPI: NMI not connected to LINT 1!
    [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x0])
    [    0.000000] ACPI: NMI not connected to LINT 1!
    [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] dfl res lint[0x6f])
    [    0.000000] ACPI: NMI not connected to LINT 1!

And why does the value on the second-to-last line change across kernel versions?

e.g.:  
with kernel 4.9.3 it's [0x6f]  
with kernel 4.7.8 it's [0x1f]  
and so on
                                

mattia.b89 (3398 rep)

Jan 20, 2017, 12:01 PM • Last activity: May 26, 2017, 10:12 PM

11 votes

1 answers

23592 views

Should I disable NMI watchdog permanently or not?

linux-kernel watchdog nmi

Why do we need to keep nmi_watchdog enabled, and what could happen if I disable it permanently?

Some applications recommend disabling the NMI watchdog in order to work properly. What's the advantage of disabling it?

And what does the output of this command, `grep -i nmi /proc/interrupts`, mean?

    NMI:         24         18         21         18   Non-maskable interrupts
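
For reading a row like the one above: each numeric column in /proc/interrupts is the count for one CPU, and the trailing text is a description. The following is a minimal sketch of a parser for a single row; `parse_irq_row` is a hypothetical helper name, and the sketch assumes the plain "label: counts... description" shape shown in the question.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one /proc/interrupts row such as
 *   "NMI:   24   18   21   18   Non-maskable interrupts"
 * into per-CPU counts. Returns how many counts were stored (at most `max`),
 * or 0 if the line has no "label:" prefix. Hypothetical helper.
 */
static inline size_t parse_irq_row(const char *line, unsigned long *counts, size_t max)
{
    const char *p = strchr(line, ':');
    if (p == NULL)
        return 0;
    p++; /* skip the colon that ends the label */

    size_t n = 0;
    while (n < max) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break; /* reached the textual description or end of line */

        char *end;
        counts[n++] = strtoul(p, &end, 10);
        p = end;
    }
    return n;
}
```

Applied to the row in the question, this yields four counts (24, 18, 21, 18), one per CPU.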

Arnab (1691 rep)

Mar 26, 2017, 04:16 AM • Last activity: Mar 26, 2017, 07:54 AM
