Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

7 votes

1 answer

4888 views

Very High CPU Usage By IRQ #16

I recently noticed that one of my CPUs was sitting at around 85-90% usage while the machine was idle, and according to `top` the usage was coming from interrupts. So, just like in this question, I used a combination of `dmesg` and periodically `cat`ing `/proc/interrupts` and found this:

                CPU0       CPU1       CPU2       CPU3       
       0:         17          0          0          0  IR-IO-APIC    2-edge      timer
       1:      11548          0       2429          0  IR-IO-APIC    1-edge      i8042
       8:          0          0          0          1  IR-IO-APIC    8-edge      rtc0
       9:          7         16          0          0  IR-IO-APIC    9-fasteoi   acpi
      12:      14530     108887          0          0  IR-IO-APIC   12-edge      i8042
      16:   78464100          0          0   11702812  IR-IO-APIC   16-fasteoi   idma64.0, i2c_designware.0, i801_smbus
     120:          0          0          0          0  DMAR-MSI    0-edge      dmar0
     121:          0          0          0          0  DMAR-MSI    1-edge      dmar1

As you can see, IRQ #16 is firing interrupts like crazy (every time the CPU wakes up from S3 it seems to start spamming a different CPU). I also found out that my touchpad uses the same IRQ, and if I2C mode is enabled (*advanced* mode, according to my BIOS), the touchpad randomly stops working with the following messages (from `dmesg`):

    [  167.851139] irq 16: nobody cared (try booting with the "irqpoll" option)
    [  167.851158] CPU: 2 PID: 3874 Comm: firefox Not tainted 4.15.3-300.fc27.x86_64 #1
    [  167.851160] Hardware name: Acer Aspire E5-575/Ironman_SK  , BIOS V1.04 04/26/2016
    [  167.851162] Call Trace:
    [  167.851171]  <IRQ>
    [  167.851185]  dump_stack+0x5c/0x85
    [  167.851193]  __report_bad_irq+0x30/0xc0
    [  167.851196]  note_interrupt+0x235/0x280
    [  167.851198]  handle_irq_event_percpu+0x51/0x70
    [  167.851201]  handle_irq_event+0x27/0x50
    [  167.851204]  handle_fasteoi_irq+0x6b/0x120
    [  167.851209]  handle_irq+0xaf/0x120
    [  167.851214]  do_IRQ+0x41/0xc0
    [  167.851219]  common_interrupt+0xa2/0xa2
    [  167.851222]  </IRQ>
    [  167.851224] RIP: 0010:_raw_spin_lock+0x10/0x20
    [  167.851226] RSP: 0000:ffffa85a857dfdd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
    [  167.851230] RAX: 0000000000000000 RBX: ffff8d0a268930c8 RCX: 00003ffffffff000
    [  167.851231] RDX: 0000000000000001 RSI: 8000000000000025 RDI: ffffd21648d7ca70
    [  167.851232] RBP: ffffd2164892e100 R08: 0000000000000000 R09: 0000000000171800
    [  167.851233] R10: 0000000000271800 R11: 0000000000001000 R12: 0000000000000000
    [  167.851234] R13: 8000000224b84867 R14: ffffd21648d7ca70 R15: ffff8d0a35f29810
    [  167.851244]  __handle_mm_fault+0xa4c/0x1290
    [  167.851249]  handle_mm_fault+0xaa/0x1f0
    [  167.851255]  __do_page_fault+0x25d/0x4e0
    [  167.851262]  ? SyS_mmap_pgoff+0xfb/0x250
    [  167.851264]  do_page_fault+0x32/0x110
    [  167.851267]  ? page_fault+0x36/0x60
    [  167.851269]  page_fault+0x4c/0x60
    [  167.851272] RIP: 0033:0x7ff86dc0b205
    [  167.851273] RSP: 002b:00007ffe6493e888 EFLAGS: 00010206
    [  167.851276] handlers:
    [  167.851291] [] idma64_irq [idma64]
    [  167.851296] [] i2c_dw_isr
    [  167.851302] [] i801_isr [i2c_i801]
    [  167.851304] Disabling IRQ #16

Is this a hardware issue? What can I do?


----------


I finally had a chance to dig into this further. By running `lspci -nnkv` I found two devices that use IRQ 16:

    00:15.0 Signal processing controller : Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
            Subsystem: Acer Incorporated [ALI] Device [1025:1094]
            Flags: fast devsel, IRQ 16
            Memory at a132b000 (64-bit, non-prefetchable) [size=4K]
            Capabilities:  Power Management version 3
            Capabilities:  Vendor Specific Information: Len=14 
            Kernel driver in use: intel-lpss
            Kernel modules: intel_lpss_pci

and:

    00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
            Subsystem: Acer Incorporated [ALI] Device [1025:1094]
            Flags: medium devsel, IRQ 16
            Memory at a132e000 (64-bit, non-prefetchable) [size=256]
            I/O ports at 4040 [size=32]
            Kernel driver in use: i801_smbus
            Kernel modules: i2c_i801

The problem seems to go away if I unload the `intel_lpss_pci` module (i.e. `rmmod intel_lpss_pci`), but then of course the touchpad stops working. I guess that is still better than having a CPU pegged at 100% all the time.
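
The periodic sampling of `/proc/interrupts` described at the top of this question can be scripted. What follows is a minimal sketch, assuming a POSIX shell with `awk`; the helper name `irq_total` is made up for illustration. It sums the per-CPU counters of one IRQ line so that two samples can be compared:

```shell
# Sum the per-CPU counters for one IRQ from /proc/interrupts text on stdin.
# $1 is the IRQ label without the colon, e.g. 16 (hypothetical helper name).
irq_total() {
    awk -v irq="$1:" '
        $1 == irq {
            sum = 0
            # Per-CPU counts are the numeric fields that follow the label;
            # stop at the first non-numeric field (the controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
            printf "%.0f\n", sum
        }'
}

# Example: average interrupts per second on IRQ 16 over a 5 second window.
# before=$(irq_total 16 < /proc/interrupts); sleep 5
# after=$(irq_total 16 < /proc/interrupts)
# echo "$(( (after - before) / 5 )) interrupts/s"
```

Comparing the rate with and without `intel_lpss_pci` loaded would show how much of the interrupt load goes away together with that module.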
                                

arielnmz (559 rep)

Mar 1, 2018, 05:25 AM • Last activity: Jun 26, 2025, 02:05 PM

3 votes

1 answer

1988 views

Remove the new kernel

linux-mint kernel cpu-usage

I was working without any problems until one day the computer would not start; it reported a problem with the X server. I connected to the computer over ssh and reinstalled `xserver-xorg` and `xserver-xorg-core`, after which the computer started again, but tapping the touchpad no longer registered a click (it is not a configuration problem).

So I copied the drivers from another Linux Mint installation onto mine. At that point I noticed that I had kernel 4.2.0-32 installed, which is not present on my friend's computer. I tried to uninstall it, but that was impossible, so I installed kernel 4.4.0-22. Now the touchpad works, but YouTube in Chrome showed black video. I changed a setting related to hardware acceleration, and that is fine now.

The computer now works without any problems, except that it is very slow: with just a browser open I am using 60% of my CPU.

**So I want to go back to kernel 3.19.0-32 because it is the recommended one. The problem is that I can't uninstall the current kernel because it is loaded. How can I boot another installed kernel so that I can delete the current one? (Current kernel: 4.4.0-22)**

*Here I can delete the old kernel because it's not loaded*

*Here I can't delete the new kernel because it's loaded (I want to delete it)*

Khalil Bz (153 rep)

May 8, 2016, 11:00 AM • Last activity: Jun 23, 2025, 04:03 PM

2 votes

1 answer

3240 views

How can I get the total CPU usage of a Linux machine with 1 or n CPU cores?

cpu-usage

I am currently using the method below to extract the idle CPU value from the `top` command and subtract it from 100. Is this method correct, and is there a better way to achieve the same result?

Also, my Linux VM is a stripped-down version and has only a few basic tools like `top`. Installing other tools is not an option, as the package manager has also been removed.

    CPU_IDLE="$(top -bn2 | grep -F '%Cpu' | tail -n 4 | gawk '{print $8 $9}' | tr -s '\n\:\,[:alpha:]' ' '| gawk '{print $2}'),"
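
If `/proc/stat` is readable on the stripped-down VM, the same figure can be derived without parsing `top` output. What follows is a minimal sketch, assuming a POSIX shell and `awk` (the `gawk` already used above would do); `cpu_usage_between` is a hypothetical helper name. It treats idle plus iowait as idle time, and because the first `cpu` line of `/proc/stat` aggregates every core, it works the same for 1 or n cores:

```shell
# Compute overall CPU usage (%) from two aggregate "cpu" lines of /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal ...
# $1 = earlier line, $2 = later line (hypothetical helper name).
cpu_usage_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal; guest time is already included in user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            idle = $5 + $6                      # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else         { t1 = total; i1 = idle }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
        }'
}

# Usage on a live system: sample twice, one second apart.
# a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
# cpu_usage_between "$a" "$b"
```

Working from counter deltas avoids depending on the column layout of `top`, which differs between versions.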

Bandi Sandeep (21 rep)

Apr 12, 2017, 06:07 AM • Last activity: May 28, 2025, 12:05 PM

0 votes

0 answers

111 views

MySQL Server keeps hitting 100% CPU

debian mysql reboot freeze cpu-usage

My server has 800+ days of uptime, but for the past couple of weeks the **MySQL server** has kept hitting **100% CPU**, and I have to restart the `mysqld` service.

Where do I start to find out where this issue is coming from? All the sites on my server freeze until I restart the service, and I don't want to keep restarting it every time. Sometimes it happens after a couple of minutes, sometimes after a couple of hours, sometimes after a couple of days.

Nothing has really changed. I recently updated the my.cnf file once to set the default character set and restarted, but I don't believe this is linked in any way.

**--- Server details ---**

Debian 10 Buster

MySQL Version: 14.14

Average users: 150 - 200 per day

vCPUs: 4

RAM: 4GB


**my.cnf**

No slow queries shown so far.

    [mysqld]
    collation_server = utf8mb4_unicode_ci
    character_set_server = utf8mb4
    sql-mode="NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
    slow_query_log = 1
    slow_query_log_file = /var/log/mysql/slow_queries.log
    log_queries_not_using_indexes = 'OFF'
    long_query_time = 5
                                

Z0q (631 rep)

May 31, 2024, 03:56 PM • Last activity: May 26, 2025, 05:19 AM

2 votes

1 answer

1940 views

Short periodic freezes every few seconds. Everything except the mouse

gnome-shell cpu-usage

My computer freezes for a short period every 1 or 2 seconds: 1 or 2 seconds working, then 1 or 2 seconds frozen.

Everything stops working except for the mouse.

----------

I first noticed the problem when I tried to open a 1 GB text file with leafpad. The syslog (and other log files) grew to 350 MB of leafpad errors. I still don't really think that could be the cause, but the machine has been slow ever since.

I tried deleting those lines to shrink the files, but of course that didn't help.

The line was a repetition of:

    localhost leafpad: pango_tab_array_get_tab: assertion 'tab_index >= 0' failed

----------
**Gnome-shell debugging** (In the end I think the problem is not there)

I ran `top` to look for the problem, and my first guess was gnome-shell.
I disabled all GNOME extensions and set `Hidden=True` for the GNOME tracker. I rebooted, of course, but the issue persists.

    top - 11:37:47 up 16 min,  1 user,  load average: 5.08, 4.53, 3.07
    Tasks: 186 total,   1 running, 185 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  5.4 us, 13.6 sy,  0.0 ni, 78.8 id,  2.2 wa,  0.0 hi,  0.0 si,  0.0 st
    MiB Mem :  11894.0 total,   9255.9 free,    884.7 used,   1753.4 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  10597.3 avail Mem 
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
     1467 root      20   0 3828560 258004  73056 S  16.9   2.1   3:00.91 gnome-she+ 
     1627 root      20   0  384600  23668  17328 S  13.0   0.2   1:31.00 gsd-xsett+ 
     1732 root      20   0 1190848  66960  31648 S  11.6   0.5   1:21.92 gnome-sof+ 
     2371 root      20   0  239576  28532  22080 S   9.0   0.2   0:49.61 leafpad    
     2282 root      20   0 1397692  79500  38488 S   8.3   0.7   2:27.84 nautilus   
     1618 root      20   0  452484  40448  13752 S   7.6   0.3   1:01.97 packageki+ 
     1643 root      20   0  384156  24452  17428 S   5.3   0.2   1:16.62 gsd-keybo+ 
     1636 root      20   0  236512  22152  17128 S   3.0   0.2   1:16.76 gsd-clipb+ 
     1269 root      20   0  343084  47552  32060 S   0.7   0.4   0:19.31 Xorg       
        9 root      20   0       0      0      0 I   0.3   0.0   0:01.07 rcu_sched  
     1176 message+  20   0   18272   5276   3476 S   0.3   0.0   0:01.51 dbus-daem+ 
     1640 root      20   0  550896  24776  19364 S   0.3   0.2   1:18.79 gsd-color  
     2850 root      20   0  527664  39564  28252 S   0.3   0.3   0:07.43 gnome-ter+ 
     3048 root      20   0   15804   3484   3040 R   0.3   0.0   0:00.01 top        
        1 root      20   0  192548   9036   6632 S   0.0   0.1   0:02.95 systemd    
        2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd   
        3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp 

I used the following to find out where the time was going, and it seems `openat` takes the largest share. What also catches my attention is the number of errors that call returns, and I guess that might be the problem.

    strace -c -p 1467
    strace: Process 1467 attached
    ^Cstrace: Process 1467 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     38.35    2.269925          65     34909     22415 openat
     21.63    1.280485        1583       809       252 unlink
     18.82    1.113966        4700       237        15 link
     16.79    0.993957        4498       221           rename
      0.96    0.056549           2     30633     21313 access
      0.91    0.053897           3     20006       186 stat
      0.47    0.027686           1     19059           read
      0.42    0.024586           2     12498           close
      0.33    0.019538           2     10852           fstat
      0.28    0.016418           5      3083           munmap
      0.21    0.012386           4      3099           mmap
      0.18    0.010921          21       528           write
      0.13    0.007561           1      7413           getuid

So I killed the gnome-shell process, but the problem remains. I don't really see what the problem may be, and this PC has a 4-core Intel i7 processor, so it shouldn't be under this much load.


----------

    iostat -h
    Linux  10/11/2018 	_x86_64_	(4 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              24.4%    0.4%   21.4%   10.8%    0.0%   43.0%
    
          tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn Device
        11.14       175.3k         0.0k      16.6M       0.0k sda
       235.07         6.6M         2.9M     641.9M     286.4M sdb
       769.76       789.5k         0.0k      75.0M       0.0k loop0


                                

Agustin Barrachina (241 rep)

Oct 8, 2018, 09:51 AM • Last activity: May 13, 2025, 10:03 PM

2 votes

1 answer

10085 views

CPU reservation and affinity using taskset and isolcpus kernel parameter with JVM?

cpu-usage taskset

We need to reserve a set number of CPUs for the JVM. From my research, we can use `taskset` together with the kernel parameter `isolcpus=` so that no other process uses these CPUs.

A few questions arise:

- Does the process need to be started with `taskset`?
- Does the reservation mean that the process can only run on that CPU, or can it expand to the other CPUs if there are resource problems?
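
For illustration, here is a sketch of how such pinning is commonly set up, assuming the util-linux `taskset` and a kernel booted with `isolcpus=`. The CPU range `4-7`, `app.jar`, `$pid`, and the helper `expand_cpu_list` are placeholders and not taken from the question. The helper expands the kernel's CPU list notation so that the isolated set and a process's allowed set can be compared line by line:

```shell
# Expand a kernel CPU list such as "0-3,8" into one CPU number per line.
# (hypothetical helper name)
expand_cpu_list() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F '-' '
        NF == 1 && $1 != "" { print $1 }                    # single CPU
        NF == 2 { for (c = $1; c <= $2; c++) print c }      # inclusive range
    '
}

# Hypothetical usage; the CPU list and app.jar are placeholders:
#   cat /sys/devices/system/cpu/isolated        # CPUs isolated at boot
#   taskset -c 4-7 java -jar app.jar            # start the JVM pinned to them
#   taskset -a -cp 4-7 "$pid"                   # or re-pin all threads of a running PID
#   grep Cpus_allowed_list "/proc/$pid/status"  # verify the result
```

An affinity mask set this way is a hard restriction: the scheduler does not move the process to CPUs outside the mask, even when those CPUs are idle.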

danidar (201 rep)

Jul 26, 2018, 03:52 PM • Last activity: Apr 30, 2025, 09:06 PM

1 votes

1 answer

128 views

How to measure actual CPU utilization in Linux for multi core applications?

cpu top cpu-usage multithreading hyperthreading

I have a computation-intensive process that I need to run multiple times on a multi-core processor, but `top` isn't showing utilization or load in a useful way.

For example, imagine my task runs in 1 minute in a single thread on a single core of my six-core, 12-thread SMT CPU. If I start the same task six times using six threads, it still finishes in 1 minute, and `top` shows the load average as 6.0 and the cpu(s) at 50% us and 50% id. In the `top` process list, each of the six processes shows 100% CPU.

If I do the same thing but start 12 threads, it finishes the 12 jobs in 2 minutes, and `top` shows the load average as 12.0 and the cpu(s) at 100% us and 0% id, with 12 processes each at 100% CPU.

Now, the 6-thread and 12-thread examples are both processing at the same fully loaded rate of 6 jobs per minute, so why does `top` show the 6-thread case as 50% idle when it clearly isn't? Is there a better way of determining the actual load of the CPUs?

This was run on a Ryzen 5600X processor on Ubuntu 24.12.

Edit: `top` output for 12 tasks:

    top - 08:35:37 up 54 days, 20:49,  3 users,  load average: 12.20, 6.70, 2.80
    Tasks: 346 total,  13 running, 332 sleeping,   0 stopped,   1 zombie
    %Cpu(s): 98.2 us,  1.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
    MiB Mem :  64221.7 total,   1572.7 free,   4983.4 used,  58684.1 buff/cache
    MiB Swap:   8192.0 total,   7863.7 free,    328.3 used.  59238.3 avail Mem

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    2249765 user    20   0  126952  64132  51200 R 100.0   0.1   3:48.87 sonicLiquidFoam
    2249759 user    20   0  127060  64220  51200 R 100.0   0.1   3:48.93 sonicLiquidFoam
    2249757 user    20   0  126624  64064  51328 R 100.0   0.1   3:49.32 sonicLiquidFoam
    2249761 user    20   0  128276  64868  50688 R 100.0   0.1   3:47.65 sonicLiquidFoam
    2249762 user    20   0  127652  63688  50432 R 100.0   0.1   3:49.13 sonicLiquidFoam
    2249755 user    20   0  128844  66128  51200 R 100.0   0.1   3:46.06 sonicLiquidFoam
    2249766 user    20   0  126576  63952  51328 R 100.0   0.1   3:47.87 sonicLiquidFoam
    2249764 user    20   0  126612  63824  51072 R  99.0   0.1   3:48.59 sonicLiquidFoam
    2249760 user    20   0  126888  63972  51072 R  98.7   0.1   3:45.06 sonicLiquidFoam
    2249758 user    20   0  127500  64860  51200 R  97.7   0.1   3:48.64 sonicLiquidFoam
    2249763 user    20   0  127916  64944  51072 R  97.0   0.1   3:39.58 sonicLiquidFoam
    2249756 user    20   0  126828  63948  51072 R  96.0   0.1   3:48.77 sonicLiquidFoam

For 6 tasks:

    top - 08:40:22 up 54 days, 20:53,  3 users,  load average: 6.11, 6.67, 3.90
    Tasks: 335 total,   7 running, 327 sleeping,   0 stopped,   1 zombie
    %Cpu(s): 50.0 us,  1.0 sy,  0.0 ni, 49.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    MiB Mem :  64221.7 total,   1616.2 free,   4914.6 used,  58710.3 buff/cache
    MiB Swap:   8192.0 total,   7863.7 free,    328.3 used.  59307.1 avail Mem

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    2250032 user    20   0  127392  64676  51200 R 100.0   0.1   2:39.15 sonicLiquidFoam
    2250027 user    20   0  126828  63096  50176 R 100.0   0.1   2:39.23 sonicLiquidFoam
    2250028 user    20   0  127060  63260  50176 R 100.0   0.1   2:39.23 sonicLiquidFoam
    2250029 user    20   0  128844  66124  51200 R 100.0   0.1   2:39.12 sonicLiquidFoam
    2250030 user    20   0  128276  65508  51200 R 100.0   0.1   2:39.21 sonicLiquidFoam
    2250031 user    20   0  126596  63808  51072 R 100.0   0.1   2:39.21 sonicLiquidFoam
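
One way to read these numbers, offered only as a sketch: `top` counts each of the 12 logical CPUs as a full CPU, so six busy threads show up as 50% even if they already occupy all six physical cores. Counting distinct cores in `lscpu -p=CPU,CORE` output and comparing that with the number of running threads gives a rough core-level view. The helper name `count_physical_cores` is made up, and the sketch assumes the CORE column identifies each core uniquely:

```shell
# Count distinct physical cores from `lscpu -p=CPU,CORE` text on stdin.
# (hypothetical helper name)
count_physical_cores() {
    awk -F, '
        /^#/ { next }                 # skip the commented header lines
        NF >= 2 { seen[$2] = 1 }      # remember each distinct core id
        END { n = 0; for (k in seen) n++; printf "%d\n", n }
    '
}

# Usage on a live system:
#   lscpu -p=CPU,CORE | count_physical_cores
```

With six cores and six runnable threads, a per-core reading would be 100% busy, which matches the observation that adding six more threads brings no extra throughput.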

tkw954 (113 rep)

Apr 23, 2025, 07:35 PM • Last activity: Apr 24, 2025, 02:43 PM

0 votes

0 answers

33 views

CPU affinity not following cpuset

linux-kernel cpu-usage taskset

When I run `taskset -p` on a process, I get something like this back:

    # taskset -p 1078
    pid 1078's current affinity mask: 3f

The reported value keeps changing: sometimes it's 5f, other times df, and so on. For the same process I can see that it's allowed on all cores:

    # cat /proc/1078/status | grep Cpus
    Cpus_allowed:   ff
    Cpus_allowed_list:      0-7

And its cpuset in cgroups also allows 0-7:

    # cat /dev/cpuset/cpus
    0-7

If I try to set it to ff using `taskset`, I still do not get ff back:

    # taskset -p ff 1078
    pid 1078's current affinity mask: 5f
    pid 1078's new affinity mask: 5f

What mechanism is overriding the cpuset and taskset affinity? Any way I can force it to actually run on all cores? This is on Android 13 and kernel 5.15.
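
To see which cores drop out of each reported mask, the hexadecimal value can be expanded bit by bit. This is a minimal sketch in POSIX shell arithmetic; `mask_to_cpus` is a hypothetical helper, and it only handles masks that fit in the shell's integer width:

```shell
# Expand a hexadecimal affinity mask (as printed by `taskset -p`) into a
# comma-separated list of CPU numbers. (hypothetical helper name)
mask_to_cpus() {
    mask=$(( 0x$1 ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Bit N set in the mask means CPU N is allowed.
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}

# mask_to_cpus 3f   # CPUs 0-5
# mask_to_cpus 5f   # CPUs 0-4 and 6
# mask_to_cpus df   # CPUs 0-4, 6 and 7
```

For the masks quoted in the question, CPUs 0-4 are always present and only CPUs 5, 6 and 7 come and go.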

Zitrax (284 rep)

Apr 16, 2025, 11:47 AM

2 votes

1 answer

3009 views

High CPU Usage from systemd-udevd

systemd udev cpu-usage

I have a Dell Studio 1569 and just installed Linux on it. I noticed that the CPU has been running high due to systemd-udevd. Going through different posts on the web, including this one, I used `udevadm monitor` to help narrow down what was happening, and here is the output:

I first assumed a USB device, so I plugged into and unplugged from all ports, but soon discovered that this did not produce the same path as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.6/2-1.6.2/2-1.6.2:1.0 (usb). Long story short, while `udevadm monitor` was running I pressed some keys on my keyboard and noticed that the path was the same for the keyboard (as seen in the picture above); the only difference was that the beginning of the line had aKERNEL in front of it instead of KERNEL or UDEV.

For my next test, while `udevadm monitor` was running, I took my laptop apart and disconnected the keyboard to see if those bind/unbind entries would stop. But they continued, which now makes me think this is not the keyboard. Does anyone know what else it could be if it is not the keyboard?

Here is the output from lsusb -t:

EDIT:
In case anyone else runs into an issue similar to mine: disabling Bluetooth in the BIOS seems to fix it. Refer to this post.

Kayracer (31 rep)

Jun 22, 2019, 02:43 AM • Last activity: Apr 14, 2025, 10:04 PM

0 votes

0 answers

42 views

Kernel APIs to disable/enable CPU cores within a driver module

linux-kernel kernel-modules cpu-usage hot-plug

I have Linux kernel version 6.8.0-57-generic (Ubuntu Jammy).

I want to try disabling and enabling CPU cores from a Linux kernel module (that is, from kernel space). What is the right way to do this?
I understand that cores can be disabled from userspace. However, I intend to do the same from a kernel module.

Thanks in advance.

ss22 (13 rep)

Apr 13, 2025, 05:14 PM

1 votes

0 answers

47 views

ARM64 commands take seconds to finish

linux cpu-usage arm64 delay

I'm on an ARM board running Linux. The hardware is a vehicle domain control board which has a 6-core ARM Cortex-A78AE and some machine learning cores. I don't want to reboot it, because this might be a hardware or driver bug that is causing my performance loss.

    root@hobot:~# uname -a
    Linux hobot 6.1.94-rt33 #1 SMP PREEMPT_RT Fri Nov  8 15:11:35 CST 2024 aarch64 GNU/Linux

I don't know what happened to my OS today. I suddenly found that shell commands take too long to finish, although everything was fine a little while ago. For example:

As you can see, `ls` takes over 5 seconds and uses 100% CPU on core 4. I tried running `ls` under `strace`; it does not get stuck on anything.

What should I monitor to find out what happened?

Xingx1 (11 rep)

Feb 23, 2025, 06:22 AM • Last activity: Feb 23, 2025, 07:45 AM

0 votes

2 answers

258 views

Process called lsof using too much CPU

cpu-usage lsof

I keep seeing

    lsof -w -l +d /var/lib/php5

eating up my CPU. I want to know what is triggering it and what it has to do with php5.

Aleksandar Pavić (109 rep)

Dec 17, 2024, 09:50 AM • Last activity: Jan 16, 2025, 03:21 PM

1 votes

1 answer

214 views

What exactly does %wait in pidstat mean?

process cpu cpu-usage sysstat

Environment:

- Ubuntu 22.04
- sysstat version 12.2.0
- Number of logical CPUs: 16

`man pidstat` says the following, but I would like to know more specifically what the numerator and denominator of %wait are.

> Percentage of CPU spent by the task while waiting to run.

One time I was looking at the performance of each process in pidstat and found that there were times when the %wait of one process was quite high, such as 90%. At the same time, %usr was also high and sometimes exceeded 100%.

At that time, the total CPU utilization was close to 100%, and the total %CPU of all processes was close to 1600%, which is the number of CPU cores. However, the total %wait of all processes was more than 1000%. Since CPU resources are used only for %CPU, I assume that %wait is the time when CPU is not used, but what is this percentage?

In mpstat, for example, the total of %usr, %sys, %iowait, %idle, etc. would be about 100% of CPU resources. What exactly is the %wait in the case of pidstat?

If the process is simply not running, I think %wait would also be 0. It is also curious that during the times when %wait was high, disk i/o and net i/o were not high when examined by dstat and other means.

From reading the sysstat code, it seems that %wait, like %usr and %sys, uses as its numerator the increase in a cumulative wait-time counter over the sampling interval, and as its denominator the length of that interval. I am not sure what kind of time counts toward %wait.
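
The ratio described in the last paragraph can be written out as a small calculation. This sketch assumes, without having confirmed it against the sysstat source, that the wait counter is the run-queue delay that the kernel reports in nanoseconds as the second field of `/proc/PID/schedstat`; `wait_percent` is a hypothetical helper and PID 1234 is a placeholder:

```shell
# Percentage of an interval that a task spent runnable but waiting for a CPU.
# $1/$2 = cumulative run-queue wait in ns before/after, $3 = interval in seconds.
# (hypothetical helper name)
wait_percent() {
    awk -v a="$1" -v b="$2" -v secs="$3" \
        'BEGIN { printf "%.1f\n", 100 * (b - a) / (secs * 1e9) }'
}

# Usage on a live system (PID 1234 is a placeholder):
#   a=$(awk '{ print $2 }' /proc/1234/schedstat); sleep 1
#   b=$(awk '{ print $2 }' /proc/1234/schedstat)
#   wait_percent "$a" "$b" 1
```

Under that assumption the figure is accumulated per task and is not bounded by the number of CPUs: every runnable thread queued behind another accrues its own waiting time, which would be consistent with totals above 1000% on a saturated 16-CPU machine.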

LAPK (11 rep)

Jan 15, 2025, 11:13 PM • Last activity: Jan 16, 2025, 12:15 AM

0 votes

0 answers

29 views

Linux kernel cgroup v2 CFS - cpu throttled_usec accounting?

kernel cpu cgroups cpu-usage cpulimit

In Linux kernel cgroup v2’s CFS scheduler, how is the cpu.stat field `throttled_usec` accounted when a cgroup with multiple threads gets throttled during a single quota period? Specifically, is `throttled_usec` tracked as the total wall-clock time that the cgroup was throttled as a whole, or is it a sum of the throttled times of all individual threads?
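
One way to probe this empirically is to sample `throttled_usec` twice and compare the increase with the elapsed wall-clock time. This is a minimal sketch; `cpu_stat_field` and `throttle_ratio` are hypothetical helpers, and the cgroup path in the usage comment is a placeholder:

```shell
# Print the value of one key from cgroup v2 cpu.stat text on stdin.
cpu_stat_field() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Throttled time gained between two samples, relative to elapsed wall-clock time.
# $1/$2 = throttled_usec before/after, $3 = elapsed wall-clock microseconds.
throttle_ratio() {
    awk -v a="$1" -v b="$2" -v wall="$3" \
        'BEGIN { if (wall <= 0) { print "n/a"; exit }
                 printf "%.2f\n", (b - a) / wall }'
}

# Usage (placeholder cgroup path, 10 second window):
#   f=/sys/fs/cgroup/mygroup/cpu.stat
#   a=$(cpu_stat_field throttled_usec < "$f"); sleep 10
#   b=$(cpu_stat_field throttled_usec < "$f")
#   throttle_ratio "$a" "$b" 10000000
```

A ratio that stays at or below 1.00 while many threads are being throttled would point to wall-clock accounting for the group as a whole; a ratio above 1.00 would point to time being summed across threads or CPUs.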

Kernel Version: "5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 11:41:53 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux" Distro: Oracle Linux 9.x

ALZ (961 rep)

Jan 11, 2025, 12:35 PM

13 votes

5 answers

57255 views

setroubleshootd excessive cpu and memory usage

centos selinux cpu-usage

I have a fresh CentOS 7 install and I see setroubleshootd with high CPU usage. How can I fix this? What is this process doing?

stiv (1691 rep)

Mar 29, 2019, 08:41 PM • Last activity: Dec 9, 2024, 11:31 PM

0 votes

0 answers

33 views

What is the meaning of columns in the table displayed by cpupower-monitor?

linux performance cpu-usage cpu-frequency

Running sudo cpupower monitor on Ubuntu 24.04 I'm getting:

    | Nehalem                   || Mperf              || RAPL               || Idle_Stats
     CPU| C3   | C6   | PC3  | PC6   || C0   | Cx   | Freq  || pack | core | unco  || POLL | C1_A | C2_A | C3_A
       0|  0.00|  0.46|  0.00|  0.00|| 11.71| 88.29|  3060||37496059|27532217| 73852||  0.01|  8.94| 79.93|  0.00
       1|  0.00|  0.46|  0.00|  0.00||  6.56| 93.44|  2790||37496059|27532217| 73852||  0.00|  2.56| 53.19| 37.97
       2|  0.00|  4.67|  0.00|  0.00||  9.14| 90.86|  3194||37496059|27532217| 73852||  0.01|  4.43| 58.09| 28.77
       3|  0.00|  4.67|  0.00|  0.00||  6.24| 93.76|  3239||37496059|27532217| 73852||  0.00|  1.69| 34.62| 57.66
       4|  0.00|  0.00|  0.00|  0.00||  0.29| 99.71|  4461||37496059|27532217| 73852||  0.00|  0.00|  0.00| 99.97
       5|  0.00|  0.00|  0.00|  0.00|| 29.52| 70.48|  4457||37496059|27532217| 73852||  0.00| 64.48|  0.00|  6.48
       6|  0.00|  0.00|  0.00|  0.00|| 47.92| 52.08|  4123||37496059|27532217| 73852||  0.00|  0.95|  0.31| 51.40
       7|  0.00|  0.00|  0.00|  0.00||  0.00|100.00|  3753||37496059|27532217| 73852||  0.00|  0.00|  0.00| 99.99
       8|  0.00| 25.03|  0.00|  0.00||  3.27| 96.73|  2807||37496059|27532217| 73852||  0.01|  4.13| 54.72| 38.03
       9|  0.00| 62.12|  0.00|  0.00||  2.32| 97.68|  2897||37496059|27532217| 73852||  0.00|  1.29| 30.79| 65.69
      10|  0.00| 77.85|  0.00|  0.00||  2.19| 97.81|  3064||37496059|27532217| 73852||  0.00|  0.94| 18.42| 78.51
      11|  0.00|  0.00|  0.00|  0.00|| 14.84| 85.16|  2497||37496059|27532217| 73852||  0.01| 57.88| 27.45|  0.38
      12|  0.00| 70.55|  0.00|  0.00||  3.27| 96.73|  2399||37496059|27532217| 73852||  0.00|  1.33| 30.14| 65.41
      13|  0.00| 54.45|  0.00|  0.00||  4.02| 95.98|  2213||37496059|27532217| 73852||  0.00|  1.06| 43.49| 51.62
      14|  0.00| 67.90|  0.00|  0.00||  3.36| 96.64|  2334||37496059|27532217| 73852||  0.00|  1.34| 30.91| 64.54
      15|  0.00| 72.39|  0.00|  0.00||  2.41| 97.59|  2167||37496059|27532217| 73852||  0.00|  1.25| 26.74| 69.73

What is the meaning of columns in this table? cpupower manual (https://linux.die.net/man/1/cpupower-monitor) doesn't have that information. I'm assuming that C3, C6, etc. are percentages of time spent in a given CPU C-state. Also, I'm running a program with two threads pinned to CPU #5 and #6. Cores #2 and #3 containing hyperthreaded CPUs 4-7 are isolated with GRUB_CMDLINE_LINUX="nohz=on nohz_full=4-7 rcu_nocbs=4-7 isolcpus=4-7 irqaffinity=0-3,8-15". The table above confirms load on CPU #5 and #6, but also consistently shows some marginal load on CPU #4. It doesn't correspond to the screen of Ubuntu System Monitor, which consistently shows 0% load on CPU #4 listed as CPU5 here:

Additionally, why do logical CPUs of the same physical core show different frequencies?
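One way to cross-check the marginal load reported for CPU #4 is to compute per-CPU utilization directly from two snapshots of `/proc/stat`. The following is a minimal sketch, not what cpupower or System Monitor actually do internally; the function names are made up for illustration, and it counts `idle` plus `iowait` as idle time, which is an assumption about how "load" should be defined.

```python
import time


def parse_proc_stat(text):
    """Map CPU index -> (busy, total) jiffies from the text of /proc/stat."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-CPU lines ("cpu0", "cpu1", ...) and skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu"):
            continue
        index = fields[0][3:]
        if not index.isdigit():
            continue
        # Field order: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        idle = sum(values[3:5])  # idle + iowait (assumption: both count as idle)
        total = sum(values)
        counters[int(index)] = (total - idle, total)
    return counters


def busy_percent(before, after):
    """Percentage of non-idle time per CPU between two parsed snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return busy percentages."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return busy_percent(before, after)
```

Calling `sample(5.0)` on the machine and looking at key `4` gives a third reading to compare against the two tools.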

Paul Jurczak (151 rep)

Nov 23, 2024, 03:38 AM • Last activity: Nov 23, 2024, 06:46 AM

1 votes

1 answers

158 views

100% CPU on 4 of 8 cores on Oracle Linux

kernel cpu-usage oracle-linux

I have a desktop computer (Intel i4770) running Oracle Linux 7.9 with kernel 4.1.12-61. I usually keep it off and only turn it on on the rare occasions when I need to test something. A month or so ago, I turned it on and noticed that the fans were on max speed - I checked top and found that setroubleshoot was at 100% so I killed the process. The process kept coming back and I kept killing it but ultimately, it didn't matter much because my testing was done and I turned the computer off again. (Yes, I always shut it down the right way.)

Now trying to get to the root of the problem, setroubleshoot is no longer showing 100% in top.  In fact, nothing is even close to 100%.  Running htop, I can get details about the CPUs and 4 of the 8 cores are permanently 100%.  From the time the computer lets me log in to when I shut it down.  But there's nothing in the list of processes even above 5.2%.  

When I run perf on each core with perf top -C 1 --sort comm, I can see that cores zero through 3 are all 100% kernel.  

Here is the perf report from running perf record -a -F 999 -- sleep 10. I don't know if the failure to find useful symbols is indicative of the problem I'm chasing, if it is a different issue that I'll need help figuring out, or if it is something that should be ignored.

On the desktop of this computer, I noticed a bunch of SELinux errors.  They all appear to be saying that there was an attempt to execute something that should not have been allowed.  

And just to confirm that htop was right about what it was reporting, here's the report from the System Monitor.  

Booting into a prior version of the kernel didn't help.  And booting into the "rescue" kernel didn't help either.  

I tried updating the kernel but that didn't help.  I ran a software update and that didn't help either.  Note that I hadn't done any updates or installed any new software immediately prior to this problem starting.  This install had been stable for years when I needed it.  

I also tried installing the same OS over again on a new external drive.  That worked.  No issues on that drive.  But when I boot to that drive and then choose the kernel that is on the main drive, the problem returns.  That all seems to prove that the kernel isn't the issue but the system main drive has something wrong.  

I'm at a loss for how to debug further.  I can't figure out what changed and why so I don't know how to even start fixing it.  Any help about where to look and what to check would be appreciated!  

___ Edit 1: ___

Output from ps -efl|sort -rk14|head: 

    F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY       TIME CMD
    0 S gdm       2085  2041  2  80   0 - 905731 -     17:42 ?        00:00:02 /usr/bin/gnome-shell
    4 S root         1     0  1  80   0 - 54811 -      17:41 ?        00:00:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
    4 S root       837     1  1  80   0 - 22671 -      17:42 ?        00:00:01 /sbin/rngd -f
    1 S root       405     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs_mru_cache]
    1 S root       407     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-data/sda1]
    1 S root       408     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-conv/sda1]
    1 S root       409     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-cil/sda1]
    1 S root       406     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-buf/sda1]
    1 S root       404     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfsalloc]

output from dmesg | grep libsystem

    [    3.344918] audit: type=1400 audit(1731105719.768:4): avc:  denied  { execute } for  pid=496 comm="systemd-journal" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.351928] audit: type=1400 audit(1731105719.775:6): avc:  denied  { execute } for  pid=502 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.351929] audit: type=1400 audit(1731105719.775:5): avc:  denied  { execute } for  pid=503 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.374010] audit: type=1400 audit(1731105719.797:7): avc:  denied  { execute } for  pid=513 comm="systemd-tmpfile" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_tmpfiles_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.386383] audit: type=1400 audit(1731105719.810:8): avc:  denied  { execute } for  pid=525 comm="systemd-sysctl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.397457] audit: type=1400 audit(1731105719.821:9): avc:  denied  { execute } for  pid=536 comm="hostname" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:hostname_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
    [    3.673352] audit: type=1400 audit(1731105720.097:10): avc:  denied  { execute } for  pid=657 comm="alsactl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:alsa_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
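To see at a glance which file is being denied and which commands tried to execute it, the AVC lines can be grouped by `path`. This is a small sketch that assumes the `avc: denied { ... } ... comm="..." path="..."` layout seen in the lines above; `summarize_denials` is a name invented for this example.

```python
import re
from collections import defaultdict

# Matches the permission, command name and path of an AVC denial line.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perm>[^}]+?)\s*\}'
    r'.*?comm="(?P<comm>[^"]+)"'
    r'.*?path="(?P<path>[^"]+)"'
)


def summarize_denials(lines):
    """Return {(path, permission): sorted list of distinct command names}."""
    grouped = defaultdict(set)
    for line in lines:
        match = AVC_RE.search(line)
        if match:
            grouped[(match.group("path"), match.group("perm"))].add(match.group("comm"))
    return {key: sorted(comms) for key, comms in grouped.items()}
```

Feeding it the output of `dmesg` (for example `summarize_denials(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical saved copy) shows whether every denial points at the same library.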

___ Edit 2: ___

I installed kernel debug info and downgraded perf to version 3, since there is apparently a bug in the perf version for OL7.9. However, there are still symbols that can't be found.

The number in the list above, 1399, is the PID for the process but that process isn't visible either through htop or ps.  As soon as I did a kill -9 1399, the CPU usage immediately dropped to zero.  That's nice because at least the problem process is now dead.  And I know how to kill it, even though I don't see it in the normal process lists.  

But the fundamental question remains - where is this process coming from and how do I stop it from starting in the first place!?
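A PID that consumes CPU but is missing from `ps` and `htop` can sometimes be found by comparing the PIDs listed under `/proc` with the PIDs that still answer a signal-0 probe. The sketch below is only an illustration of that comparison; the function names are invented here, and nothing in the post establishes how the process is being hidden, so whatever hides it may also defeat this check.

```python
import os


def listed_pids(proc_root="/proc"):
    """PIDs that appear as numeric directory names under /proc."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def pid_exists(pid):
    """Probe a PID with signal 0, which checks existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def find_unlisted_pids(listed, exists, max_pid):
    """PIDs in 1..max_pid that respond to `exists` but are absent from `listed`."""
    return [pid for pid in range(1, max_pid + 1) if pid not in listed and exists(pid)]
```

On a live system this could be run as root with `find_unlisted_pids(listed_pids(), pid_exists, max_pid)`, taking `max_pid` from `/proc/sys/kernel/pid_max`. Two caveats: thread IDs also respond to the probe without having their own top-level `/proc` listing entry, and processes that start or exit between the listing and the probe produce false hits, so any result needs a second look.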

ktbos (111 rep)

Nov 8, 2024, 07:24 PM • Last activity: Nov 11, 2024, 05:30 PM

0 votes

1 answers

118 views

Why is the Linux CPU stalling when having multithreaded memory writes?

ubuntu memory cpu cpu-usage

HW Specs:
- CPU: 64 Cores, 128 Threads, AMD Ryzen Threadripper Pro 5995WX
- RAM: 512 GB, Manufacturer unknown, will try to provide if needed

Linux Specs:
- OS: Ubuntu 22.04.4 LTS
- Linux Kernel: 5.15.0-119-generic

I'm trying to get model training to work with pytorch on a Linux server, where I have observed performance degradation of a factor ~10 after letting a resource intensive training task run for a couple of minutes (Training on 4 GPUs having a multithreaded Dataloader each).

Trying to isolate the root cause for this issue, I have now come up with a minimal test in python reproducing the issue, by continuously writing 1GB of data to RAM. Running this with 32 Threads in parallel (CPU has 128 Threads available) the CPU stalls after 0%) until giving it some cooldown time of approx 1min.

I have run the test on another server (48 CPU threads, 160GB RAM) for 10 minutes without any problems (On this server multi-GPU training is also running without any performance degradation).

In contrast to the self-implemented memory write test, I have also tried a benchmark test using sysbench, writing 10TB of data with up to 96 Threads without any problem. This is where I don't really understand the difference: does this task write the data only into some sort of buffer without really allocating any RAM? I ran the test with the following command:

sysbench --threads=96 --time=0 --memory-block-size=128K --memory-total-size=10T --report-interval=1 --memory-oper=write memory run

The main observable difference of sysbench to my python test script was in htop, where sysbench had all threads running as normal priority/user threads (green bars) while my python script caused a large portion being kernel time (red bars), in my understanding caused by a lot of wait time required.

My question now is, does this diagnostic give some indication on why the system is stalling? Might there be a hardware issue with RAM or could this be an issue with the OS? Or what further tests could I do to isolate the root cause?

---

Edit: In the following you can find the minimal python script:

import time
import numpy as np
import threading

# 1 GiB of zeros (uint8), shared by all threads.
data = np.zeros((1024, 1024, 1024, 1), dtype=np.uint8)

def allocate_memory():
    while True:
        start_time = time.time()
        # data * 0 allocates and fills a new 1 GiB result array on every iteration.
        _ = data * 0
        end_time = time.time()
        print(f"Time: {end_time - start_time:.3f} s")

    # Not reached: the loop above never exits.
    print(data.shape)

def run_in_threads(num_threads):
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=allocate_memory)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

if __name__ == "__main__":
    num_threads = 32
    run_in_threads(num_threads)
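The green/red split seen in htop can also be measured from inside the process, which may make it easier to compare variants of the test. Below is a minimal sketch using `os.times()`; the helper name `cpu_split` is an assumption made for this example, and the numbers it returns cover the whole process, so with several threads running they include the work of all of them.

```python
import os


def cpu_split(fn, *args, **kwargs):
    """Run fn and return (result, user_seconds, system_seconds) consumed meanwhile.

    os.times() reports process-wide CPU time, so the deltas include every
    thread that ran during the call, not only the calling thread.
    """
    before = os.times()
    result = fn(*args, **kwargs)
    after = os.times()
    return result, after.user - before.user, after.system - before.system
```

With the script above, `cpu_split(lambda: data * 0)` would report how much of one iteration is kernel (system) time; a system share that dominates would be consistent with the red bars in htop.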

m4fr1699 (1 rep)

Nov 5, 2024, 08:27 AM • Last activity: Nov 7, 2024, 08:58 AM

1 votes

1 answers

329 views

Btop - What does the LAV value mean?

linux cpu cpu-usage btop

I have a KVM guest and inside it I am running a VNC server. I connect to it through an SSH tunnel using the TigerVNC viewer. Everything works well except for one issue: in Chrome (using the X server), vertical scrolling on a website is a bit laggy. I checked the CPU and it looks good; I never see it at 100%, at most 50%. Memory is good too: I usually have about 30 GB of RAM free.

However, I see the first LAV value is 1.88 on btop. What does that mean exactly? Does that mean 100 percent of the cpu is being used and 88 percent of processes are waiting?
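A load average is a count of tasks, not a percentage: on Linux it is the average number of tasks that were runnable or in uninterruptible sleep over the sampling window. Dividing it by the number of CPUs gives a rough per-CPU figure. The sketch below shows that arithmetic; the helper names are made up, and the CPU count of 4 used in the usage note is purely hypothetical because the post does not say how many vCPUs the guest has.

```python
import os


def load_per_cpu(load, ncpus):
    """Average number of runnable/waiting tasks per CPU for a given load average."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return load / ncpus


def current_load_per_cpu():
    """1-, 5- and 15-minute load averages of this machine, divided by its CPU count."""
    ncpus = os.cpu_count() or 1
    return tuple(load_per_cpu(value, ncpus) for value in os.getloadavg())
```

For example, `load_per_cpu(1.88, 4)` is about 0.47, meaning that on a hypothetical 4-CPU guest there was on average less than one task per CPU wanting to run.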

dman (569 rep)

Oct 14, 2024, 05:04 PM • Last activity: Oct 14, 2024, 05:18 PM

-1 votes

1 answers

37 views

(Solaris) Ram CPU monitoring script is grabbing incorrect cpu ram utilization values

shell-script memory solaris cpu cpu-usage

#!/bin/bash

host=$(hostname)
email="abc.@xyz.com"  # Change to your desired email
subject="Attention!!! Health check Failed on $host"
echo $(date)

# CPU use threshold
cpu_threshold=0

# Memory idle threshold
mem_threshold=0

#--- CPU
cpu_usage () {
    # Get CPU idle percentage
    cpu_idle=$(prstat -Z 1 1 | awk 'NR==2 {print $8}' | tr -d '%')
    
    # Check if cpu_idle is a valid number
    if ! [[ "$cpu_idle" =~ ^[0-9]+$ ]]; then
        echo "Error: Invalid CPU idle value: $cpu_idle"
        cpu_use=0
    else
        cpu_use=$((100 - cpu_idle))
    fi

    cpu_flag=0
    echo "CPU utilization: $cpu_use%"
    if [ "$cpu_use" -gt "$cpu_threshold" ]; then
        echo "CPU warning!!!"
        cpu_flag=1
    else
        echo "CPU ok!!!"
    fi
}

#--- Memory
mem_usage () {
    mem_total=$(kstat -m | grep "physmem" | awk '{print $2}')  # Total memory in bytes
    mem_free=$(kstat -m | grep "freemem" | awk '{print $2}')   # Free memory in bytes

    if [[ -z "$mem_total" || "$mem_total" -eq 0 ]]; then
        echo "Failed to retrieve total memory."
        mem_total=0
        mem_free=0
    fi

    # Convert bytes to GB
    mem_total_gb=$((mem_total / 1024 / 1024))
    mem_free_gb=$((mem_free / 1024 / 1024))

    echo "Total memory: $mem_total_gb GB"
    echo "Free memory: $mem_free_gb GB"

    if [ "$mem_total" -gt 0 ]; then
        per_mem=$(( ((mem_total - mem_free)) * 100 / mem_total ))
    else
        per_mem=0
    fi

    echo "Memory space remaining: $mem_free_gb GB"
    mem_flag=0
    if [ "$per_mem" -gt "$mem_threshold" ]; then
        echo "Memory warning!!!"
        mem_flag=1
    else
        echo "Memory ok!!!"
    fi
}

out() {
    if [ "$cpu_flag" -eq 1 ] && [ "$mem_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check RAM and CPU utilization on $host.
Current RAM Percentage: $per_mem%%
Current CPU Percentage: $cpu_use%%

Thanks " > /tmp/health.txt
    elif [ "$cpu_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check CPU utilization on $host.
Current CPU Percentage: $cpu_use%%

Thanks " > /tmp/health.txt
    elif [ "$mem_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check RAM utilization on $host.
Current RAM Percentage: $per_mem%%

Thanks " > /tmp/health.txt
    fi
}

mail() {
    if [ "$cpu_flag" -eq 1 ] || [ "$mem_flag" -eq 1 ]; then
        /usr/sbin/sendmail -t <




It gives the following output. Despite using kstat, it throws this error. What am I doing wrong here?

Fri Oct 11 06:51:21 CDT 2024
Error: Invalid CPU idle value: 17:25:22
CPU utilization: 0%
CPU ok!!!
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
      [ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ module:instance:name:statistic ... ]
      [ interval [ count ] ]
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
      [ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ module:instance:name:statistic ... ]
      [ interval [ count ] ]
Failed to retrieve total memory.
Total memory: 0 GB
Free memory: 0 GB
Memory space remaining: 0 GB
Memory ok!!!

The CPU utilization is getting grabbed correctly now, only the memory usage is not reported correctly. I am not sure which module to use with kstat command, so I used prtconf as well but that didn't capture it either.
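The usage message appears because `kstat -m` expects a module name after the flag. One possible alternative is to ask for the statistics by full name, `kstat -p unix:0:system_pages:physmem unix:0:system_pages:freemem`, and note that these counters are in pages, not bytes. The following sketch parses that parseable output and does the arithmetic; the function names are invented for illustration, and the page size has to come from elsewhere (for example the `pagesize` command).

```python
def parse_kstat_p(text):
    """Parse `kstat -p` lines ("module:instance:name:statistic<TAB>value").

    Returns {statistic: integer value}; lines that do not fit are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        key, value = parts
        stats[key.rsplit(":", 1)[-1]] = int(value)
    return stats


def memory_used_percent(physmem_pages, freemem_pages):
    """Share of physical memory in use, from page counts."""
    if physmem_pages <= 0:
        raise ValueError("physmem_pages must be positive")
    return (physmem_pages - freemem_pages) * 100.0 / physmem_pages


def pages_to_gib(pages, page_size):
    """Convert a page count to GiB for a given page size in bytes."""
    return pages * page_size / 1024 ** 3
```

With made-up sample values of 4194304 total pages and 1048576 free pages, `memory_used_percent` gives 75.0, and at a hypothetical page size of 8192 bytes the total is 32 GiB.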


                          
                          
                        
                        
                        
                          
                            
Navdeep Singh (37 rep)

Oct 11, 2024, 11:59 AM • Last activity: Oct 11, 2024, 01:02 PM


          
          

          
          
            
              
                
                  