Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
4 votes • 3 answers • 3078 views
Check what process is spiking the average load with atop
In trying to find the culprit of a high load average on a system during the night (which does not seem to be related to logrotate), I installed atop to write a raw file at a specific interval. While reading the file, it seems the process list stands still; can I somehow go back and forth between the samples to see what sticks out, and further sort by any column (like CPU usage)?
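For reference, a sketch of the raw-file workflow (standard atop flags and interactive keys; double-check `man atop` for your version):

    # record one sample every 60 seconds into a raw file
    atop -w /tmp/atop.raw 60
    # replay the file later
    atop -r /tmp/atop.raw
    # while replaying: 't' steps forward one sample, 'T' steps backward,
    # 'b' jumps to a given time; 'C' sorts processes by CPU, 'M' by memory,
    # 'D' by disk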
user135361
(193 rep)
Sep 3, 2013, 01:31 PM
• Last activity: Jun 4, 2025, 01:06 AM
3 votes • 1 answer • 1997 views
Unusually high load average (due to peak I/O wait? IRQs?)
I have a problem with a high load average (`~2`) on my (personal laptop) computer, and have had it for a long time now. I am running Arch Linux. If I remember correctly, the problem started with a certain kernel update; initially I thought it was related to this bug. The problem was not solved, though, when the bug was fixed. I did not really care, as I thought it was still a bug, because performance did not seem to suffer. What made me curious is that, recently, I had a moment of super low load average (`~0`) while idling. After a reboot, everything went back to "normal", with a high load average. So I started investigating:
% uptime
14:31:04 up 2:22, 1 user, load average: 1.96, 1.98, 1.99
So far nothing new. Then I tried top:
% top -b -n 1
top - 14:33:52 up 2:25, 1 user, load average: 2.02, 2.07, 2.02
Tasks: 146 total, 2 running, 144 sleeping, 0 stopped, 0 zombie
%Cpu0 : 2.6/0.9 3[|||| ]
%Cpu1 : 2.7/0.9 4[|||| ]
%Cpu2 : 2.7/1.0 4[|||| ]
%Cpu3 : 2.7/0.8 3[|||| ]
GiB Mem :228125107552256.0/7.712 [
GiB Swap: 0.0/7.904 [ ]
PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
2 root 20 0 0.0m 0.0m 0.0 0.0 0:00.00 S kthreadd
404 root 20 0 0.0m 0.0m 0.0 0.0 0:01.09 D `- rtsx_usb_ms_2
1854 root 20 0 0.0m 0.0m 0.0 0.0 0:06.03 D `- kworker/0:2
I cut out all the processes and kernel threads except those two. Here we can already see some suspicious kernel threads (state D), and a suspicious Mem value (see edit)...
Looking at CPU:
% mpstat
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
14:36:09 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:36:09 all 2.66 0.00 0.88 1.56 0.00 0.01 0.00 0.00 0.00 94.90
% sar -u 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
14:37:04 CPU %user %nice %system %iowait %steal %idle
14:37:05 all 1.00 0.00 0.75 0.00 0.00 98.25
14:37:06 all 1.76 0.00 0.50 0.00 0.00 97.74
14:37:07 all 1.00 0.00 0.25 0.00 0.00 98.75
14:37:08 all 0.50 0.00 0.50 0.00 0.00 99.00
14:37:09 all 0.50 0.00 0.50 0.25 0.00 98.75
14:37:10 all 0.50 0.00 0.50 6.03 0.00 92.96
14:37:11 all 0.75 0.00 0.50 11.75 0.00 87.00
14:37:12 all 0.50 0.00 0.25 0.00 0.00 99.25
[ . . . ]
14:37:21 all 1.26 0.00 0.76 0.00 0.00 97.98
14:37:22 all 0.75 0.00 0.25 2.26 0.00 96.73
14:37:23 all 0.50 0.00 0.50 16.83 0.00 82.16
14:37:24 all 0.75 0.00 0.50 0.00 0.00 98.74
14:37:25 all 0.50 0.00 0.50 0.00 0.00 98.99
14:37:26 all 0.76 0.00 0.50 7.56 0.00 91.18
14:37:27 all 0.25 0.00 0.51 0.00 0.00 99.24
14:37:28 all 1.00 0.00 0.75 0.25 0.00 98.00
14:37:29 all 0.25 0.00 0.76 0.00 0.00 98.99
14:37:30 all 0.75 0.00 0.50 0.00 0.00 98.74
14:37:31 all 0.75 0.00 0.50 3.27 0.00 95.48
14:37:32 all 0.51 0.00 0.51 13.16 0.00 85.82
14:37:33 all 0.75 0.00 0.50 0.25 0.00 98.49
14:37:34 all 1.26 0.00 0.75 0.00 0.00 97.99
Average: all 0.71 0.00 0.56 2.06 0.00 96.67
reveals some peaks in I/O wait. The best guess so far. Looking closer:
% iostat -x 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.60 0.00 0.87 1.55 0.00 94.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.93 3.00 3.71 1.94 95.04 102.27 69.91 0.60 106.78 16.56 279.32 14.47 8.17
avg-cpu: %user %nice %system %iowait %steal %idle
0.75 0.00 0.75 0.25 0.00 98.25
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.01 13.00 0.00 13.00 10.00 1.00
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 17.04 0.00 81.95
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 8.00 0.00 2.00 0.00 40.00 40.00 0.69 346.50 0.00 346.50 346.50 69.30
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 0.50 7.29 0.00 91.96
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.75 16.96 0.00 81.30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 5.00 0.00 2.00 0.00 28.00 28.00 0.71 357.00 0.00 357.00 356.50 71.30
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 0.00 0.00 99.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Looking at processes in uninterruptible sleep:
% for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 3177 [kworker/u32:4]
----
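A follow-up that would fit here (a sketch; PID 404 is taken from the output above, and reading the file needs root plus a kernel built with CONFIG_STACKTRACE): dump the kernel stack of a D-state thread to see exactly where it is blocked:

    % sudo cat /proc/404/stack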
And the last thing I did:
% vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 5010040 123612 1220080 0 0 23 25 111 433 3 1 95 2 0
0 0 0 5006256 123612 1224164 0 0 0 96 186 839 1 1 97 1 0
1 0 0 5006132 123612 1224164 0 0 0 0 175 714 1 0 99 0 0
0 0 0 5003156 123612 1224156 0 0 0 0 234 1009 2 1 98 0 0
0 0 0 5003156 123612 1224156 0 0 0 0 161 680 0 0 99 0 0
0 1 0 5003156 123616 1224156 0 0 0 60 214 786 1 1 94 5 0
0 0 0 5003280 123620 1224156 0 0 0 4 226 776 1 0 88 11 0
1 0 0 5003156 123620 1224156 0 0 0 0 210 733 1 0 99 0 0
0 0 0 5005388 123620 1224156 0 0 0 0 159 747 1 0 99 0 0
0 0 0 5005388 123620 1224156 0 0 0 0 233 803 1 0 99 0 0
0 0 0 5005512 123620 1224156 0 0 0 0 152 670 1 0 99 0 0
0 0 0 5009664 123620 1220060 0 0 0 0 240 914 1 1 99 0 0
0 0 0 5009540 123620 1220060 0 0 0 0 237 833 1 1 99 0 0
0 0 0 5009664 123620 1220060 0 0 0 0 166 999 1 1 99 0 0
0 1 0 5009664 123620 1220060 0 0 0 4 168 700 1 0 88 11 0
0 0 0 5009540 123628 1220060 0 0 0 12 207 778 1 1 91 8 0
0 0 0 5009788 123628 1220064 0 0 0 0 189 717 0 1 99 0 0
0 0 0 5009664 123628 1220064 0 0 0 0 243 1453 1 1 98 0 0
0 0 0 5009044 123628 1220576 0 0 0 0 166 708 1 0 99 0 0
0 0 0 5009168 123628 1220576 0 0 0 0 146 663 1 0 99 0 0
0 0 0 5009540 123628 1220064 0 0 0 0 175 705 1 1 99 0 0
0 1 0 5009292 123632 1220128 0 0 0 8 223 908 1 0 99 0 0
^C
Now I still don't know what the problem is, but it looks like it comes from some peak I/O operations. There are some suspicious kernel threads. Any further ideas? What else could I do to investigate?
**edit:** The Mem value seems strange, but it only appeared very recently, a week ago or so; before that everything seemed to be normal. And
% free
total used free shared buff/cache available
Mem: 8086240 1913860 4824764 133880 1347616 6231856
Swap: 8288252 0 8288252
seems to be fine though.
**edit2:** First results of testing sar monitoring on my system (very frequent, 1-second intervals, but for a short duration, to catch the peaks):
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 CPU %user %nice %system %iowait %steal %idle
12:36:26 all 0.50 0.00 0.50 0.00 0.00 99.00
12:36:27 all 0.50 0.00 0.50 0.25 0.00 98.74
12:36:28 all 0.50 0.00 0.75 0.00 0.00 98.75
12:36:29 all 0.50 0.00 0.25 7.52 0.00 91.73
12:36:30 all 0.25 0.00 0.75 9.77 0.00 89.22
12:36:31 all 0.25 0.00 0.75 0.00 0.00 98.99
12:36:32 all 1.00 0.00 0.50 0.25 0.00 98.25
12:36:33 all 1.00 0.00 1.00 0.00 0.00 98.00
12:36:34 all 0.25 0.00 0.25 0.25 0.00 99.24
12:36:35 all 0.50 0.25 0.75 33.25 0.00 65.25
12:36:36 all 0.50 0.00 0.75 0.25 0.00 98.50
12:36:37 all 0.75 0.00 0.25 0.00 0.00 99.00
12:36:38 all 0.25 0.00 0.50 0.00 0.00 99.24
12:36:39 all 0.50 0.00 0.50 0.00 0.00 99.00
12:36:40 all 0.50 0.25 0.50 10.75 0.00 88.00
Average: all 0.52 0.03 0.57 4.16 0.00 94.72
Network (`-n`) seems to be alright. Looking at devices (`-d`) reveals:
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:36:26 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:26 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:29 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-0 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40
12:36:30 dev8-1 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40
12:36:30 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:31 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:31 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:34 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-0 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60
12:36:35 dev8-1 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60
12:36:35 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:36 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:36 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:40 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-0 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15
Average: dev8-1 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15
Average: dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
and `-b` gives:
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 tps rtps wtps bread/s bwrtn/s
12:36:26 0.00 0.00 0.00 0.00 0.00
12:36:27 0.00 0.00 0.00 0.00 0.00
12:36:28 0.00 0.00 0.00 0.00 0.00
12:36:29 0.00 0.00 0.00 0.00 0.00
12:36:30 2.00 0.00 2.00 0.00 88.00
12:36:31 0.00 0.00 0.00 0.00 0.00
12:36:32 0.00 0.00 0.00 0.00 0.00
12:36:33 0.00 0.00 0.00 0.00 0.00
12:36:34 0.00 0.00 0.00 0.00 0.00
12:36:35 2.00 0.00 2.00 0.00 24.00
12:36:36 0.00 0.00 0.00 0.00 0.00
12:36:37 0.00 0.00 0.00 0.00 0.00
12:36:38 0.00 0.00 0.00 0.00 0.00
12:36:39 0.00 0.00 0.00 0.00 0.00
12:36:40 0.00 0.00 0.00 0.00 0.00
Average: 0.27 0.00 0.27 0.00 7.47
So I assume the issue is related to my hard drive (?). Because the I/O is on partition 1 (my root partition), it should be somewhere outside of `/var`, which has its own partition. The other partitions are data partitions and not system related.
**edit3:** Even more data on that specific peak: paging looks fine (from my perspective, with limited knowledge)
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
12:36:26 0.00 0.00 0.00 0.00 2233.00 0.00 0.00 0.00 0.00
12:36:27 0.00 0.00 0.00 0.00 88.00 0.00 0.00 0.00 0.00
12:36:28 0.00 0.00 766.00 0.00 185.00 0.00 0.00 0.00 0.00
12:36:29 0.00 40.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:30 0.00 4.00 0.00 0.00 45.00 0.00 0.00 0.00 0.00
12:36:31 0.00 0.00 1.00 0.00 46.00 0.00 0.00 0.00 0.00
12:36:32 0.00 0.00 5.00 0.00 560.00 0.00 0.00 0.00 0.00
12:36:33 0.00 0.00 2.00 0.00 85.00 0.00 0.00 0.00 0.00
12:36:34 0.00 0.00 2.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:35 0.00 12.00 0.00 0.00 44.00 0.00 0.00 0.00 0.00
12:36:36 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:37 0.00 0.00 2.00 0.00 45.00 0.00 0.00 0.00 0.00
12:36:38 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:39 0.00 0.00 0.00 0.00 77.00 0.00 0.00 0.00 0.00
12:36:40 0.00 8.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
Average: 0.00 4.27 51.87 0.00 242.87 0.00 0.00 0.00 0.00
It looks like files were created during that peak (`-v`):
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 dentunusd file-nr inode-nr pty-nr
12:36:26 186520 4480 195468 2
[ . . . ]
12:36:34 186520 4480 195468 2
12:36:35 186520 4512 195468 2
[ . . . ]
12:36:40 186520 4512 195468 2
Average: 186520 4493 195468 2
**edit4:** It looks like some `irq`s are responsible. Running `iotop -o -a` (show only processes with I/O, and accumulate them, i.e. keep all processes that have had I/O since the start of the program) resulted in:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
7 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/0]
17 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/1]
23 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/2]
29 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/3]
292 rt/4 root 0.00 B 0.00 B 0.00 % 99.99 % [i915/signal:0]
[ . . . ]
So, is this a thing? How could I continue...?
nox
(161 rep)
Nov 30, 2017, 02:48 PM
• Last activity: May 15, 2025, 08:06 PM
5 votes • 2 answers • 18030 views
Difference between the %CPU usage and load average, and when should it be a concern?
I've searched multiple answers here but couldn't find one that relates to this scenario; if you know of one, kindly point me to it.
I am including the numbers here for the ease of my own comprehension.
I have a 96-core bare-metal Linux server with 256 GB RAM that's dedicated to running an in-house-written distributed, event-based, asynchronous network service that acts as a caching server. This daemon runs with 32 worker threads. Apart from the main task of fetching and caching, this server also does a variety of related tasks in a couple of extra separate threads, like polling other members' health checks, writing metrics to a unix socket, etc. The worker-thread count isn't bumped further because increasing it would increase cache lock contention. There is not much disk activity from this server, as metrics are written in batches, and if the unix socket fails, it just ignores the failure and frees the memory.
This instance is part of a 9-node cluster, and the stats of this node are representative of the rest of the instances in the cluster.
With the recent surge in inbound traffic, I see the %CPU usage of the process has gone up considerably but the load average is still less than 1.
Please find the stats below.
:~$ nice top
top - 19:51:55 up 95 days, 7:27, 1 user, load average: 0.33, 0.28, 0.32
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
587486 cacher 20 0 107.4g 93.0g 76912 S 17.2 37.0 5038:13 cacher
The `%CPU` goes up to 80% at times, but even then the load average is considerably lower and doesn't go beyond 1.5; this happens mostly when there's a cache miss and the cacher has to fetch from an upstream, so it is mostly a set of network activities. As far as I understand, the most compute-heavy operation this service does at runtime is computing the hash of the item to be cached, when it has to store it into the appropriate distributed buckets. There are no systemd limits set on this service for any parameter, and it is also tuned to disable the kernel's OOM killer for this process, although it is nowhere near the upper limit. The systemd sockets to which this binds have been tuned to accommodate larger tx and rx buffers.
* Why is the load average less than 1 on a 96-core server when the `%CPU` of the service, which uses 32 threads, fluctuates between 20% and 80% consistently? (A quick check is sketched below.)
* On a 96-core server, how high can the `%CPU` safely go? Does it have a relation to how many threads are used? If the number of threads is bumped, is a higher %CPU usage theoretically acceptable?
Thank you.
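A quick check that bears on the first bullet (a sketch; 587486 is the PID from the `top` output above): histogram the cacher's threads by scheduler state. The load average counts only R (runnable) and D (uninterruptible) tasks, so 32 worker threads that mostly sleep on sockets (state S) contribute almost nothing to it, no matter how much %CPU they burn in short spurts:

    ps -o state= -T -p 587486 | sort | uniq -c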
init
(151 rep)
Jan 7, 2023, 08:23 PM
• Last activity: Sep 19, 2024, 08:54 AM
-1 votes • 1 answer • 128 views
How do I tell whether a load average value is high or normal?
How do I determine whether some x.x value from `uptime` is a high load on a Linux server, if I have 4 CPUs?
Is there a formula to calculate it?
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
$ uptime
14:01:38 up 15 min, 2 users, load average: 7.09, 3.44, 1.96
The current load average is high, but the server is still working fine.
$ uptime
14:39:50 up 53 min, 5 users, load average: 31.95, 29.13, 24.25
Output of vmstat
$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
16 0 0 15798924 2168 189096 0 0 329 25 103 117 62 1 37 0 0
16 0 0 15798924 2168 189096 0 0 0 0 405 433 100 0 0 0 0
16 0 0 15798924 2168 189096 0 0 0 0 408 441 100 0 0 0 0
I've seen a few links; however, I couldn't understand them.
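A rule-of-thumb sketch (not an absolute threshold): divide the 1-minute load by the CPU count; a value persistently above 1.0 means tasks are waiting, on average:

    # 1-minute load average per CPU
    awk -v n=$(nproc) '{ printf "%.2f\n", $1 / n }' /proc/loadavg

For the second uptime above, 31.95 / 4 ≈ 8, i.e. roughly eight runnable tasks per CPU, which is consistent with the vmstat run queue (r = 16) and the 100 in the us column.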
Priyanka
(1 rep)
Aug 15, 2024, 02:43 PM
• Last activity: Aug 15, 2024, 05:09 PM
-3 votes • 4 answers • 12521 views
What is the Unix command-line command to get the system load information?
On a Linux or Unix operating system, I am seeing the text `System load` as below.
Can anyone please tell me what it means, and how to extract the system load % using CLI commands?
System load: 6.84
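For what it's worth, a sketch of the usual sources (there is no native "load %"; the number is the average count of runnable tasks, so people usually normalize by CPU count themselves):

    uptime                          # human-readable 1/5/15-minute averages
    cat /proc/loadavg               # raw values, script-friendly
    cut -d' ' -f1 /proc/loadavg     # just the 1-minute figure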
Dhans
(29 rep)
Nov 21, 2019, 03:17 AM
• Last activity: Mar 14, 2024, 03:50 AM
0 votes • 1 answer • 718 views
89% CPU is idle but the load average is extremely high in RHEL 8.4
I am using RHEL 8.4 and I seem to always have a very high load average, despite my CPU being 89% idle:
$ uname -a
Linux dx11866-hs 4.18.0-305.el8.ppc64le #1 SMP Thu Apr 29 08:53:15 EDT 2021 ppc64le ppc64le ppc64le GNU/Linux
$ top
top - 19:32:45 up 150 days, 3:45, 1 user, load average: 3936.78, 3934.85, 3935.12
Tasks: 819 total, 1 running, 818 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.6 us, 0.4 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 377629.6 total, 197139.6 free, 169755.4 used, 10734.7 buff/cache
MiB Swap: 16383.9 total, 12444.2 free, 3939.8 used. 199111.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1271217 yarn 20 0 8059136 5.7g 20608 S 318.8 1.6 6716:49 java
999164 yarn 20 0 10.3g 3.4g 117376 S 162.5 0.9 2:43.75 java
997941 yarn 20 0 12.0g 2.1g 71040 S 43.8 0.6 3:28.04 java
10 root 20 0 0 0 0 I 6.2 0.0 90:45.27 rcu_sched
1000002 yarn 20 0 12.0g 761088 65344 S 6.2 0.2 0:12.84 java
1001197 yarn 20 0 12.0g 752704 65344 S 6.2 0.2 0:11.60 java
1001966 root 20 0 17600 8384 4992 R 6.2 0.0 0:00.02 top
3291901 yarn 20 0 7763072 1.6g 14912 S 6.2 0.4 3027:36 java
4002263 root 20 0 7263168 4.4g 16832 S 6.2 1.2 5859:55 java
1 root 20 0 181888 19136 10624 S 0.0 0.0 13:50.34 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:19.21 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
9 root 20 0 0 0 0 S 0.0 0.0 3:40.28 ksoftirqd/0
11 root rt 0 0 0 0 S 0.0 0.0 0:11.21 migration/0
12 root rt 0 0 0 0 S 0.0 0.0 0:18.17 watchdog/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
15 root rt 0 0 0 0 S 0.0 0.0 0:19.25 watchdog/1
16 root rt 0 0 0 0 S 0.0 0.0 0:11.58 migration/1
17 root 20 0 0 0 0 S 0.0 0.0 3:26.51 ksoftirqd/1
19 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H-events_highpri
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
21 root rt 0 0 0 0 S 0.0 0.0 0:19.18 watchdog/2
22 root rt 0 0 0 0 S 0.0 0.0 0:04.86 migration/2
23 root 20 0 0 0 0 S 0.0 0.0 1:54.07 ksoftirqd/2
25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:0H-events_highpri
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
27 root rt 0 0 0 0 S 0.0 0.0 0:18.64 watchdog/3
28 root rt 0 0 0 0 S 0.0 0.0 0:04.53 migration/3
# grep -c proc /proc/cpuinfo
48
iostat
Linux 4.18.0-305.el8.ppc64le () 11/06/2023 _ppc64le_ (48 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.61 0.00 0.64 0.05 0.00 86.70
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
nvme0n1 3.59 2.05 171.95 27091032 2268469720
dm-0 0.03 0.14 0.36 1840516 4710876
dm-1 0.03 0.58 1.33 7592176 17510144
dm-2 3.28 0.08 116.26 1036872 1533830064
dm-3 0.53 0.00 40.67 16352 536491196
dm-4 0.00 0.07 0.03 927276 458764
dm-5 0.00 0.00 0.00 18380 5276
dm-6 0.00 0.00 0.00 14660 2084
dm-7 0.32 0.32 13.30 4249592 175458336
iostat -d 5 -x
Linux 4.18.0-305.el8.ppc64le () 11/07/2023 _ppc64le_ (48 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 3.77 7.89 513.41 5294.93 0.02 0.60 0.50 7.05 0.17 22.80 0.18 136.12 671.13 1.22 1.43
dm-0 1.61 0.19 125.55 3.16 0.00 0.00 0.00 0.00 0.11 0.29 0.00 77.75 16.84 1.03 0.19
dm-1 0.01 0.00 0.81 0.00 0.00 0.00 0.00 0.00 0.18 0.00 0.00 65.45 0.00 1.45 0.00
dm-2 0.72 2.95 91.00 295.43 0.00 0.00 0.00 0.00 0.21 0.29 0.00 126.23 100.27 1.67 0.61
dm-3 0.15 0.42 9.58 19.93 0.00 0.00 0.00 0.00 0.14 0.16 0.00 64.72 47.42 3.54 0.20
dm-4 0.40 0.04 47.73 1.11 0.00 0.00 0.00 0.00 0.14 0.16 0.00 119.67 26.24 1.43 0.06
dm-5 0.03 0.00 1.52 0.49 0.00 0.00 0.00 0.00 0.07 5.00 0.00 48.03 108.60 2.17 0.01
dm-6 0.07 0.00 126.99 0.47 0.00 0.00 0.00 0.00 0.63 0.00 0.00 1866.09 297.71 3.87 0.03
dm-7 0.52 1.13 97.19 4969.42 0.00 0.00 0.00 0.00 0.13 14.85 0.02 187.01 4403.33 2.99 0.49
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.20 0.40 1.60 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 34.00 6.67 0.40
dm-0 0.20 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 9.40 0.00 319.20 0.00 2.60 0.00 21.67 0.00 0.09 0.00 0.00 33.96 1.06 1.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 6.60 0.00 229.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.79 0.91 0.60
dm-3 0.00 4.00 0.00 75.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 1.00 0.40
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 1.40 0.00 14.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.29 2.86 0.40
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.20 0.00 84.80 0.00 0.20 0.00 8.33 0.00 0.09 0.00 0.00 38.55 2.73 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 2.20 0.00 72.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.73 1.82 0.40
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.00 0.00 40.00 0.00 0.40 0.00 28.57 0.00 0.00 0.00 0.00 40.00 6.00 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.40 0.00 58.40 0.00 0.00 0.00 0.00 0.00 0.14 0.00 0.00 41.71 4.29 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 1.00 0.00 32.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.80 4.00 0.40
dm-3 0.00 0.40 0.00 25.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 5.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.80 0.00 27.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 7.50 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 9.00 0.00 300.00 0.00 2.00 0.00 18.18 0.00 0.09 0.00 0.00 33.33 1.11 1.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 9.00 0.00 264.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 29.33 0.67 0.60
dm-3 0.00 1.20 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.33 3.33 0.40
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 5.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.20 0.00 41.60 0.00 0.20 0.00 14.29 0.00 0.00 0.00 0.00 34.67 3.33 0.40
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 15.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.00 0.00 62.40 0.00 0.80 0.00 28.57 0.00 0.10 0.00 0.00 31.20 3.00 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 1.40 0.00 32.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.86 2.86 0.40
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 1.40 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21.71 1.43 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.40 0.00 63.20 0.00 0.20 0.00 12.50 0.00 0.14 0.00 0.00 45.14 4.29 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.80 0.00 39.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00 2.50 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.60 0.00 11.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.67 3.33 0.20
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 8
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Model: 2.0 (pvr 0080 0200)
Model name: POWER10 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 48K
L2 cache: 1024K
L3 cache: 4096K
NUMA node0 CPU(s): 0-47
Physical sockets: 1
Physical chips: 4
Physical cores/chip: 6
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 0 size: 379675 MB
node 0 free: 278579 MB
node distances:
node 0
0: 10
# numastat
node0
numa_hit 26144191
numa_miss 0
numa_foreign 0
interleave_hit 5660366
local_node 26144191
other_node 0
How can I identify the bottleneck and how can I fix this?
cat /proc/interrupts output - https://pastebin.com/wjqrYVZm
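Given that the CPUs are ~89% idle, a load of ~3936 has to come from the run-queue count rather than CPU time; a sketch for attributing it (standard procps options) is to sample thread states, since the Linux load average counts R (runnable) and D (uninterruptible) tasks:

    # histogram of thread states; a large D count with idle CPUs points
    # at threads stuck in uninterruptible waits (I/O, locks, NFS, ...)
    ps -eLo state= | sort | uniq -c | sort -rn
    # name the D-state threads
    ps -eLo state=,pid=,comm= | awk '$1 == "D"'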
HItesh
(1 rep)
Nov 3, 2023, 03:40 PM
• Last activity: Dec 25, 2023, 02:12 PM
0 votes • 1 answer • 688 views
How should Load Average be calculated on a CPU with Efficiency Cores?
I recently received a MacBook Pro with an M1 pro CPU, which has 2 "efficiency" cores and 8 performance cores. When I run htop/btop/top I get a load average of >2 because the process scheduler always assigns certain lower-demand processes to the efficiency cores, which results in those cores always running at 60 to 100% capacity.
I feel like the 2 efficiency cores reduce the utility of the load average metric, which was already reduced due to multiple cores. Back in the dim, distant past we had single core CPUs that the load average made intuitive sense on. However now we have 2 types of CPU core in a single system, and my most recent phone has 3 different types of core: efficiency, performance, and a single ultra performance core.
How should such a new load average be calculated? Are there any ongoing efforts to redefine a general system-load metric?
Since efficiency cores are made to run low priority processes, perhaps excluding those from the default metric makes sense? Then divide the remaining load value among the non-efficiency CPUs.
For instance, with a load average of 3.4: subtract 2 for the efficiency cores, leaving 1.4; then divide by the number of performance cores: 1.4 / 8 = 0.175.
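As a sketch of that proposal on a Linux-style system (the 2/8 split and `/proc/loadavg` are assumptions; on macOS the raw values would come from `sysctl vm.loadavg` instead):

    # hypothetical adjusted load: drop the efficiency-core share, then
    # normalize by the performance-core count
    awk -v eff=2 -v perf=8 '{ l = $1 - eff; if (l < 0) l = 0; printf "%.3f\n", l / perf }' /proc/loadavg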
acjca2
(310 rep)
Nov 1, 2023, 03:09 PM
• Last activity: Nov 1, 2023, 08:07 PM
0 votes • 2 answers • 2377 views
Why is load average being reported as 0.00 though system is busy doing work?
`uptime`, `top`, and `cat /proc/loadavg` all report load averages for the last 1/5/15 minutes as `0.00`, but the system is definitely busy doing work. Why is this? The server is running Red Hat Enterprise Linux Server release 6.6, kernel 2.6.32-504.12.2.el6.x86_64.
$ uptime
12:13:44 up 73 days, 8 min, 1 user, load average: 0.00, 0.00, 0.00
$ cat /proc/loadavg
0.00 0.00 0.00 12/2706 39700
$ top
top - 12:15:35 up 73 days, 10 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 572 total, 4 running, 568 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.1%us, 2.3%sy, 0.0%ni, 42.0%id, 18.0%wa, 0.0%hi, 0.5%si, 0.0%st
...
user117452
May 31, 2015, 12:18 AM
• Last activity: Jun 26, 2023, 01:04 PM
19 votes • 4 answers • 30594 views
Why/how does "uptime" show CPU load >1?
I have a **1 core CPU** installed on my PC. Sometimes, `uptime` shows load >1. How is this possible and what does this mean?
EDIT: The values go up to `2.4`
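A rough demonstration of how this happens (a sketch; run it on the single-core box and stop it afterwards): two CPU-bound tasks competing for one core drive the 1-minute load toward ~2, because the load counts runnable tasks, not just the one currently executing:

    yes > /dev/null & yes > /dev/null &
    sleep 60; uptime    # expect the 1-minute figure to approach 2.0
    kill %1 %2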
Frantisek
(415 rep)
Mar 4, 2014, 08:20 PM
• Last activity: Mar 4, 2023, 05:32 PM
0 votes • 1 answer • 87 views
Where can I find the loadavg.c file in Ubuntu?
Ubuntu version: Linux version 5.15.0-52-generic (buildd@lcy02-amd64-032) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38)
Other places say it can be found at fs/proc/loadavg.c, but I don't have it. Where can I find the loadavg.c file?
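The file is part of the kernel *source*, which is not installed by default; a sketch of two ways to get at it (the elixir URL is one public mirror):

    # fetch the Ubuntu kernel source package (needs deb-src entries enabled)
    apt-get source linux-image-$(uname -r)
    # or read it online, e.g.:
    #   https://elixir.bootlin.com/linux/v5.15/source/fs/proc/loadavg.c
    #   https://elixir.bootlin.com/linux/v5.15/source/kernel/sched/loadavg.c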
john_smith
(3 rep)
Dec 26, 2022, 07:30 AM
• Last activity: Dec 26, 2022, 09:05 AM
4 votes • 1 answer • 859 views
Does a high load average inside a cgroup give a "wrong" overall load average?
Assume you have a system with 2 processors. Now create a cgroup and configure this group to use only 1 processor. Populate it with enough processes to give it a load average of 5 (to prove a point); it is now hopelessly slow.
I am assuming that the load average in `/proc/loadavg` will then also be 5, even though a different user is free to use the other CPU with no wait time.
Is this correct? Is there a source I could quote for this?
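A sketch of the experiment with cgroup v1 cpusets (the paths are assumptions and differ under cgroup v2):

    # confine 5 busy loops to CPU 0
    mkdir /sys/fs/cgroup/cpuset/pinned
    echo 0 > /sys/fs/cgroup/cpuset/pinned/cpuset.cpus
    echo 0 > /sys/fs/cgroup/cpuset/pinned/cpuset.mems
    for i in 1 2 3 4 5; do
        yes > /dev/null &
        echo $! > /sys/fs/cgroup/cpuset/pinned/tasks
    done
    sleep 60; cat /proc/loadavg   # expect ~5 even though CPU 1 is idle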
trevore
(149 rep)
Feb 2, 2016, 03:18 PM
• Last activity: Oct 17, 2022, 10:46 AM
4 votes • 2 answers • 2395 views
CPU and Load Average Conflict on EC2 server
I am having trouble understanding which server resource is causing lag in my Java game server. In the last patch of my game server, I updated my EC2 LAMP server from **apache2.2, php5.3, mysql5.5** to **apache2.4, php7.0, mysql5.6**. I also updated my game itself to include many more instances of monsters that are looped through every game loop – among other things.
Here is output from right when my game server starts up:
Here is output from a few minutes later:
And here is output from the next morning:
As you can see in the images the cpu usage of my Java process levels off around 80% in the last screenshot, yet load avg goes to 1.20. I have even seen it go as high as 2.7 this morning. The cpu credits affect how much actual cpu juice my server has so it makes sense that the percentage goes up as my credits balance diminishes, but why at 80% does my server lag?
On my Amazon EC2 metrics I see cpu at 10% (which confuses me even more):
Right when I start up my server my mmorpg does not lag at all. Then as soon as my cpu credits are depleted it starts to lag. This makes me feel like it is cpu based, but when I see 10% and 80% I don't see why. Any help would be greatly appreciated. I am on a T2.micro instance, so it has 1 vCPU. If I go up to the next instance it nearly doubles in price, and stays at same vCPU of 1, but with more credits.
Long story short, I want to fully understand what is going on, as the 80% number is throwing me. I don't just want to throw money at the problem.
[screenshots: top output at server startup, a few minutes later, and the next morning, plus the EC2 CPU-utilization graph]
KisnardOnline
(143 rep)
Feb 25, 2017, 03:46 PM
• Last activity: Aug 9, 2022, 06:10 AM
1 vote • 1 answer • 1001 views
Understanding load average on a multicore system with a multithreaded app
We have a curious situation regarding load average on our system. It's running an application called ZAG that is idle most of the day. But every 80 minutes or so, it has some sort of activity burst of 5-15 minutes' length. At the time of the burst, load average can climb to 60, 70, 80, 100 or more. What's interesting is this: during these high bursts, we'll see the CPU utilization percentages in htop show only 10-20% utilization per CPU. Furthermore, a script that I have written shows light CPU usage; during idle time this:
ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | awk '{sum += $10} END{print sum;}'
returns perhaps 535.0... that is, adding up all the CPU percentages from my ZAG application returns 535.0% of a CPU, or 5.35/32, i.e. 16.7% utilization of all the CPUs on my system. So in short, not a single CPU is being driven anywhere close to 100%, which we expect during mostly idle times.
During the incident, the result comes out to about 538.0 percent... just a little bump higher. I also see more threads on the run queue, as shown by
while true; do ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | grep ' Rl' | wc -l; sleep 0.5; done
So CPU utilization goes up, a little, and there are more threads running. But they are not using much more CPU, it seems, even as the load average shoots up. There is consistently very little going on regarding disk I/O or network I/O; sar data shows no discernible increase during this burst. There is no increase in memory utilization, and the number of processes may increase by a couple out of some 1700 total processes on the system, but that's all. There is nothing in cron that takes place at these times. htop or top output shows that there is certainly some CPU utilization taking place at this time, mostly user CPU (less than 5% system CPU reported by top). So it doesn't look like anything is waiting on data.
I don't notice anything extraordinary in /proc/interrupts. Rescheduling interrupts seem to be heavy there, but I spot-checked half a dozen cores, both even and odd NUMA nodes, and they appear to be steady at about 1400 per second per processor.
This is a 16-core machine with hyperthreads turned on (E5-2667 v4 processor). It has 36 ZAG processes and 729 ZAG threads, as shown by ps -ef and ps -eTf, respectively.
So this makes me wonder: how come my CPU utilization percentages are so low, while my load average is so high? Is it because, out of my 36 ZAG processes, I have over 700 threads, and perhaps a thread that's in `sched_yield()` is still in the run queue but not accumulating CPU? Or is a thread in `sched_yield()` no longer runnable, but in an uninterruptible state (see below)?
According to Brendan Gregg at https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html , "When load averages first appeared in Linux, they reflected CPU demand, as with other operating systems. But later on Linux changed them to include not only runnable tasks, but also tasks in the uninterruptible state (TASK_UNINTERRUPTIBLE or nr_uninterruptible). This state is used by code paths that want to avoid interruptions by signals, which includes tasks blocked on disk I/O and some locks. ... But don't Linux load averages sometimes go too high, more than can be explained by disk I/O? Yes, although my guess is that this is due to a new code path using TASK_UNINTERRUPTIBLE that didn't exist in 1993. In Linux 0.99.14, there were 13 codepaths that directly set TASK_UNINTERRUPTIBLE or TASK_SWAPPING (the swapping state was later removed from Linux). Nowadays, in Linux 4.12, there are nearly 400 codepaths that set TASK_UNINTERRUPTIBLE, including some lock primitives. It's possible that one of these codepaths should not be included in the load averages...."
Mike S
(2732 rep)
Jan 13, 2022, 09:58 PM
• Last activity: Jan 14, 2022, 11:09 PM
0 votes • 1 answer • 1340 views
CPU load average + how to deal with a process in D state
We can see on our RHEL 7.6 server (kernel version 3.10.0-957.el7.x86_64) that the following processes are in `D` state (they run as the `hdfs` user).
Note: *the D state code means that a process is in uninterruptible sleep.*
ps -eo s,user,cmd | grep ^[RD]
D hdfs du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root ps -eo s,user,cmd
Notes: the disks `sdj` and `sdm` are 3 TB in size; the `du -sk` also happens on other disks such as `sdd`, `sdf`, etc., and the disks use the ext4 filesystem.
We suspect that our high CPU load average is caused by the `du -sk` runs that actually hit the disks, so I was thinking about what we can do about this behavior.
One option might be to disable the `du -sk` verification in HDFS, but we have no clue how to do that.
A second option is to figure out what actually causes the `D` state. I'm not sure, but maybe upgrading the kernel version would help avoid the D state? Or something else (like disabling the CPU threads), etc.?
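Not an HDFS setting, but one stopgap (a sketch; it reclasses the running scans rather than disabling them): push the `du` processes into the idle I/O scheduling class so they only touch the disk when nothing else wants it (effective with the CFQ scheduler):

    for pid in $(pgrep -f 'du -sk /grid'); do
        ionice -c 3 -p "$pid"
    done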
more details
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
And the CPU load average is around ~42-45 (for the 15-minute average).
References:
https://community.cloudera.com/t5/Support-Questions/Does-hadoop-run-dfs-du-automatically-when-a-new-job-starts/td-p/231297
https://community.cloudera.com/t5/Support-Questions/Can-hdfs-dfsadmin-and-hdfs-dsfs-du-be-taxing-on-my-cluster/m-p/182402
https://community.pivotal.io/s/article/Dealing-with-Processes-in-State-D---Uninterruptible-Sleep-Usually-IO?language=en_US
https://www.golinuxhub.com/2018/05/how-to-disable-or-enable-hyper/
yael
(13936 rep)
Nov 28, 2021, 02:16 PM
• Last activity: Nov 28, 2021, 02:43 PM
1 vote • 0 answers • 1098 views
Why does a Docker container cause a huge CPU load average?
I want to discuss strange behavior on our RHEL 7.6 server.
We installed the Kafka exporter as a container on the server – the kafka-01 machine (the machine has 12 CPUs in total).
The following `yml` file describes the Kafka exporter container configuration:
more docker.kafka-exporter.yml
---
version: '2.4'
services:
  kafka-exporter:
    mem_limit: "612m"
    image: kafka-exporter:v1.2.0
    restart: always
    network_mode: host
    container_name: kafka-exporter
    command: ["--kafka.server=kafka01.sys65.com:6667"]
    ports:
      - 9308:9308
    logging:
      driver: "json-file"
      options:
        max-size: "15m"
        max-file: "1"
So when we start the container with `docker-compose`, as `docker-compose -f docker.kafka-exporter.yml up -d`, we notice that the CPU load average jumps from ~2-3 to 30-40 after 1-2 hours, and only a restart of the machine returns the CPU load average to normal (around 1-2). The CPU jumps again each time we start the docker compose (even stopping the docker compose does not decrease the CPU load average).
Can someone give a hint as to what could be the reason for this strange behavior? Regarding our case, is it useful to consider installing https://github.com/draios/sysdig for investigation?
Notes:
We verified the CPU load average with the `uptime` Linux command.
*Sometimes the machine **freezes** or **hangs** so we can't access it, and only a reboot returns the machine to normal.*
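Before reaching for sysdig, a sketch of two cheap first checks to run while the load is high (standard Docker and procps commands):

    docker stats --no-stream kafka-exporter       # per-container CPU share
    ps -eo state=,pid=,comm= | awk '$1 ~ /^[RD]/'  # the tasks the load counts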
yael
(13936 rep)
Nov 2, 2021, 06:50 AM
3 votes • 5 answers • 12929 views
Get per-core CPU load in shell script
I need to report the CPU load per core as a percentage from a shell script, but **I cannot run e.g. mpstat for one second**. Basically I think the info `top` shows after pressing `1` is what I want, but I cannot configure top to show this in batch mode (at least I don't know how). I could create a `~/.toprc` file with the configuration, but then I have to hope that the users do not mess with it.
I looked at `mpstat` and parsing its output, but it supports only seconds as the interval time. My script gets called via SNMP, and waiting 1 s for the response would generate a timeout, so this is not an option.
Are there other ways to get the per-core CPU load? I read about parsing `/proc/stat`, but I think of that as more of a last resort.
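For completeness, a sketch of the `/proc/stat` approach with a sub-second interval (bash; per-CPU fields per `man 5 proc`: user nice system idle iowait irq softirq steal):

    t1=$(grep '^cpu[0-9]' /proc/stat); sleep 0.2; t2=$(grep '^cpu[0-9]' /proc/stat)
    awk 'NR==FNR { for (i=2;i<=9;i++) tot1[$1]+=$i; idle1[$1]=$5+$6; next }
         { tot=0; for (i=2;i<=9;i++) tot+=$i
           dt=tot-tot1[$1]; di=($5+$6)-idle1[$1]
           printf "%s %.1f%%\n", $1, 100*(dt-di)/dt }' \
        <(echo "$t1") <(echo "$t2")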
Jens
(151 rep)
Sep 29, 2016, 11:56 AM
• Last activity: Sep 14, 2021, 10:49 AM
0 votes • 1 answer • 66 views
What is the best way to understand the efficiency of my program, other than load average?
Consider the following scenarios:
1. I have only one process running on the machine and from resources like top gives 100% CPU usage which is good. I'm efficiently using the CPU.
2. I have two processes each of them taking 50% CPU. I'm still using the CPU efficiently as the total is hitting 100%.
3. I have N (a relatively large number of) processes running on the machine. My process may not hit 100% CPU, which still makes sense, as the processor is busy with the others.
4. Now let's say there is only one process on the machine and the CPU usage still doesn't hit 100%. Assume the cause is a bad program (too much IO, or the program is simply doing nothing).
How do I detect case 4? The load average is not a good metric because it averages over different time windows.
Is there any metric or method that I can use to quantify how efficiently my program is using the CPU both under no-load conditions and fully loaded conditions?
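One concrete option (a sketch; sysstat's `pidstat`, with 1234 standing in for your PID): sample the process directly instead of the system-wide averages. A %CPU far below 100 on an otherwise idle machine is your case 4; recent sysstat versions also report a %wait column that separates "blocked" from "doing nothing":

    pidstat -u -p 1234 1 10    # per-second %usr/%system/%CPU, 10 samples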
vijaychsk
(1 rep)
Aug 12, 2021, 09:51 PM
• Last activity: Aug 12, 2021, 10:17 PM
1 vote • 2 answers • 200 views
High program load; after killing the process, Linux doesn't go back to the normal 0.5 load. Why?
I ran a program that reached a CPU load of 39.99, obviously more than my 4-core CPU can handle. But why, when I killed the program (which is definitely killed), did the CPU load not drop back to the 0.50 it showed before I ran the program?
Also, I noticed that the CPU load doesn't go down to 0.5 instantly after a program is killed; you need to wait for it to go down slowly. Why is that?
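The slow fall is expected: the kernel keeps exponentially damped moving averages, recomputed every 5 seconds, roughly as follows (a sketch of the classic formula; $n$ is the current number of runnable plus uninterruptible tasks):

$$\mathrm{load}_{1\mathrm{min}} \leftarrow \mathrm{load}_{1\mathrm{min}} \cdot e^{-5/60} + n \cdot (1 - e^{-5/60})$$

with $e^{-5/300}$ and $e^{-5/900}$ in place of $e^{-5/60}$ for the 5- and 15-minute figures. So after everything is killed ($n = 0$), the 1-minute value only decays by a factor of $e^{-1} \approx 0.37$ per minute; it doesn't drop instantly.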
Okit Tfseven
(13 rep)
Feb 9, 2021, 01:31 AM
• Last activity: Feb 14, 2021, 04:08 PM
0 votes • 3 answers • 2033 views
Process with 1% CPU usage causing load average of 1.5
We recently observed a high load average of about 1.5 on our embedded system, even though pretty much all processes are supposedly sleeping (according to `htop`).
The system in question is a dual-core Cortex-A9 running a realtime Linux kernel (4.14.126) built using buildroot.
We are using initramfs for our root filesystem and there is no swap, so there is definitely **no disk I/O** during normal operation.
After a bit of digging, we found out that the load is caused by a program called [swupdate](https://sbabic.github.io/swupdate/swupdate.html), which provides us with a convenient web interface for software updates (and we would very much like to continue using that).
When I use `time` to estimate the average CPU usage of that application (by calculating *(user+sys)/real*), I get a value of only about 1%, which doesn't make much sense considering the load average of 1.5.
I know that the load average also includes processes in the TASK_UNINTERRUPTIBLE state, which don't contribute to CPU usage. What I don't understand is why any of the threads/processes of that application would ever be in that state.
To further analyze the situation I captured a kernel trace using [lttng](https://lttng.org/), which shows that the only thing swupdate does is this (every 50ms), and this (every 100ms):
[two lttng trace screenshots, not recoverable from the text]
As you can see, there's a little bit of what appears to be socket-based IPC, and a select waiting for *something*. In the IPC case, one thread appears to be mostly blocking in nanosleep(), while the other is blocking in accept(), neither of which should consume any system resources as far as I'm aware.
FYI: the time base for both screenshots is the same, and the IPC takes approx. 500-600µs in total (which, considering the interval of 50ms, fits quite nicely with the observed 1% CPU usage).
So, what is causing the load here?
Felix G
(111 rep)
Aug 11, 2020, 10:59 AM
• Last activity: Feb 12, 2021, 08:15 PM
4 votes • 1 answer • 2638 views
Understanding load average on a multicore system
For a single microprocessor unit, the load average output by `top` can be understood to mean that if it's above 1.0, there are jobs waiting. But if we have n cores on a multicore system with `l*n` logical cores (on my Intel CPU n=6 and `l*n` = 12, so the output of `nproc` is 12), should we divide the load average by the output of `nproc` to see whether that number is above 1, to understand if there are (on average) jobs waiting? Or is it better to use `htop` to understand whether a parallel multicore system is under too much average load?
I think that my method was wrong but the conclusion was right: when I saw that the average load was above 10 in top, I checked with `ps` which process was expensive and found an overflow in a running program. But if that machine had actually had `nproc` output > 10, it would not really have been cause for investigation had I known that. Do you agree?
Niklas Rosencrantz
(4324 rep)
Dec 15, 2020, 12:26 PM
• Last activity: Dec 15, 2020, 04:27 PM
Showing page 1 of 20 total questions