Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
4 votes • 3 answers • 3078 views
Check what process is spiking the average load with atop
In trying to find the culprit of a high load average on a system during the night (which does not seem to be related to logrotate), I installed atop to write a raw file at a specific interval. While reading the file, it seems the process list stands still; can I somehow go back and forth between the samples to see what sticks out, and further sort by any column (like CPU usage)?
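For reference, a sketch of the raw-file workflow (standard atop flags and interactive keys; double-check `man atop` for your version):

    # record one sample every 60 seconds into a raw file
    atop -w /tmp/atop.raw 60
    # replay the file later
    atop -r /tmp/atop.raw
    # while replaying: 't' steps forward one sample, 'T' steps backward,
    # 'b' jumps to a given time; 'C' sorts processes by CPU, 'M' by memory,
    # 'D' by disk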
user135361
(193 rep)
Sep 3, 2013, 01:31 PM
• Last activity: Jun 4, 2025, 01:06 AM
3 votes • 1 answer • 1997 views
Unusually high load average (due to peak I/O wait? IRQs?)
I have a problem with a high load average (`~2`) on my (personal laptop) computer, and have had it for a long time now. I am running Arch Linux. If I remember correctly, the problem started with a certain kernel update; initially I thought it was related to this bug. The problem was not solved, though, when the bug was fixed. I did not really care, as I thought it was still a bug, because performance did not seem to suffer. What made me curious is that, recently, I had a moment of super low load average (`~0`) while idling. After a reboot, everything went back to "normal", with a high load average. So I started investigating:
% uptime
14:31:04 up 2:22, 1 user, load average: 1.96, 1.98, 1.99
So far nothing new. Then I tried top:
% top -b -n 1
top - 14:33:52 up 2:25, 1 user, load average: 2.02, 2.07, 2.02
Tasks: 146 total, 2 running, 144 sleeping, 0 stopped, 0 zombie
%Cpu0 : 2.6/0.9 3[|||| ]
%Cpu1 : 2.7/0.9 4[|||| ]
%Cpu2 : 2.7/1.0 4[|||| ]
%Cpu3 : 2.7/0.8 3[|||| ]
GiB Mem :228125107552256.0/7.712 [
GiB Swap: 0.0/7.904 [ ]
PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND
2 root 20 0 0.0m 0.0m 0.0 0.0 0:00.00 S kthreadd
404 root 20 0 0.0m 0.0m 0.0 0.0 0:01.09 D `- rtsx_usb_ms_2
1854 root 20 0 0.0m 0.0m 0.0 0.0 0:06.03 D `- kworker/0:2
I cut out all the processes and kernel threads except those two. Here we can already see some suspicious kernel threads (state D), and a suspicious Mem value (see edit)...
Looking at CPU:
% mpstat
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
14:36:09 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
14:36:09 all 2.66 0.00 0.88 1.56 0.00 0.01 0.00 0.00 0.00 94.90
% sar -u 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
14:37:04 CPU %user %nice %system %iowait %steal %idle
14:37:05 all 1.00 0.00 0.75 0.00 0.00 98.25
14:37:06 all 1.76 0.00 0.50 0.00 0.00 97.74
14:37:07 all 1.00 0.00 0.25 0.00 0.00 98.75
14:37:08 all 0.50 0.00 0.50 0.00 0.00 99.00
14:37:09 all 0.50 0.00 0.50 0.25 0.00 98.75
14:37:10 all 0.50 0.00 0.50 6.03 0.00 92.96
14:37:11 all 0.75 0.00 0.50 11.75 0.00 87.00
14:37:12 all 0.50 0.00 0.25 0.00 0.00 99.25
[ . . . ]
14:37:21 all 1.26 0.00 0.76 0.00 0.00 97.98
14:37:22 all 0.75 0.00 0.25 2.26 0.00 96.73
14:37:23 all 0.50 0.00 0.50 16.83 0.00 82.16
14:37:24 all 0.75 0.00 0.50 0.00 0.00 98.74
14:37:25 all 0.50 0.00 0.50 0.00 0.00 98.99
14:37:26 all 0.76 0.00 0.50 7.56 0.00 91.18
14:37:27 all 0.25 0.00 0.51 0.00 0.00 99.24
14:37:28 all 1.00 0.00 0.75 0.25 0.00 98.00
14:37:29 all 0.25 0.00 0.76 0.00 0.00 98.99
14:37:30 all 0.75 0.00 0.50 0.00 0.00 98.74
14:37:31 all 0.75 0.00 0.50 3.27 0.00 95.48
14:37:32 all 0.51 0.00 0.51 13.16 0.00 85.82
14:37:33 all 0.75 0.00 0.50 0.25 0.00 98.49
14:37:34 all 1.26 0.00 0.75 0.00 0.00 97.99
Average: all 0.71 0.00 0.56 2.06 0.00 96.67
reveals some peaks in I/O wait. The best guess so far. Looking closer:
% iostat -x 1 30
Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.60 0.00 0.87 1.55 0.00 94.98
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.93 3.00 3.71 1.94 95.04 102.27 69.91 0.60 106.78 16.56 279.32 14.47 8.17
avg-cpu: %user %nice %system %iowait %steal %idle
0.75 0.00 0.75 0.25 0.00 98.25
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.01 13.00 0.00 13.00 10.00 1.00
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 17.04 0.00 81.95
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 8.00 0.00 2.00 0.00 40.00 40.00 0.69 346.50 0.00 346.50 346.50 69.30
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 0.50 7.29 0.00 91.96
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 0.75 16.96 0.00 81.30
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 5.00 0.00 2.00 0.00 28.00 28.00 0.71 357.00 0.00 357.00 356.50 71.30
[ . . . ]
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 0.00 0.00 99.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Looking at processes in uninterruptible sleep:
% for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 1854 [kworker/0:2]
D 2877 [kworker/0:0]
----
D 404 [rtsx_usb_ms_2]
D 3177 [kworker/u32:4]
----
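A follow-up that would fit here (a sketch; PID 404 is taken from the output above, and reading the file needs root plus a kernel built with CONFIG_STACKTRACE): dump the kernel stack of a D-state thread to see exactly where it is blocked:

    % sudo cat /proc/404/stack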
And the last thing I did:
% vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 5010040 123612 1220080 0 0 23 25 111 433 3 1 95 2 0
0 0 0 5006256 123612 1224164 0 0 0 96 186 839 1 1 97 1 0
1 0 0 5006132 123612 1224164 0 0 0 0 175 714 1 0 99 0 0
0 0 0 5003156 123612 1224156 0 0 0 0 234 1009 2 1 98 0 0
0 0 0 5003156 123612 1224156 0 0 0 0 161 680 0 0 99 0 0
0 1 0 5003156 123616 1224156 0 0 0 60 214 786 1 1 94 5 0
0 0 0 5003280 123620 1224156 0 0 0 4 226 776 1 0 88 11 0
1 0 0 5003156 123620 1224156 0 0 0 0 210 733 1 0 99 0 0
0 0 0 5005388 123620 1224156 0 0 0 0 159 747 1 0 99 0 0
0 0 0 5005388 123620 1224156 0 0 0 0 233 803 1 0 99 0 0
0 0 0 5005512 123620 1224156 0 0 0 0 152 670 1 0 99 0 0
0 0 0 5009664 123620 1220060 0 0 0 0 240 914 1 1 99 0 0
0 0 0 5009540 123620 1220060 0 0 0 0 237 833 1 1 99 0 0
0 0 0 5009664 123620 1220060 0 0 0 0 166 999 1 1 99 0 0
0 1 0 5009664 123620 1220060 0 0 0 4 168 700 1 0 88 11 0
0 0 0 5009540 123628 1220060 0 0 0 12 207 778 1 1 91 8 0
0 0 0 5009788 123628 1220064 0 0 0 0 189 717 0 1 99 0 0
0 0 0 5009664 123628 1220064 0 0 0 0 243 1453 1 1 98 0 0
0 0 0 5009044 123628 1220576 0 0 0 0 166 708 1 0 99 0 0
0 0 0 5009168 123628 1220576 0 0 0 0 146 663 1 0 99 0 0
0 0 0 5009540 123628 1220064 0 0 0 0 175 705 1 1 99 0 0
0 1 0 5009292 123632 1220128 0 0 0 8 223 908 1 0 99 0 0
^C
Now I still don't know what the problem is, but it looks like it comes from some peak I/O operations. There are some suspicious kernel threads. Any further ideas? What else could I do to investigate?
**edit:** The Mem value seems strange, but it only appeared very recently, a week ago or so; before that everything seemed to be normal. And
% free
total used free shared buff/cache available
Mem: 8086240 1913860 4824764 133880 1347616 6231856
Swap: 8288252 0 8288252
seems to be fine though.
**edit2:** First results of testing sar monitoring on my system (very frequent, 1-second intervals, but for a short duration, to catch the peaks):
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 CPU %user %nice %system %iowait %steal %idle
12:36:26 all 0.50 0.00 0.50 0.00 0.00 99.00
12:36:27 all 0.50 0.00 0.50 0.25 0.00 98.74
12:36:28 all 0.50 0.00 0.75 0.00 0.00 98.75
12:36:29 all 0.50 0.00 0.25 7.52 0.00 91.73
12:36:30 all 0.25 0.00 0.75 9.77 0.00 89.22
12:36:31 all 0.25 0.00 0.75 0.00 0.00 98.99
12:36:32 all 1.00 0.00 0.50 0.25 0.00 98.25
12:36:33 all 1.00 0.00 1.00 0.00 0.00 98.00
12:36:34 all 0.25 0.00 0.25 0.25 0.00 99.24
12:36:35 all 0.50 0.25 0.75 33.25 0.00 65.25
12:36:36 all 0.50 0.00 0.75 0.25 0.00 98.50
12:36:37 all 0.75 0.00 0.25 0.00 0.00 99.00
12:36:38 all 0.25 0.00 0.50 0.00 0.00 99.24
12:36:39 all 0.50 0.00 0.50 0.00 0.00 99.00
12:36:40 all 0.50 0.25 0.50 10.75 0.00 88.00
Average: all 0.52 0.03 0.57 4.16 0.00 94.72
Network (`-n`) seems to be alright. Looking at devices (`-d`) reveals:
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
12:36:26 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:26 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:29 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-0 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40
12:36:30 dev8-1 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40
12:36:30 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:30 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:31 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:31 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:34 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-0 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60
12:36:35 dev8-1 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60
12:36:35 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:35 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:36 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:36:36 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[ . . . ]
12:36:40 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-0 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15
Average: dev8-1 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15
Average: dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
and `-b` gives:
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 tps rtps wtps bread/s bwrtn/s
12:36:26 0.00 0.00 0.00 0.00 0.00
12:36:27 0.00 0.00 0.00 0.00 0.00
12:36:28 0.00 0.00 0.00 0.00 0.00
12:36:29 0.00 0.00 0.00 0.00 0.00
12:36:30 2.00 0.00 2.00 0.00 88.00
12:36:31 0.00 0.00 0.00 0.00 0.00
12:36:32 0.00 0.00 0.00 0.00 0.00
12:36:33 0.00 0.00 0.00 0.00 0.00
12:36:34 0.00 0.00 0.00 0.00 0.00
12:36:35 2.00 0.00 2.00 0.00 24.00
12:36:36 0.00 0.00 0.00 0.00 0.00
12:36:37 0.00 0.00 0.00 0.00 0.00
12:36:38 0.00 0.00 0.00 0.00 0.00
12:36:39 0.00 0.00 0.00 0.00 0.00
12:36:40 0.00 0.00 0.00 0.00 0.00
Average: 0.27 0.00 0.27 0.00 7.47
So I assume the issue is related to my hard drive (?). Because the I/O is on partition 1 (my root partition), it should be somewhere outside of `/var`, which has its own partition. The other partitions are data partitions and not system related.
**edit3:** Even more data on that specific peak: paging looks fine (from my perspective, with limited knowledge)
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
12:36:26 0.00 0.00 0.00 0.00 2233.00 0.00 0.00 0.00 0.00
12:36:27 0.00 0.00 0.00 0.00 88.00 0.00 0.00 0.00 0.00
12:36:28 0.00 0.00 766.00 0.00 185.00 0.00 0.00 0.00 0.00
12:36:29 0.00 40.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:30 0.00 4.00 0.00 0.00 45.00 0.00 0.00 0.00 0.00
12:36:31 0.00 0.00 1.00 0.00 46.00 0.00 0.00 0.00 0.00
12:36:32 0.00 0.00 5.00 0.00 560.00 0.00 0.00 0.00 0.00
12:36:33 0.00 0.00 2.00 0.00 85.00 0.00 0.00 0.00 0.00
12:36:34 0.00 0.00 2.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:35 0.00 12.00 0.00 0.00 44.00 0.00 0.00 0.00 0.00
12:36:36 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:37 0.00 0.00 2.00 0.00 45.00 0.00 0.00 0.00 0.00
12:36:38 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
12:36:39 0.00 0.00 0.00 0.00 77.00 0.00 0.00 0.00 0.00
12:36:40 0.00 8.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00
Average: 0.00 4.27 51.87 0.00 242.87 0.00 0.00 0.00 0.00
It looks like files were created during that peak (`-v`):
Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU)
12:36:25 dentunusd file-nr inode-nr pty-nr
12:36:26 186520 4480 195468 2
[ . . . ]
12:36:34 186520 4480 195468 2
12:36:35 186520 4512 195468 2
[ . . . ]
12:36:40 186520 4512 195468 2
Average: 186520 4493 195468 2
**edit4:** It looks like some `irq`s are responsible. Running `iotop -o -a` (show only processes with I/O, and accumulate them, i.e. keep all processes that have had I/O since the start of the program) resulted in:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
7 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/0]
17 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/1]
23 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/2]
29 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/3]
292 rt/4 root 0.00 B 0.00 B 0.00 % 99.99 % [i915/signal:0]
[ . . . ]
So, is this a thing? How could I continue...?
nox
(161 rep)
Nov 30, 2017, 02:48 PM
• Last activity: May 15, 2025, 08:06 PM
5 votes • 2 answers • 18030 views
Difference between the %CPU usage and load average, and when should it be a concern?
I've searched multiple answers here but couldn't find one that relates to this scenario; if you know of one, kindly point me to it.
I am including the numbers here for the ease of my own comprehension.
I have a 96-core bare-metal Linux server with 256 GB RAM that's dedicated to running an in-house-written distributed, event-based, asynchronous network service that acts as a caching server. This daemon runs with 32 worker threads. Apart from the main task of fetching and caching, this server also does a variety of related tasks in a couple of extra separate threads, like polling other members' health checks, writing metrics to a unix socket, etc. The worker-thread count isn't bumped further because increasing it would increase cache lock contention. There is not much disk activity from this server, as metrics are written in batches, and if the unix socket fails, it just ignores the failure and frees the memory.
This instance is part of a 9-node cluster, and the stats of this node are representative of the rest of the instances in the cluster.
With the recent surge in inbound traffic, I see the %CPU usage of the process has gone up considerably but the load average is still less than 1.
Please find the stats below.
:~$ nice top
top - 19:51:55 up 95 days, 7:27, 1 user, load average: 0.33, 0.28, 0.32
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
587486 cacher 20 0 107.4g 93.0g 76912 S 17.2 37.0 5038:13 cacher
The `%CPU` goes up to 80% at times, but even then the load average is considerably lower and doesn't go beyond 1.5; this happens mostly when there's a cache miss and the cacher has to fetch from an upstream, so it is mostly a set of network activities. As far as I understand, the most compute-heavy operation this service does at runtime is computing the hash of the item to be cached, when it has to store it into the appropriate distributed buckets. There are no systemd limits set on this service for any parameter, and it is also tuned to disable the kernel's OOM killer for this process, although it is nowhere near the upper limit. The systemd sockets to which this binds have been tuned to accommodate larger tx and rx buffers.
* Why is the load average less than 1 on a 96-core server when the `%CPU` of the service, which uses 32 threads, fluctuates between 20% and 80% consistently? (A quick check is sketched below.)
* On a 96-core server, how high can the `%CPU` safely go? Does it have a relation to how many threads are used? If the number of threads is bumped, is a higher %CPU usage theoretically acceptable?
Thank you.
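A quick check that bears on the first bullet (a sketch; 587486 is the PID from the `top` output above): histogram the cacher's threads by scheduler state. The load average counts only R (runnable) and D (uninterruptible) tasks, so 32 worker threads that mostly sleep on sockets (state S) contribute almost nothing to it, no matter how much %CPU they burn in short spurts:

    ps -o state= -T -p 587486 | sort | uniq -c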
init
(151 rep)
Jan 7, 2023, 08:23 PM
• Last activity: Sep 19, 2024, 08:54 AM
-1 votes • 1 answer • 128 views
How do I tell whether a load average value is high or normal?
How do I determine whether some x.x value from `uptime` is a high load on a Linux server, if I have 4 CPUs?
Is there a formula to calculate it?
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 7
$ uptime
14:01:38 up 15 min, 2 users, load average: 7.09, 3.44, 1.96
The current load average is high, but the server is still working fine.
$ uptime
14:39:50 up 53 min, 5 users, load average: 31.95, 29.13, 24.25
Output of vmstat
$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
16 0 0 15798924 2168 189096 0 0 329 25 103 117 62 1 37 0 0
16 0 0 15798924 2168 189096 0 0 0 0 405 433 100 0 0 0 0
16 0 0 15798924 2168 189096 0 0 0 0 408 441 100 0 0 0 0
I've seen a few links; however, I couldn't understand them.
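A rule-of-thumb sketch (not an absolute threshold): divide the 1-minute load by the CPU count; a value persistently above 1.0 means tasks are waiting, on average:

    # 1-minute load average per CPU
    awk -v n=$(nproc) '{ printf "%.2f\n", $1 / n }' /proc/loadavg

For the second uptime above, 31.95 / 4 ≈ 8, i.e. roughly eight runnable tasks per CPU, which is consistent with the vmstat run queue (r = 16) and the 100 in the us column.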
Priyanka
(1 rep)
Aug 15, 2024, 02:43 PM
• Last activity: Aug 15, 2024, 05:09 PM
-3 votes • 4 answers • 12521 views
What is the Unix command-line command to get the system load information?
On a Linux or Unix operating system, I am seeing the text `System load` as below.
Can anyone please tell me what it means, and how to extract the system load % using CLI commands?
System load: 6.84
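For what it's worth, a sketch of the usual sources (there is no native "load %"; the number is the average count of runnable tasks, so people usually normalize by CPU count themselves):

    uptime                          # human-readable 1/5/15-minute averages
    cat /proc/loadavg               # raw values, script-friendly
    cut -d' ' -f1 /proc/loadavg     # just the 1-minute figure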
Dhans
(29 rep)
Nov 21, 2019, 03:17 AM
• Last activity: Mar 14, 2024, 03:50 AM
0 votes • 1 answer • 718 views
89% CPU is idle but the load average is extremely high in RHEL 8.4
I am using RHEL 8.4 and I seem to always have a very high load average, despite my CPU being 89% idle:
$ uname -a
Linux dx11866-hs 4.18.0-305.el8.ppc64le #1 SMP Thu Apr 29 08:53:15 EDT 2021 ppc64le ppc64le ppc64le GNU/Linux
$ top
top - 19:32:45 up 150 days, 3:45, 1 user, load average: 3936.78, 3934.85, 3935.12
Tasks: 819 total, 1 running, 818 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.6 us, 0.4 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 377629.6 total, 197139.6 free, 169755.4 used, 10734.7 buff/cache
MiB Swap: 16383.9 total, 12444.2 free, 3939.8 used. 199111.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1271217 yarn 20 0 8059136 5.7g 20608 S 318.8 1.6 6716:49 java
999164 yarn 20 0 10.3g 3.4g 117376 S 162.5 0.9 2:43.75 java
997941 yarn 20 0 12.0g 2.1g 71040 S 43.8 0.6 3:28.04 java
10 root 20 0 0 0 0 I 6.2 0.0 90:45.27 rcu_sched
1000002 yarn 20 0 12.0g 761088 65344 S 6.2 0.2 0:12.84 java
1001197 yarn 20 0 12.0g 752704 65344 S 6.2 0.2 0:11.60 java
1001966 root 20 0 17600 8384 4992 R 6.2 0.0 0:00.02 top
3291901 yarn 20 0 7763072 1.6g 14912 S 6.2 0.4 3027:36 java
4002263 root 20 0 7263168 4.4g 16832 S 6.2 1.2 5859:55 java
1 root 20 0 181888 19136 10624 S 0.0 0.0 13:50.34 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:19.21 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
9 root 20 0 0 0 0 S 0.0 0.0 3:40.28 ksoftirqd/0
11 root rt 0 0 0 0 S 0.0 0.0 0:11.21 migration/0
12 root rt 0 0 0 0 S 0.0 0.0 0:18.17 watchdog/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
15 root rt 0 0 0 0 S 0.0 0.0 0:19.25 watchdog/1
16 root rt 0 0 0 0 S 0.0 0.0 0:11.58 migration/1
17 root 20 0 0 0 0 S 0.0 0.0 3:26.51 ksoftirqd/1
19 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H-events_highpri
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
21 root rt 0 0 0 0 S 0.0 0.0 0:19.18 watchdog/2
22 root rt 0 0 0 0 S 0.0 0.0 0:04.86 migration/2
23 root 20 0 0 0 0 S 0.0 0.0 1:54.07 ksoftirqd/2
25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:0H-events_highpri
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
27 root rt 0 0 0 0 S 0.0 0.0 0:18.64 watchdog/3
28 root rt 0 0 0 0 S 0.0 0.0 0:04.53 migration/3
# grep -c proc /proc/cpuinfo
48
iostat
Linux 4.18.0-305.el8.ppc64le () 11/06/2023 _ppc64le_ (48 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.61 0.00 0.64 0.05 0.00 86.70
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
nvme0n1 3.59 2.05 171.95 27091032 2268469720
dm-0 0.03 0.14 0.36 1840516 4710876
dm-1 0.03 0.58 1.33 7592176 17510144
dm-2 3.28 0.08 116.26 1036872 1533830064
dm-3 0.53 0.00 40.67 16352 536491196
dm-4 0.00 0.07 0.03 927276 458764
dm-5 0.00 0.00 0.00 18380 5276
dm-6 0.00 0.00 0.00 14660 2084
dm-7 0.32 0.32 13.30 4249592 175458336
iostat -d 5 -x
Linux 4.18.0-305.el8.ppc64le () 11/07/2023 _ppc64le_ (48 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 3.77 7.89 513.41 5294.93 0.02 0.60 0.50 7.05 0.17 22.80 0.18 136.12 671.13 1.22 1.43
dm-0 1.61 0.19 125.55 3.16 0.00 0.00 0.00 0.00 0.11 0.29 0.00 77.75 16.84 1.03 0.19
dm-1 0.01 0.00 0.81 0.00 0.00 0.00 0.00 0.00 0.18 0.00 0.00 65.45 0.00 1.45 0.00
dm-2 0.72 2.95 91.00 295.43 0.00 0.00 0.00 0.00 0.21 0.29 0.00 126.23 100.27 1.67 0.61
dm-3 0.15 0.42 9.58 19.93 0.00 0.00 0.00 0.00 0.14 0.16 0.00 64.72 47.42 3.54 0.20
dm-4 0.40 0.04 47.73 1.11 0.00 0.00 0.00 0.00 0.14 0.16 0.00 119.67 26.24 1.43 0.06
dm-5 0.03 0.00 1.52 0.49 0.00 0.00 0.00 0.00 0.07 5.00 0.00 48.03 108.60 2.17 0.01
dm-6 0.07 0.00 126.99 0.47 0.00 0.00 0.00 0.00 0.63 0.00 0.00 1866.09 297.71 3.87 0.03
dm-7 0.52 1.13 97.19 4969.42 0.00 0.00 0.00 0.00 0.13 14.85 0.02 187.01 4403.33 2.99 0.49
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.20 0.40 1.60 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 34.00 6.67 0.40
dm-0 0.20 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 9.40 0.00 319.20 0.00 2.60 0.00 21.67 0.00 0.09 0.00 0.00 33.96 1.06 1.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 6.60 0.00 229.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.79 0.91 0.60
dm-3 0.00 4.00 0.00 75.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 1.00 0.40
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 1.40 0.00 14.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.29 2.86 0.40
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.20 0.00 84.80 0.00 0.20 0.00 8.33 0.00 0.09 0.00 0.00 38.55 2.73 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 2.20 0.00 72.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.73 1.82 0.40
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.00 0.00 40.00 0.00 0.40 0.00 28.57 0.00 0.00 0.00 0.00 40.00 6.00 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.40 0.00 58.40 0.00 0.00 0.00 0.00 0.00 0.14 0.00 0.00 41.71 4.29 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 1.00 0.00 32.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.80 4.00 0.40
dm-3 0.00 0.40 0.00 25.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 5.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 0.80 0.00 27.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 7.50 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 9.00 0.00 300.00 0.00 2.00 0.00 18.18 0.00 0.09 0.00 0.00 33.33 1.11 1.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 9.00 0.00 264.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 29.33 0.67 0.60
dm-3 0.00 1.20 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.33 3.33 0.40
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 5.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.20 0.00 41.60 0.00 0.20 0.00 14.29 0.00 0.00 0.00 0.00 34.67 3.33 0.40
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.80 0.00 15.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.00 2.50 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 2.00 0.00 62.40 0.00 0.80 0.00 28.57 0.00 0.10 0.00 0.00 31.20 3.00 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 1.40 0.00 32.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.86 2.86 0.40
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 1.40 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21.71 1.43 0.20
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
nvme0n1 0.00 1.40 0.00 63.20 0.00 0.20 0.00 12.50 0.00 0.14 0.00 0.00 45.14 4.29 0.60
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.80 0.00 39.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00 2.50 0.20
dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-7 0.00 0.60 0.00 11.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.67 3.33 0.20
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 8
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Model: 2.0 (pvr 0080 0200)
Model name: POWER10 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32K
L1i cache: 48K
L2 cache: 1024K
L3 cache: 4096K
NUMA node0 CPU(s): 0-47
Physical sockets: 1
Physical chips: 4
Physical cores/chip: 6
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 0 size: 379675 MB
node 0 free: 278579 MB
node distances:
node 0
0: 10
# numastat
node0
numa_hit 26144191
numa_miss 0
numa_foreign 0
interleave_hit 5660366
local_node 26144191
other_node 0
How can I identify the bottleneck and how can I fix this?
cat /proc/interrupts output - https://pastebin.com/wjqrYVZm
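Given that the CPUs are ~89% idle, a load of ~3936 has to come from the run-queue count rather than CPU time; a sketch for attributing it (standard procps options) is to sample thread states, since the Linux load average counts R (runnable) and D (uninterruptible) tasks:

    # histogram of thread states; a large D count with idle CPUs points
    # at threads stuck in uninterruptible waits (I/O, locks, NFS, ...)
    ps -eLo state= | sort | uniq -c | sort -rn
    # name the D-state threads
    ps -eLo state=,pid=,comm= | awk '$1 == "D"'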
HItesh
(1 rep)
Nov 3, 2023, 03:40 PM
• Last activity: Dec 25, 2023, 02:12 PM
0 votes • 1 answer • 688 views
How should Load Average be calculated on a CPU with Efficiency Cores?
I recently received a MacBook Pro with an M1 pro CPU, which has 2 "efficiency" cores and 8 performance cores. When I run htop/btop/top I get a load average of >2 because the process scheduler always assigns certain lower-demand processes to the efficiency cores, which results in those cores always running at 60 to 100% capacity.
I feel like the 2 efficiency cores reduce the utility of the load average metric, which was already reduced due to multiple cores. Back in the dim, distant past we had single core CPUs that the load average made intuitive sense on. However now we have 2 types of CPU core in a single system, and my most recent phone has 3 different types of core: efficiency, performance, and a single ultra performance core.
How should such a new load average be calculated? Are there any ongoing efforts to redefine a general system-load metric?
Since efficiency cores are made to run low priority processes, perhaps excluding those from the default metric makes sense? Then divide the remaining load value among the non-efficiency CPUs.
For instance, with a load average of 3.4: subtract 2 for the efficiency cores, leaving 1.4; then divide by the number of performance cores: 1.4 / 8 = 0.175.
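As a sketch of that proposal on a Linux-style system (the 2/8 split and `/proc/loadavg` are assumptions; on macOS the raw values would come from `sysctl vm.loadavg` instead):

    # hypothetical adjusted load: drop the efficiency-core share, then
    # normalize by the performance-core count
    awk -v eff=2 -v perf=8 '{ l = $1 - eff; if (l < 0) l = 0; printf "%.3f\n", l / perf }' /proc/loadavg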
acjca2
(310 rep)
Nov 1, 2023, 03:09 PM
• Last activity: Nov 1, 2023, 08:07 PM
0 votes • 2 answers • 2377 views
Why is load average being reported as 0.00 though system is busy doing work?
`uptime`, `top`, and `cat /proc/loadavg` all report load averages for the last 1/5/15 minutes as `0.00`, but the system is definitely busy doing work. Why is this? The server is running Red Hat Enterprise Linux Server release 6.6, kernel 2.6.32-504.12.2.el6.x86_64.
$ uptime
12:13:44 up 73 days, 8 min, 1 user, load average: 0.00, 0.00, 0.00
$ cat /proc/loadavg
0.00 0.00 0.00 12/2706 39700
$ top
top - 12:15:35 up 73 days, 10 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 572 total, 4 running, 568 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.1%us, 2.3%sy, 0.0%ni, 42.0%id, 18.0%wa, 0.0%hi, 0.5%si, 0.0%st
...
user117452
May 31, 2015, 12:18 AM
• Last activity: Jun 26, 2023, 01:04 PM
19 votes • 4 answers • 30594 views
Why/how does "uptime" show CPU load >1?
I have a **1 core CPU** installed on my PC. Sometimes, `uptime` shows load >1. How is this possible and what does this mean?
EDIT: The values go up to `2.4`
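A rough demonstration of how this happens (a sketch; run it on the single-core box and stop it afterwards): two CPU-bound tasks competing for one core drive the 1-minute load toward ~2, because the load counts runnable tasks, not just the one currently executing:

    yes > /dev/null & yes > /dev/null &
    sleep 60; uptime    # expect the 1-minute figure to approach 2.0
    kill %1 %2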
Frantisek
(415 rep)
Mar 4, 2014, 08:20 PM
• Last activity: Mar 4, 2023, 05:32 PM
0 votes • 1 answer • 87 views
Where can I find the loadavg.c file in Ubuntu?
Ubuntu version: Linux version 5.15.0-52-generic (buildd@lcy02-amd64-032) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38)
Other places say it can be found at fs/proc/loadavg.c, but I don't have it. Where can I find the loadavg.c file?
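The file is part of the kernel *source*, which is not installed by default; a sketch of two ways to get at it (the elixir URL is one public mirror):

    # fetch the Ubuntu kernel source package (needs deb-src entries enabled)
    apt-get source linux-image-$(uname -r)
    # or read it online, e.g.:
    #   https://elixir.bootlin.com/linux/v5.15/source/fs/proc/loadavg.c
    #   https://elixir.bootlin.com/linux/v5.15/source/kernel/sched/loadavg.c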
john_smith
(3 rep)
Dec 26, 2022, 07:30 AM
• Last activity: Dec 26, 2022, 09:05 AM
4 votes • 1 answer • 859 views
Does a high load average inside a cgroup give a "wrong" overall load average?
Assume you have a system with 2 processors. Now create a cgroup and configure this group to use only 1 processor. Populate it with enough processes to give it a load average of 5 (to prove a point); it is now hopelessly slow.
I am assuming that the load average in `/proc/loadavg` will then also be 5, even though a different user is free to use the other CPU with no wait time.
Is this correct? Is there a source I could quote for this?
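A sketch of the experiment with cgroup v1 cpusets (the paths are assumptions and differ under cgroup v2):

    # confine 5 busy loops to CPU 0
    mkdir /sys/fs/cgroup/cpuset/pinned
    echo 0 > /sys/fs/cgroup/cpuset/pinned/cpuset.cpus
    echo 0 > /sys/fs/cgroup/cpuset/pinned/cpuset.mems
    for i in 1 2 3 4 5; do
        yes > /dev/null &
        echo $! > /sys/fs/cgroup/cpuset/pinned/tasks
    done
    sleep 60; cat /proc/loadavg   # expect ~5 even though CPU 1 is idle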
trevore
(149 rep)
Feb 2, 2016, 03:18 PM
• Last activity: Oct 17, 2022, 10:46 AM
4 votes • 2 answers • 2395 views
CPU and Load Average Conflict on EC2 server
I am having trouble understanding which server resource is causing lag in my Java game server. In the last patch of my game server, I updated my EC2 LAMP server from **apache2.2, php5.3, mysql5.5** to **apache2.4, php7.0, mysql5.6**. I also updated my game itself to include many more instances of monsters that are looped through every game loop – among other things.
Here is output from right when my game server starts up:
Here is output from a few minutes later:
And here is output from the next morning:
As you can see in the images the cpu usage of my Java process levels off around 80% in the last screenshot, yet load avg goes to 1.20. I have even seen it go as high as 2.7 this morning. The cpu credits affect how much actual cpu juice my server has so it makes sense that the percentage goes up as my credits balance diminishes, but why at 80% does my server lag?
On my Amazon EC2 metrics I see cpu at 10% (which confuses me even more):
Right when I start up my server my mmorpg does not lag at all. Then as soon as my cpu credits are depleted it starts to lag. This makes me feel like it is cpu based, but when I see 10% and 80% I don't see why. Any help would be greatly appreciated. I am on a T2.micro instance, so it has 1 vCPU. If I go up to the next instance it nearly doubles in price, and stays at same vCPU of 1, but with more credits.
Long story short, I want to fully understand what is going on, as the 80% number is throwing me. I don't just want to throw money at the problem.
[screenshots: top output at server startup, a few minutes later, and the next morning, plus the EC2 CPU-utilization graph]
KisnardOnline
(143 rep)
Feb 25, 2017, 03:46 PM
• Last activity: Aug 9, 2022, 06:10 AM
1 vote • 1 answer • 1001 views
Understanding load average on a multicore system with a multithreaded app
We have a curious situation regarding load average on our system. It's running an application called ZAG that is idle most of the day. But every 80 minutes or so, it has some sort of activity burst of 5-15 minutes' length. At the time of the burst, load average can climb to 60, 70, 80, 100 or more. What's interesting is this: during these high bursts, we'll see the CPU utilization percentages in htop show only 10-20% utilization per CPU. Furthermore, a script that I have written shows light CPU usage; during idle time this:
ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | awk '{sum += $10} END{print sum;}'
returns perhaps 535.0... that is, adding up all the CPU percentages from my ZAG application returns 535.0% of a CPU, or 5.35/32, i.e. 16.7% utilization of all the CPUs on my system. So in short, not a single CPU is being driven anywhere close to 100%, which we expect during mostly idle times.
During the incident, the result comes out to about 538.0 percent... just a little bump higher. I also see more threads on the run queue, as shown by
while true; do ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | grep ' Rl' | wc -l; sleep 0.5; done
So CPU utilization goes up, a little, and there are more threads running. But they are not using much more CPU, it seems, even as the load average shoots up. There is consistently very little going on regarding disk I/O or network I/O; sar data shows no discernible increase during this burst. There is no increase in memory utilization, and the number of processes may increase by a couple out of some 1700 total processes on the system, but that's all. There is nothing in cron that takes place at these times. htop or top output shows that there is certainly some CPU utilization taking place at this time, mostly user CPU (less than 5% system CPU reported by top). So it doesn't look like anything is waiting on data.
I don't notice anything extraordinary in /proc/interrupts. Rescheduling interrupts seem to be heavy there, but I spot-checked half a dozen cores, both even and odd NUMA nodes, and they appear to be steady at about 1400 per second per processor.
This is a 16-core machine with hyperthreads turned on (E5-2667 v4 processor). It has 36 ZAG processes and 729 ZAG threads, as shown by ps -ef and ps -eTf, respectively.
So this makes me wonder: how come my CPU utilization percentages are so low, while my load average is so high? Is it because, out of my 36 ZAG processes, I have over 700 threads, and perhaps a thread that's in `sched_yield()` is still in the run queue but not accumulating CPU? Or is a thread in `sched_yield()` no longer runnable, but in an uninterruptible state (see below)?
According to Brendan Gregg at https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html , "When load averages first appeared in Linux, they reflected CPU demand, as with other operating systems. But later on Linux changed them to include not only runnable tasks, but also tasks in the uninterruptible state (TASK_UNINTERRUPTIBLE or nr_uninterruptible). This state is used by code paths that want to avoid interruptions by signals, which includes tasks blocked on disk I/O and some locks. ... But don't Linux load averages sometimes go too high, more than can be explained by disk I/O? Yes, although my guess is that this is due to a new code path using TASK_UNINTERRUPTIBLE that didn't exist in 1993. In Linux 0.99.14, there were 13 codepaths that directly set TASK_UNINTERRUPTIBLE or TASK_SWAPPING (the swapping state was later removed from Linux). Nowadays, in Linux 4.12, there are nearly 400 codepaths that set TASK_UNINTERRUPTIBLE, including some lock primitives. It's possible that one of these codepaths should not be included in the load averages...."
Mike S
(2732 rep)
Jan 13, 2022, 09:58 PM
• Last activity: Jan 14, 2022, 11:09 PM
0 votes • 1 answer • 1340 views
CPU load average + how to deal with a process in D state
We can see on our RHEL 7.6 server (kernel version 3.10.0-957.el7.x86_64) that the following processes are in `D` state (they run as the `hdfs` user).
Note: *the D state code means that a process is in uninterruptible sleep.*
ps -eo s,user,cmd | grep ^[RD]
D hdfs du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root ps -eo s,user,cmd
Notes: the disks `sdj` and `sdm` are 3 TB in size; the `du -sk` also happens on other disks such as `sdd`, `sdf`, etc., and the disks use the ext4 filesystem.
We suspect that our high CPU load average is caused by the `du -sk` runs that actually hit the disks, so I was thinking about what we can do about this behavior.
One option might be to disable the `du -sk` verification in HDFS, but we have no clue how to do that.
A second option is to figure out what actually causes the `D` state. I'm not sure, but maybe upgrading the kernel version would help avoid the D state? Or something else (like disabling the CPU threads), etc.?
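Not an HDFS setting, but one stopgap (a sketch; it reclasses the running scans rather than disabling them): push the `du` processes into the idle I/O scheduling class so they only touch the disk when nothing else wants it (effective with the CFQ scheduler):

    for pid in $(pgrep -f 'du -sk /grid'); do
        ionice -c 3 -p "$pid"
    done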
more details
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
And the CPU load average is around ~42-45 (for the 15-minute average).
References:
https://community.cloudera.com/t5/Support-Questions/Does-hadoop-run-dfs-du-automatically-when-a-new-job-starts/td-p/231297
https://community.cloudera.com/t5/Support-Questions/Can-hdfs-dfsadmin-and-hdfs-dsfs-du-be-taxing-on-my-cluster/m-p/182402
https://community.pivotal.io/s/article/Dealing-with-Processes-in-State-D---Uninterruptible-Sleep-Usually-IO?language=en_US
https://www.golinuxhub.com/2018/05/how-to-disable-or-enable-hyper/
yael
(13936 rep)
Nov 28, 2021, 02:16 PM
• Last activity: Nov 28, 2021, 02:43 PM
1 vote • 0 answers • 1098 views
Why does a Docker container cause a huge CPU load average?
I want to discuss strange behavior on our RHEL 7.6 server.
We installed the Kafka exporter as a container on the server – the kafka-01 machine (the machine has 12 CPUs in total).
The following `yml` file describes the Kafka exporter container configuration:
more docker.kafka-exporter.yml
---
version: '2.4'
services:
  kafka-exporter:
    mem_limit: "612m"
    image: kafka-exporter:v1.2.0
    restart: always
    network_mode: host
    container_name: kafka-exporter
    command: ["--kafka.server=kafka01.sys65.com:6667"]
    ports:
      - 9308:9308
    logging:
      driver: "json-file"
      options:
        max-size: "15m"
        max-file: "1"
So when we start the container with `docker-compose`, as `docker-compose -f docker.kafka-exporter.yml up -d`, we notice that the CPU load average jumps from ~2-3 to 30-40 after 1-2 hours, and only a restart of the machine returns the CPU load average to normal (around 1-2). The CPU jumps again each time we start the docker compose (even stopping the docker compose does not decrease the CPU load average).
Can someone give a hint as to what could be the reason for this strange behavior? Regarding our case, is it useful to consider installing https://github.com/draios/sysdig for investigation?
Notes:
We verified the CPU load average with the `uptime` Linux command.
*Sometimes the machine **freezes** or **hangs** so we can't access it, and only a reboot returns the machine to normal.*
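Before reaching for sysdig, a sketch of two cheap first checks to run while the load is high (standard Docker and procps commands):

    docker stats --no-stream kafka-exporter       # per-container CPU share
    ps -eo state=,pid=,comm= | awk '$1 ~ /^[RD]/'  # the tasks the load counts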
yael
(13936 rep)
Nov 2, 2021, 06:50 AM
3 votes • 5 answers • 12929 views
Get per-core CPU load in shell script
I need to report the CPU load per core as a percentage from a shell script, but **I cannot run e.g. mpstat for one second**. Basically I think the info `top` shows after pressing `1` is what I want, but I cannot configure top to show this in batch mode (at least I don't know how). I could create a `~/.toprc` file with the configuration, but then I have to hope that the users do not mess with it.
I looked at `mpstat` and parsing its output, but it supports only seconds as the interval time. My script gets called via SNMP, and waiting 1 s for the response would generate a timeout, so this is not an option.
Are there other ways to get the per-core CPU load? I read about parsing `/proc/stat`, but I think of that as more of a last resort.
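For completeness, a sketch of the `/proc/stat` approach with a sub-second interval (bash; per-CPU fields per `man 5 proc`: user nice system idle iowait irq softirq steal):

    t1=$(grep '^cpu[0-9]' /proc/stat); sleep 0.2; t2=$(grep '^cpu[0-9]' /proc/stat)
    awk 'NR==FNR { for (i=2;i<=9;i++) tot1[$1]+=$i; idle1[$1]=$5+$6; next }
         { tot=0; for (i=2;i<=9;i++) tot+=$i
           dt=tot-tot1[$1]; di=($5+$6)-idle1[$1]
           printf "%s %.1f%%\n", $1, 100*(dt-di)/dt }' \
        <(echo "$t1") <(echo "$t2")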
Jens
(151 rep)
Sep 29, 2016, 11:56 AM
• Last activity: Sep 14, 2021, 10:49 AM
0 votes • 1 answer • 66 views
What is the best way to understand the efficiency of my program, other than load average?
Consider the following scenarios:
1. I have only one process running on the machine and from resources like top gives 100% CPU usage which is good. I'm efficiently using the CPU.
2. I have two processes each of them taking 50% CPU. I'm still using the CPU efficiently as the total is hitting 100%.
3. I have N (a relatively large number of) processes running on the machine. My process may not hit 100% CPU, which still makes sense, as the processor is busy with the others.
4. Now let's say there is only one process on the machine and the CPU usage still doesn't hit 100%. Assume the cause is a bad program (too much IO, or the program is simply doing nothing).
How do I detect case 4? The load average is not a good metric because it averages over different time windows.
Is there any metric or method that I can use to quantify how efficiently my program is using the CPU both under no-load conditions and fully loaded conditions?
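One concrete option (a sketch; sysstat's `pidstat`, with 1234 standing in for your PID): sample the process directly instead of the system-wide averages. A %CPU far below 100 on an otherwise idle machine is your case 4; recent sysstat versions also report a %wait column that separates "blocked" from "doing nothing":

    pidstat -u -p 1234 1 10    # per-second %usr/%system/%CPU, 10 samples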
vijaychsk
(1 rep)
Aug 12, 2021, 09:51 PM
• Last activity: Aug 12, 2021, 10:17 PM
1 vote • 2 answers • 200 views
High program load; after killing the process, Linux doesn't go back to the normal 0.5 load. Why?
I ran a program that reached a CPU load of 39.99, obviously more than my 4-core CPU can handle. But why, when I killed the program (which is definitely killed), did the CPU load not drop back to the 0.50 it showed before I ran the program?
Also, I noticed that the CPU load doesn't go down to 0.5 instantly after a program is killed; you need to wait for it to go down slowly. Why is that?
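The slow fall is expected: the kernel keeps exponentially damped moving averages, recomputed every 5 seconds, roughly as follows (a sketch of the classic formula; $n$ is the current number of runnable plus uninterruptible tasks):

$$\mathrm{load}_{1\mathrm{min}} \leftarrow \mathrm{load}_{1\mathrm{min}} \cdot e^{-5/60} + n \cdot (1 - e^{-5/60})$$

with $e^{-5/300}$ and $e^{-5/900}$ in place of $e^{-5/60}$ for the 5- and 15-minute figures. So after everything is killed ($n = 0$), the 1-minute value only decays by a factor of $e^{-1} \approx 0.37$ per minute; it doesn't drop instantly.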
Okit Tfseven
(13 rep)
Feb 9, 2021, 01:31 AM
• Last activity: Feb 14, 2021, 04:08 PM
0 votes • 3 answers • 2033 views
Process with 1% CPU usage causing load average of 1.5
We recently observed a high load average of about 1.5 on our embedded system, even though pretty much all processes are supposedly sleeping (according to `htop`).
The system in question is a dual-core Cortex-A9 running a realtime Linux kernel (4.14.126) built using buildroot.
We are using initramfs for our root filesystem and there is no swap, so there is definitely **no disk I/O** during normal operation.
After a bit of digging, we found out that the load is caused by a program called [swupdate](https://sbabic.github.io/swupdate/swupdate.html), which provides us with a convenient web interface for software updates (and we would very much like to continue using that).
When I use `time` to estimate the average CPU usage of that application (by calculating *(user+sys)/real*), I get a value of only about 1%, which doesn't make much sense considering the load average of 1.5.
I know that the load average also includes processes in the TASK_UNINTERRUPTIBLE state, which don't contribute to CPU usage. What I don't understand is why any of the threads/processes of that application would ever be in that state.
To further analyze the situation I captured a kernel trace using [lttng](https://lttng.org/), which shows that the only thing swupdate does is this (every 50ms), and this (every 100ms):
[two lttng trace screenshots, not recoverable from the text]
As you can see, there's a little bit of what appears to be socket-based IPC, and a select waiting for *something*. In the IPC case, one thread appears to be mostly blocking in nanosleep(), while the other is blocking in accept(), neither of which should consume any system resources as far as I'm aware.
FYI: the time base for both screenshots is the same, and the IPC takes approx. 500-600µs in total (which, considering the interval of 50ms, fits quite nicely with the observed 1% CPU usage).
So, what is causing the load here?
Felix G
(111 rep)
Aug 11, 2020, 10:59 AM
• Last activity: Feb 12, 2021, 08:15 PM
4 votes • 1 answer • 2638 views
Understanding load average on a multicore system
For a single microprocessor unit, the load average output by `top` can be understood to mean that if it's above 1.0, there are jobs waiting. But if we have n cores on a multicore system with `l*n` logical cores (on my Intel CPU n=6 and `l*n` = 12, so the output of `nproc` is 12), should we divide the load average by the output of `nproc` to see whether that number is above 1, to understand if there are (on average) jobs waiting? Or is it better to use `htop` to understand whether a parallel multicore system is under too much average load?
I think that my method was wrong but the conclusion was right: when I saw that the average load was above 10 in top, I checked with `ps` which process was expensive and found an overflow in a running program. But if that machine had actually had `nproc` output > 10, it would not really have been cause for investigation had I known that. Do you agree?
Niklas Rosencrantz
(4324 rep)
Dec 15, 2020, 12:26 PM
• Last activity: Dec 15, 2020, 04:27 PM
Showing page 1 of 20 total questions