
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

4 votes
3 answers
3078 views
Check what process is spiking the average load with atop
In trying to find the culprit of a high load average on a system during the night (which does not seem to be related to logrotate), I installed atop to write a raw file at a specific interval. While reading the file, it seems the process list stands still; can I somehow go back and forth between the samples to see what sticks out, and further sort by any column (like CPU usage)?
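For readers with the same problem, atop's raw-file mode already supports stepping between samples and re-sorting interactively; a hedged sketch of the workflow (the file path is just an example, key bindings as documented for recent atop versions):
```
# write one sample every 60 seconds to a raw file overnight (example path)
atop -w /var/log/atop_night.raw 60

# replay the raw file later
atop -r /var/log/atop_night.raw
# inside the viewer:
#   t  - step forward to the next sample
#   T  - step back to the previous sample
#   b  - branch to a specific time (hh:mm)
#   C  - sort the process list by CPU consumption
#   M / D - sort by memory / disk instead
```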
user135361 (193 rep)
Sep 3, 2013, 01:31 PM • Last activity: Jun 4, 2025, 01:06 AM
3 votes
1 answers
1997 views
Unusually high load average (due to peak I/O wait? IRQs?)
I have a problem with high load average (~2) on my (personal laptop) computer for a long time now. I am running Arch Linux. If I remember correctly, the problem started with a certain kernel update, initially I thought it was related to this bug . The problem was not solved though, when the bug was fixed. I did not really care as I thought it is still a bug, because the performance did not seem to suffer. What made me curious is that, recently, I had a moment of super low load average (~0) while idling. After a reboot, everything went back to "normal", with high load average. So I started investigating: % uptime 14:31:04 up 2:22, 1 user, load average: 1.96, 1.98, 1.99 So far nothing new. Then I tried top: % top -b -n 1 top - 14:33:52 up 2:25, 1 user, load average: 2.02, 2.07, 2.02 Tasks: 146 total, 2 running, 144 sleeping, 0 stopped, 0 zombie %Cpu0 : 2.6/0.9 3[|||| ] %Cpu1 : 2.7/0.9 4[|||| ] %Cpu2 : 2.7/1.0 4[|||| ] %Cpu3 : 2.7/0.8 3[|||| ] GiB Mem :228125107552256.0/7.712 [ GiB Swap: 0.0/7.904 [ ] PID USER PR NI VIRT RES %CPU %MEM TIME+ S COMMAND 2 root 20 0 0.0m 0.0m 0.0 0.0 0:00.00 S kthreadd 404 root 20 0 0.0m 0.0m 0.0 0.0 0:01.09 D `- rtsx_usb_ms_2 1854 root 20 0 0.0m 0.0m 0.0 0.0 0:06.03 D `- kworker/0:2 I cut out all the processes and kernel threads except those two. Here we can see already some suspicious kernel threads (state D). And some suspicious Mem value (see edit)... Looking at CPU: % mpstat Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU) 14:36:09 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 14:36:09 all 2.66 0.00 0.88 1.56 0.00 0.01 0.00 0.00 0.00 94.90 % sar -u 1 30 Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU) 14:37:04 CPU %user %nice %system %iowait %steal %idle 14:37:05 all 1.00 0.00 0.75 0.00 0.00 98.25 14:37:06 all 1.76 0.00 0.50 0.00 0.00 97.74 14:37:07 all 1.00 0.00 0.25 0.00 0.00 98.75 14:37:08 all 0.50 0.00 0.50 0.00 0.00 99.00 14:37:09 all 0.50 0.00 0.50 0.25 0.00 98.75 14:37:10 all 0.50 0.00 0.50 6.03 0.00 92.96 14:37:11 all 0.75 0.00 0.50 11.75 0.00 87.00 14:37:12 all 0.50 0.00 0.25 0.00 0.00 99.25 [ . . . ] 14:37:21 all 1.26 0.00 0.76 0.00 0.00 97.98 14:37:22 all 0.75 0.00 0.25 2.26 0.00 96.73 14:37:23 all 0.50 0.00 0.50 16.83 0.00 82.16 14:37:24 all 0.75 0.00 0.50 0.00 0.00 98.74 14:37:25 all 0.50 0.00 0.50 0.00 0.00 98.99 14:37:26 all 0.76 0.00 0.50 7.56 0.00 91.18 14:37:27 all 0.25 0.00 0.51 0.00 0.00 99.24 14:37:28 all 1.00 0.00 0.75 0.25 0.00 98.00 14:37:29 all 0.25 0.00 0.76 0.00 0.00 98.99 14:37:30 all 0.75 0.00 0.50 0.00 0.00 98.74 14:37:31 all 0.75 0.00 0.50 3.27 0.00 95.48 14:37:32 all 0.51 0.00 0.51 13.16 0.00 85.82 14:37:33 all 0.75 0.00 0.50 0.25 0.00 98.49 14:37:34 all 1.26 0.00 0.75 0.00 0.00 97.99 Average: all 0.71 0.00 0.56 2.06 0.00 96.67 reveals some peaks in I/O wait. The best guess so far. Looking closer: % iostat -x 1 30 Linux 4.13.12-1-ARCH (arch) 30.11.2017 _x86_64_ (4 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 2.60 0.00 0.87 1.55 0.00 94.98 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.93 3.00 3.71 1.94 95.04 102.27 69.91 0.60 106.78 16.56 279.32 14.47 8.17 avg-cpu: %user %nice %system %iowait %steal %idle 0.75 0.00 0.75 0.25 0.00 98.25 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.01 13.00 0.00 13.00 10.00 1.00 [ . . . 
] avg-cpu: %user %nice %system %iowait %steal %idle 0.50 0.00 0.50 17.04 0.00 81.95 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 8.00 0.00 2.00 0.00 40.00 40.00 0.69 346.50 0.00 346.50 346.50 69.30 [ . . . ] avg-cpu: %user %nice %system %iowait %steal %idle 0.25 0.00 0.50 7.29 0.00 91.96 [ . . . ] avg-cpu: %user %nice %system %iowait %steal %idle 1.00 0.00 0.75 16.96 0.00 81.30 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 5.00 0.00 2.00 0.00 28.00 28.00 0.71 357.00 0.00 357.00 356.50 71.30 [ . . . ] avg-cpu: %user %nice %system %iowait %steal %idle 0.50 0.00 0.50 0.00 0.00 99.00 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Looking at processes with uninterruptable sleep: % for x in seq 1 1 10; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done D 404 [rtsx_usb_ms_2] D 1854 [kworker/0:2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 1854 [kworker/0:2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 1854 [kworker/0:2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] ---- D 404 [rtsx_usb_ms_2] D 1854 [kworker/0:2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 1854 [kworker/0:2] D 2877 [kworker/0:0] ---- D 404 [rtsx_usb_ms_2] D 3177 [kworker/u32:4] ---- and last thing I did: % vmstat 1 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 1 0 5010040 123612 1220080 0 0 23 25 111 433 3 1 95 2 0 0 0 0 5006256 123612 1224164 0 0 0 96 186 839 1 1 97 1 0 1 0 0 5006132 123612 1224164 0 0 0 0 175 714 1 0 99 0 0 0 0 0 5003156 123612 1224156 0 0 0 0 234 1009 2 1 98 0 0 0 0 0 5003156 123612 1224156 0 0 0 0 161 680 0 0 99 0 0 0 1 0 5003156 123616 1224156 0 0 0 60 214 786 1 1 94 5 0 0 0 0 5003280 123620 1224156 0 0 0 4 226 776 1 0 88 11 0 1 0 0 5003156 123620 1224156 0 0 0 0 210 733 1 0 99 0 0 0 0 0 5005388 123620 1224156 0 0 0 0 159 747 1 0 99 0 0 0 0 0 5005388 123620 1224156 0 0 0 0 233 803 1 0 99 0 0 0 0 0 5005512 123620 1224156 0 0 0 0 152 670 1 0 99 0 0 0 0 0 5009664 123620 1220060 0 0 0 0 240 914 1 1 99 0 0 0 0 0 5009540 123620 1220060 0 0 0 0 237 833 1 1 99 0 0 0 0 0 5009664 123620 1220060 0 0 0 0 166 999 1 1 99 0 0 0 1 0 5009664 123620 1220060 0 0 0 4 168 700 1 0 88 11 0 0 0 0 5009540 123628 1220060 0 0 0 12 207 778 1 1 91 8 0 0 0 0 5009788 123628 1220064 0 0 0 0 189 717 0 1 99 0 0 0 0 0 5009664 123628 1220064 0 0 0 0 243 1453 1 1 98 0 0 0 0 0 5009044 123628 1220576 0 0 0 0 166 708 1 0 99 0 0 0 0 0 5009168 123628 1220576 0 0 0 0 146 663 1 0 99 0 0 0 0 0 5009540 123628 1220064 0 0 0 0 175 705 1 1 99 0 0 0 1 0 5009292 123632 1220128 0 0 0 8 223 908 1 0 99 0 0 ^C Now I still don't know what the problem is, but it looks like it comes from some peak I/O operations. There are some suspicious kernel threads. Any further ideas? What else could I do to investigate? **edit:** The Mem value seems strange, but it just occured very recently, a week ago or so, everything seemed to be normal. And % free total used free shared buff/cache available Mem: 8086240 1913860 4824764 133880 1347616 6231856 Swap: 8288252 0 8288252 seems to be fine though. 
**edit2:** First results of testing sar monitoring my system (very frequently, intervals of 1 second, but for a short duration, to get the peaks): Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU) 12:36:25 CPU %user %nice %system %iowait %steal %idle 12:36:26 all 0.50 0.00 0.50 0.00 0.00 99.00 12:36:27 all 0.50 0.00 0.50 0.25 0.00 98.74 12:36:28 all 0.50 0.00 0.75 0.00 0.00 98.75 12:36:29 all 0.50 0.00 0.25 7.52 0.00 91.73 12:36:30 all 0.25 0.00 0.75 9.77 0.00 89.22 12:36:31 all 0.25 0.00 0.75 0.00 0.00 98.99 12:36:32 all 1.00 0.00 0.50 0.25 0.00 98.25 12:36:33 all 1.00 0.00 1.00 0.00 0.00 98.00 12:36:34 all 0.25 0.00 0.25 0.25 0.00 99.24 12:36:35 all 0.50 0.25 0.75 33.25 0.00 65.25 12:36:36 all 0.50 0.00 0.75 0.25 0.00 98.50 12:36:37 all 0.75 0.00 0.25 0.00 0.00 99.00 12:36:38 all 0.25 0.00 0.50 0.00 0.00 99.24 12:36:39 all 0.50 0.00 0.50 0.00 0.00 99.00 12:36:40 all 0.50 0.25 0.50 10.75 0.00 88.00 Average: all 0.52 0.03 0.57 4.16 0.00 94.72 Network (-n) seems to be alright. Looking at devices (-d) reveals: Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU) 12:36:25 DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util 12:36:26 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:26 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [ . . . ] 12:36:29 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-0 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40 12:36:30 dev8-1 2.00 0.00 88.00 44.00 0.41 355.00 207.00 41.40 12:36:30 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:30 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:31 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:31 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [ . . . ] 12:36:34 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-0 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60 12:36:35 dev8-1 2.00 0.00 24.00 12.00 0.70 348.50 348.00 69.60 12:36:35 dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:35 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:36 dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12:36:36 dev8-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 [ . . . 
] 12:36:40 dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-0 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15 Average: dev8-1 0.27 0.00 7.47 28.00 0.12 351.75 455.75 12.15 Average: dev8-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Average: dev8-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 and -b gives: Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU) 12:36:25 tps rtps wtps bread/s bwrtn/s 12:36:26 0.00 0.00 0.00 0.00 0.00 12:36:27 0.00 0.00 0.00 0.00 0.00 12:36:28 0.00 0.00 0.00 0.00 0.00 12:36:29 0.00 0.00 0.00 0.00 0.00 12:36:30 2.00 0.00 2.00 0.00 88.00 12:36:31 0.00 0.00 0.00 0.00 0.00 12:36:32 0.00 0.00 0.00 0.00 0.00 12:36:33 0.00 0.00 0.00 0.00 0.00 12:36:34 0.00 0.00 0.00 0.00 0.00 12:36:35 2.00 0.00 2.00 0.00 24.00 12:36:36 0.00 0.00 0.00 0.00 0.00 12:36:37 0.00 0.00 0.00 0.00 0.00 12:36:38 0.00 0.00 0.00 0.00 0.00 12:36:39 0.00 0.00 0.00 0.00 0.00 12:36:40 0.00 0.00 0.00 0.00 0.00 Average: 0.27 0.00 0.27 0.00 7.47 So I assume the issue seems to be related to my hard drive (?). Because the I/O is on partition 1 (my root partition), it should be somewhere outside of /var which has an extra partition. The other partitions are data partitions and not system related. **edit3:** Even more data to that specific peak: paging looks fine (from my perspective with limited knowledge) Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU) 12:36:25 pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff 12:36:26 0.00 0.00 0.00 0.00 2233.00 0.00 0.00 0.00 0.00 12:36:27 0.00 0.00 0.00 0.00 88.00 0.00 0.00 0.00 0.00 12:36:28 0.00 0.00 766.00 0.00 185.00 0.00 0.00 0.00 0.00 12:36:29 0.00 40.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00 12:36:30 0.00 4.00 0.00 0.00 45.00 0.00 0.00 0.00 0.00 12:36:31 0.00 0.00 1.00 0.00 46.00 0.00 0.00 0.00 0.00 12:36:32 0.00 0.00 5.00 0.00 560.00 0.00 0.00 0.00 0.00 12:36:33 0.00 0.00 2.00 0.00 85.00 0.00 0.00 0.00 0.00 12:36:34 0.00 0.00 2.00 0.00 47.00 0.00 0.00 0.00 0.00 12:36:35 0.00 12.00 0.00 0.00 44.00 0.00 0.00 0.00 0.00 12:36:36 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00 12:36:37 0.00 0.00 2.00 0.00 45.00 0.00 0.00 0.00 0.00 12:36:38 0.00 0.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00 12:36:39 0.00 0.00 0.00 0.00 77.00 0.00 0.00 0.00 0.00 12:36:40 0.00 8.00 0.00 0.00 47.00 0.00 0.00 0.00 0.00 Average: 0.00 4.27 51.87 0.00 242.87 0.00 0.00 0.00 0.00 It looks like files were created during that peak (-v): Linux 4.13.12-1-ARCH (arch) 01.12.2017 _x86_64_ (4 CPU) 12:36:25 dentunusd file-nr inode-nr pty-nr 12:36:26 186520 4480 195468 2 [ . . . ] 12:36:34 186520 4480 195468 2 12:36:35 186520 4512 195468 2 [ . . . ] 12:36:40 186520 4512 195468 2 Average: 186520 4493 195468 2 **edit4:** It looks like some irq's are responsible. 
Running iotop -o -a (show only processes with i/o and accumulate them, so keep all processes that had i/o since the start of the program) resulted in: Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 7 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/0] 17 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/1] 23 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/2] 29 be/4 root 0.00 B 0.00 B 0.00 % 99.99 % [ksoftirqd/3] 292 rt/4 root 0.00 B 0.00 B 0.00 % 99.99 % [i915/signal:0] [ . . . ] So, is this a thing? How could I continue...?
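One way to continue from here is to look at where those D-state tasks are blocked inside the kernel; a minimal sketch (needs root, PIDs are discovered at run time):
```
# dump the kernel stack of every task currently in uninterruptible sleep
for pid in $(ps -eo state,pid --no-headers | awk '$1 ~ /^D/ {print $2}'); do
    echo "=== PID $pid ($(cat /proc/$pid/comm 2>/dev/null))"
    cat /proc/$pid/stack 2>/dev/null   # kernel function the task is waiting in
done
```
Seeing rtsx_usb_ms or a kworker stuck in the same function on every sample would point at the driver rather than at real disk load.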
nox (161 rep)
Nov 30, 2017, 02:48 PM • Last activity: May 15, 2025, 08:06 PM
5 votes
2 answers
18030 views
Difference between the %CPU usage and load average, and when should it be a concern?
I've searched multiple answers here but couldn't find one related to this scenario; if you know of one, kindly point me to it. I am including the numbers here for the ease of my own comprehension. I have a 96-core bare-metal Linux server with 256 GB RAM that's dedicated to running an in-house distributed, event-based, asynchronous network service that acts as a caching server. This daemon runs with 32 worker threads. Apart from the main task of fetching and caching, this server also does a variety of related tasks in a couple of extra separate threads, like polling other members' health checks, writing metrics to a Unix socket, etc. The worker thread count isn't bumped further because increasing it will increase the cache lock contention. There is not much disk activity from this server, as the metrics are written in batches, and if the Unix socket fails it just ignores the failure and frees the memory. This instance is part of a 9-node cluster, and the stats of this node speak for the rest of the instances in this cluster. With the recent surge in inbound traffic, I see the %CPU usage of the process has gone up considerably but the load average is still less than 1. Please find the stats below.
:~$ nice top
top - 19:51:55 up 95 days,  7:27,  1 user,  load average: 0.33, 0.28, 0.32
PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
587486 cacher   20   0  107.4g  93.0g  76912 S  17.2  37.0   5038:13 cacher
The %CPU sometimes goes up to 80%, but even then the load average is considerably lower and doesn't go beyond 1.5. This happens mostly when there's a cache miss and the cacher has to fetch the item from an upstream, so it is mostly a set of network activities. As far as I understand, the most compute-heavy operation this service does at runtime is computing the hash of the item to be cached, when it has to store the item into the appropriate distributed buckets. There are no systemd limits set on this service for any parameters, and it is also tuned to disable the kernel's OOM killer for this process, although it is nowhere near the upper limit. The systemd sockets to which this binds have been tuned to accommodate larger tx and rx buffers.
* Why is the load average less than 1 on a 96-core server when the %CPU for the service that uses 32 threads fluctuates between 20% and 80% consistently?
* On a 96-core server, how high can the %CPU safely go? Does it have a relation to how many threads are used? If the number of threads is bumped, is a higher %CPU usage theoretically acceptable?
Thank you.
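For context on the first bullet: %CPU in top is expressed per single core, so 80% of one core on a 96-CPU machine is under 1% of total capacity, and a load average is roughly the number of busy-or-waiting tasks rather than a percentage. A hedged sketch of normalizing the two:
```
# 1-minute load average expressed relative to the number of logical CPUs
read load1 _ < /proc/loadavg
printf 'load per logical CPU: %.2f%%\n' "$(echo "$load1 * 100 / $(nproc)" | bc -l)"
```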
init (151 rep)
Jan 7, 2023, 08:23 PM • Last activity: Sep 19, 2024, 08:54 AM
-1 votes
1 answers
128 views
How to calculate whether a load average is high or normal?
How do I determine whether some x.x load value is high by looking at uptime on a Linux server, given that I have 4 CPUs? Is there any formula to calculate it?
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
    CPU family:           6
    Model:                85
    Thread(s) per core:   2
    Core(s) per socket:   2
    Socket(s):            1
    Stepping:             7
$ uptime
 14:01:38 up 15 min,  2 users,  load average: 7.09, 3.44, 1.96
The current load average is high, but the server is still working fine.
$ uptime
 14:39:50 up 53 min,  5 users,  load average: 31.95, 29.13, 24.25
Output of vmstat
$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
16  0      0 15798924   2168 189096    0    0   329    25  103  117 62  1 37  0  0
16  0      0 15798924   2168 189096    0    0     0     0  405  433 100  0  0  0  0
16  0      0 15798924   2168 189096    0    0     0     0  408  441 100  0  0  0  0
I've seen a few links, however I couldn't understand them.
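A common rule of thumb is to compare the load figures against the number of online CPUs rather than against a fixed value; a hedged sketch of such a check:
```
#!/bin/sh
# flag a 1-minute load average that exceeds the number of online CPUs
cpus=$(nproc)
load=$(cut -d ' ' -f 1 /proc/loadavg)
if [ "$(echo "$load > $cpus" | bc)" -eq 1 ]; then
    echo "high load: $load on $cpus CPUs (tasks are queueing for CPU or stuck in I/O)"
else
    echo "ok: $load on $cpus CPUs"
fi
```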
Priyanka (1 rep)
Aug 15, 2024, 02:43 PM • Last activity: Aug 15, 2024, 05:09 PM
-3 votes
4 answers
12521 views
What is the Unix command-line command to get the system load information?
On a Linux or Unix operating system, I am getting the text System load as below. Can anyone please tell me what it means, and how to extract the system load % using CLI commands?
System load: 6.84
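The `System load` line in the login banner is just the 1-minute load average (on Ubuntu it is printed by the MOTD scripts, as far as I can tell), not a percentage; the same figure is available from several commands, and a percentage has to be derived from the CPU count. A hedged sketch (sample output values are illustrative):
```
uptime                            # ... load average: 6.84, 6.20, 5.90
cat /proc/loadavg                 # 6.84 6.20 5.90 2/845 12345
cut -d ' ' -f 1-3 /proc/loadavg   # just the three averages

# as a percentage of total CPU capacity (one fully busy core = 100/nproc percent)
awk -v n="$(nproc)" '{ printf "%.0f%%\n", $1 * 100 / n }' /proc/loadavg
```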
Dhans (29 rep)
Nov 21, 2019, 03:17 AM • Last activity: Mar 14, 2024, 03:50 AM
0 votes
1 answers
718 views
CPU is 89% idle but the load average is extremely high in RHEL 8.4
I am using RHEL 8.4 and I seem to always have a very high load average, despite my CPU being 89% idle: ``` $ uname -a Linux dx11866-hs 4.18.0-305.el8.ppc64le #1 SMP Thu Apr 29 08:53:15 EDT 2021 ppc64le ppc64le ppc64le GNU/Linux $top top - 19:32:45 up 150 days, 3:45, 1 user, load average: 3936.78, 3934.85, 3935.12 Tasks: 819 total, 1 running, 818 sleeping, 0 stopped, 0 zombie %Cpu(s): 10.6 us, 0.4 sy, 0.0 ni, 89.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 377629.6 total, 197139.6 free, 169755.4 used, 10734.7 buff/cache MiB Swap: 16383.9 total, 12444.2 free, 3939.8 used. 199111.0 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1271217 yarn 20 0 8059136 5.7g 20608 S 318.8 1.6 6716:49 java 999164 yarn 20 0 10.3g 3.4g 117376 S 162.5 0.9 2:43.75 java 997941 yarn 20 0 12.0g 2.1g 71040 S 43.8 0.6 3:28.04 java 10 root 20 0 0 0 0 I 6.2 0.0 90:45.27 rcu_sched 1000002 yarn 20 0 12.0g 761088 65344 S 6.2 0.2 0:12.84 java 1001197 yarn 20 0 12.0g 752704 65344 S 6.2 0.2 0:11.60 java 1001966 root 20 0 17600 8384 4992 R 6.2 0.0 0:00.02 top 3291901 yarn 20 0 7763072 1.6g 14912 S 6.2 0.4 3027:36 java 4002263 root 20 0 7263168 4.4g 16832 S 6.2 1.2 5859:55 java 1 root 20 0 181888 19136 10624 S 0.0 0.0 13:50.34 systemd 2 root 20 0 0 0 0 S 0.0 0.0 0:19.21 kthreadd 3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp 4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp 6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-events_highpri 8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq 9 root 20 0 0 0 0 S 0.0 0.0 3:40.28 ksoftirqd/0 11 root rt 0 0 0 0 S 0.0 0.0 0:11.21 migration/0 12 root rt 0 0 0 0 S 0.0 0.0 0:18.17 watchdog/0 13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0 14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1 15 root rt 0 0 0 0 S 0.0 0.0 0:19.25 watchdog/1 16 root rt 0 0 0 0 S 0.0 0.0 0:11.58 migration/1 17 root 20 0 0 0 0 S 0.0 0.0 3:26.51 ksoftirqd/1 19 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H-events_highpri 20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2 21 root rt 0 0 0 0 S 0.0 0.0 0:19.18 watchdog/2 22 root rt 0 0 0 0 S 0.0 0.0 0:04.86 migration/2 23 root 20 0 0 0 0 S 0.0 0.0 1:54.07 ksoftirqd/2 25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:0H-events_highpri 26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3 27 root rt 0 0 0 0 S 0.0 0.0 0:18.64 watchdog/3 28 root rt 0 0 0 0 S 0.0 0.0 0:04.53 migration/3 # grep -c proc /proc/cpuinfo 48 iostat Linux 4.18.0-305.el8.ppc64le () 11/06/2023 _ppc64le_ (48 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 12.61 0.00 0.64 0.05 0.00 86.70 Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn nvme0n1 3.59 2.05 171.95 27091032 2268469720 dm-0 0.03 0.14 0.36 1840516 4710876 dm-1 0.03 0.58 1.33 7592176 17510144 dm-2 3.28 0.08 116.26 1036872 1533830064 dm-3 0.53 0.00 40.67 16352 536491196 dm-4 0.00 0.07 0.03 927276 458764 dm-5 0.00 0.00 0.00 18380 5276 dm-6 0.00 0.00 0.00 14660 2084 dm-7 0.32 0.32 13.30 4249592 175458336 iostat -d 5 -x Linux 4.18.0-305.el8.ppc64le () 11/07/2023 _ppc64le_ (48 CPU) Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 3.77 7.89 513.41 5294.93 0.02 0.60 0.50 7.05 0.17 22.80 0.18 136.12 671.13 1.22 1.43 dm-0 1.61 0.19 125.55 3.16 0.00 0.00 0.00 0.00 0.11 0.29 0.00 77.75 16.84 1.03 0.19 dm-1 0.01 0.00 0.81 0.00 0.00 0.00 0.00 0.00 0.18 0.00 0.00 65.45 0.00 1.45 0.00 dm-2 0.72 2.95 91.00 295.43 0.00 0.00 0.00 0.00 0.21 0.29 0.00 126.23 100.27 1.67 0.61 dm-3 0.15 0.42 9.58 19.93 0.00 0.00 0.00 0.00 0.14 0.16 0.00 64.72 47.42 3.54 0.20 dm-4 0.40 0.04 47.73 1.11 0.00 0.00 0.00 0.00 0.14 
0.16 0.00 119.67 26.24 1.43 0.06 dm-5 0.03 0.00 1.52 0.49 0.00 0.00 0.00 0.00 0.07 5.00 0.00 48.03 108.60 2.17 0.01 dm-6 0.07 0.00 126.99 0.47 0.00 0.00 0.00 0.00 0.63 0.00 0.00 1866.09 297.71 3.87 0.03 dm-7 0.52 1.13 97.19 4969.42 0.00 0.00 0.00 0.00 0.13 14.85 0.02 187.01 4403.33 2.99 0.49 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.20 0.40 1.60 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 34.00 6.67 0.40 dm-0 0.20 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 9.40 0.00 319.20 0.00 2.60 0.00 21.67 0.00 0.09 0.00 0.00 33.96 1.06 1.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 6.60 0.00 229.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.79 0.91 0.60 dm-3 0.00 4.00 0.00 75.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 1.00 0.40 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 1.40 0.00 14.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 10.29 2.86 0.40 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 2.20 0.00 84.80 0.00 0.20 0.00 8.33 0.00 0.09 0.00 0.00 38.55 2.73 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 2.20 0.00 72.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.73 1.82 0.40 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.40 0.00 1.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.40 dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 1.00 0.00 40.00 0.00 0.40 0.00 28.57 0.00 0.00 0.00 0.00 40.00 6.00 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.80 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.00 2.50 0.20 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 1.40 0.00 58.40 0.00 0.00 0.00 0.00 0.00 0.14 0.00 0.00 41.71 4.29 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 1.00 0.00 32.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 32.80 4.00 0.40 dm-3 0.00 0.40 0.00 25.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 5.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 0.80 0.00 27.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 7.50 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.20 0.00 0.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.00 10.00 0.20 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 9.00 0.00 300.00 0.00 2.00 0.00 18.18 0.00 0.09 0.00 0.00 33.33 1.11 1.00 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 9.00 0.00 264.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 29.33 0.67 0.60 dm-3 0.00 1.20 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 25.33 3.33 0.40 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.80 0.00 5.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.00 2.50 0.20 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 1.20 
0.00 41.60 0.00 0.20 0.00 14.29 0.00 0.00 0.00 0.00 34.67 3.33 0.40 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.40 0.00 13.60 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34.00 5.00 0.20 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.80 0.00 15.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 19.00 2.50 0.20 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 2.00 0.00 62.40 0.00 0.80 0.00 28.57 0.00 0.10 0.00 0.00 31.20 3.00 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 1.40 0.00 32.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 22.86 2.86 0.40 dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 1.40 0.00 30.40 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 21.71 1.43 0.20 Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util nvme0n1 0.00 1.40 0.00 63.20 0.00 0.20 0.00 12.50 0.00 0.14 0.00 0.00 45.14 4.29 0.60 dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-2 0.00 0.80 0.00 39.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00 2.50 0.20 dm-3 0.00 0.20 0.00 12.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.00 10.00 0.20 dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 dm-7 0.00 0.60 0.00 11.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.67 3.33 0.20 #lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 48 On-line CPU(s) list: 0-47 Thread(s) per core: 8 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Model: 2.0 (pvr 0080 0200) Model name: POWER10 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 48K L2 cache: 1024K L3 cache: 4096K NUMA node0 CPU(s): 0-47 Physical sockets: 1 Physical chips: 4 Physical cores/chip: 6 # numactl --hardware available: 1 nodes (0) node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 node 0 size: 379675 MB node 0 free: 278579 MB node distances: node 0 0: 10 # numastat node0 numa_hit 26144191 numa_miss 0 numa_foreign 0 interleave_hit 5660366 local_node 26144191 other_node 0 How can I identify the bottleneck and how can I fix this? cat /proc/interrupts output - https://pastebin.com/wjqrYVZm
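With CPUs roughly 89% idle, a four-digit load average usually means a very large number of tasks stuck in uninterruptible sleep rather than CPU demand; a hedged first check before digging further into iostat:
```
# count tasks (threads) by state; a huge D count explains an idle-but-loaded box
ps -eLo state= | sort | uniq -c

# list the uninterruptible tasks and the kernel function they are waiting in
ps -eLo state,pid,tid,wchan:32,comm | awk '$1 ~ /^D/' | head -50
```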
HItesh (1 rep)
Nov 3, 2023, 03:40 PM • Last activity: Dec 25, 2023, 02:12 PM
0 votes
1 answers
688 views
How should Load Average be calculated on a CPU with Efficiency Cores?
I recently received a MacBook Pro with an M1 pro CPU, which has 2 "efficiency" cores and 8 performance cores. When I run htop/btop/top I get a load average of >2 because the process scheduler always assigns certain lower-demand processes to the efficiency cores, which results in those cores always running at 60 to 100% capacity. I feel like the 2 efficiency cores reduce the utility of the load average metric, which was already reduced due to multiple cores. Back in the dim, distant past we had single core CPUs that the load average made intuitive sense on. However now we have 2 types of CPU core in a single system, and my most recent phone has 3 different types of core: efficiency, performance, and a single ultra performance core. How should such a new load average be calculated? Are there any ongoing efforts to redefine a general system-load metric? Since efficiency cores are made to run low priority processes, perhaps excluding those from the default metric makes sense? Then divide the remaining load value among the non-efficiency CPUs. For instance, a load average of 3.4. Subtract 2 for the efficiency cores, 1.4. Then divide by the number of performance cores, 1.4 / 8 = 0.175.
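For anyone who wants to experiment with the normalization proposed above, recent macOS releases expose per-cluster core counts through sysctl; a hedged sketch (the hw.perflevel* keys and the arithmetic are assumptions of this sketch, not an established metric):
```
# performance- and efficiency-cluster core counts on Apple Silicon
perf_cores=$(sysctl -n hw.perflevel0.physicalcpu)   # performance cores
eff_cores=$(sysctl -n hw.perflevel1.physicalcpu)    # efficiency cores

# 1-minute load average, then the proposed "performance-core load"
load1=$(sysctl -n vm.loadavg | awk '{print $2}')    # output looks like "{ 3.40 2.90 2.50 }"
echo "scale=3; ($load1 - $eff_cores) / $perf_cores" | bc
```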
acjca2 (310 rep)
Nov 1, 2023, 03:09 PM • Last activity: Nov 1, 2023, 08:07 PM
0 votes
2 answers
2377 views
Why is load average being reported as 0.00 even though the system is busy doing work?
uptime, top, and cat /proc/loadavg all report the load averages for the last 1/5/15 minutes as 0.00, but the system is definitely busy doing work. Why is this? The server is running Red Hat Enterprise Linux Server release 6.6, kernel 2.6.32-504.12.2.el6.x86_64.
$ uptime
 12:13:44 up 73 days, 8 min, 1 user, load average: 0.00, 0.00, 0.00
$ cat /proc/loadavg
0.00 0.00 0.00 12/2706 39700
$ top
top - 12:15:35 up 73 days, 10 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 572 total, 4 running, 568 sleeping, 0 stopped, 0 zombie
Cpu(s): 37.1%us, 2.3%sy, 0.0%ni, 42.0%id, 18.0%wa, 0.0%hi, 0.5%si, 0.0%st
...
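A useful sanity check in a case like this is to put the averaged figure next to the instantaneous run-queue counts, which the kernel also exposes; if the instantaneous counts are consistently non-zero while the averages stay at 0.00, the load accounting itself is suspect. A hedged sketch:
```
cat /proc/loadavg                  # 4th field is running/total scheduling entities right now
ps -eo state= | grep -c '^[RD]'    # instantaneous count of runnable + uninterruptible tasks
vmstat 5 12                        # watch the "r" and "b" columns for a minute
```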
user117452
May 31, 2015, 12:18 AM • Last activity: Jun 26, 2023, 01:04 PM
19 votes
4 answers
30594 views
Why/how does "uptime" show CPU load >1?
I have a **1 core CPU** installed on my PC. Sometimes, `uptime` shows load >1. How is this possible and what does this mean? EDIT: The values go up to `2.4`
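The short version is that the load average counts tasks that want to run (plus tasks in uninterruptible sleep), not CPU capacity, so two always-runnable tasks on one core give a load near 2. A small hedged demo that reproduces this on a single core:
```
# pin two CPU-bound loops to CPU 0 for two minutes
taskset -c 0 sh -c 'yes > /dev/null & p1=$!; yes > /dev/null & p2=$!; sleep 120; kill $p1 $p2' &

# meanwhile, watch the 1-minute average climb towards ~2
watch -n 5 uptime
```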
Frantisek (415 rep)
Mar 4, 2014, 08:20 PM • Last activity: Mar 4, 2023, 05:32 PM
0 votes
1 answers
87 views
Where can I find the loadavg.c file in Ubuntu?
Ubuntu version: Linux version 5.15.0-52-generic (buildd@lcy02-amd64-032) (gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38)
Other places say it can be found as fs/proc/loadavg.c, but I don't have that file. Where can I find loadavg.c?
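fs/proc/loadavg.c is part of the kernel source tree, which is not installed by default, so it has to be fetched. A hedged sketch of two ways to get it (package names inferred from the version string above):
```
# option 1: Ubuntu's kernel source package (needs deb-src entries enabled)
apt-get source linux-image-$(uname -r)
# or: sudo apt-get install linux-source-5.15.0   (tarball lands under /usr/src)

# option 2: the upstream stable tree from kernel.org
git clone --depth 1 --branch v5.15 \
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
less linux/fs/proc/loadavg.c
```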
john_smith (3 rep)
Dec 26, 2022, 07:30 AM • Last activity: Dec 26, 2022, 09:05 AM
4 votes
1 answers
859 views
Does a high load average in a cgroup give a "wrong" overall load average?
Assume you have a system with 2 processors. Now create a cgroup and configure this group to use only 1 processor. Populate it with enough processes to give it a load average of 5 (to prove a point); it is now hopelessly slow. I am assuming that the load average in /proc/loadavg will then also be 5, even though a different user is free to use the other CPU with no wait time. Is this correct? Is there a source I could quote for this?
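For anyone who wants to verify this empirically rather than quote a source, the setup is easy to reproduce; a hedged sketch assuming the cgroup-v1 cpuset layout (paths differ under cgroup v2) and root privileges:
```
# confine the current shell to CPU 0 via a cpuset cgroup
mkdir /sys/fs/cgroup/cpuset/pinned
echo 0  > /sys/fs/cgroup/cpuset/pinned/cpuset.cpus
echo 0  > /sys/fs/cgroup/cpuset/pinned/cpuset.mems
echo $$ > /sys/fs/cgroup/cpuset/pinned/tasks

# start five busy loops inside the confined shell ...
for i in 1 2 3 4 5; do yes > /dev/null & done

# ... then, from another (unconfined) shell, check the system-wide figure
cat /proc/loadavg    # expected to approach ~5 even though the second CPU stays idle
```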
trevore (149 rep)
Feb 2, 2016, 03:18 PM • Last activity: Oct 17, 2022, 10:46 AM
4 votes
2 answers
2395 views
CPU and Load Average Conflict on EC2 server
I am having trouble understanding what server resource is causing lag in my Java game server. In the last patch of my game server, I updated my EC2 LAMP server from **apache2.2, php5.3, mysql5.5** to **apache2.4, php7.0, mysql5.6**. I also updated my game itself to include many more instances of monsters that are looped through every game loop, among other things. Here is output from right when my game server starts up: [screenshot]. Here is output from a few minutes later: [screenshot]. And here is output from the next morning: [screenshot]. As you can see in the images, the CPU usage of my Java process levels off around 80% in the last screenshot, yet the load average goes to 1.20. I have even seen it go as high as 2.7 this morning. The CPU credits affect how much actual CPU juice my server has, so it makes sense that the percentage goes up as my credit balance diminishes, but why does my server lag at 80%? On my Amazon EC2 metrics I see CPU at 10% (which confuses me even more): [screenshot]. Right when I start up my server, my MMORPG does not lag at all. Then as soon as my CPU credits are depleted it starts to lag. This makes me feel like it is CPU-based, but when I see 10% and 80% I don't see why. Any help would be greatly appreciated. I am on a t2.micro instance, so it has 1 vCPU. If I go up to the next instance size it nearly doubles in price and stays at the same 1 vCPU, but with more credits. Long story short, I want to understand fully what is going on, as the 80% number is throwing me. I don't just want to throw money at the problem.
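Since this looks credit-bound, it may help to plot the instance's credit balance next to the in-guest numbers; when a T2 runs out of credits, the throttling typically shows up inside the guest as steal time (the st column in top/vmstat) rather than as high %CPU. A hedged sketch with the AWS CLI (instance ID and time range are placeholders):
```
# CPU credit balance over the last 24 hours, 5-minute datapoints
aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUCreditBalance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time "$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Average
```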
KisnardOnline (143 rep)
Feb 25, 2017, 03:46 PM • Last activity: Aug 9, 2022, 06:10 AM
1 votes
1 answers
1001 views
Understanding load average on multicore system with a multithreaded app
We have a curious situation regarding load average on our system. It's running an application called ZAG that is idle most of the day. But every 80 minutes or so, it has some sort of activity burst of 5-15 minutes' length. At the time of the burst, load average can climb to 60, 70, 80, 100 or more. What's interesting is this: during these high bursts, we'll see the CPU utilization percentages in htop show only 10-20% utilization per CPU. Furthermore, a script that I have written shows light CPU usage; during idle time this: ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | awk '{sum += $10} END{print sum;}' returns perhaps 535.0... that is, adding up all the CPU percentages from my ZAG application returns 535.0% of a CPU, or 5.35/32 or 16.7% utilization of all the CPUs on my system. So in short, not a single CPU is being driven anywhere close to 100%, which is what we expect during mostly idle times. During the incident, the result comes out to about 538.0 percent... just a little bump higher. I also see more threads on the run queue, as shown by while true; do ps -eTo psr,user,pid,tid,cputime,class,rtprio,ni,pri,pcpu,stat,wchan:14,args | grep ZAG | grep ' Rl' | wc -l; sleep 0.5; done So CPU utilization goes up a little, and there are more threads running. But they are not using much more CPU, it seems, even as the load average shoots up. There is consistently very little going on regarding disk I/O or network I/O; sar data shows no discernible increase during this burst. There is no increase in memory utilization, and the number of processes may increase by a couple out of some 1700 total processes on the system, but that's all. There is nothing in cron that takes place at these times. htop or top output shows that there is certainly some CPU utilization taking place at this time, mostly user CPU (less than 5% system CPU reported by top). So it doesn't look like anything is waiting on data. I don't notice anything extraordinary in /proc/interrupts. Rescheduling interrupts seem to be heavy there, but I spot-checked half a dozen cores, on both even and odd NUMA nodes, and they appear to be steady at about 1400 per second per processor. This is a 16-core machine with hyperthreads turned on (E5-2667 v4 processor). It has 36 ZAG processes and 729 ZAG threads, as shown by ps -ef and ps -eTf, respectively. So this makes me wonder: how come my CPU utilization percentages are so low, while my load average is so high? Is it because out of my 36 ZAG processes, I have over 700 threads, and perhaps a thread that's in sched_yield() is still in the run queue, but not accumulating CPU? Or is a thread in sched_yield() no longer runnable, but in an uninterruptible state (see below)? According to Brendan Gregg at https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html , "When load averages first appeared in Linux, they reflected CPU demand, as with other operating systems. But later on Linux changed them to include not only runnable tasks, but also tasks in the uninterruptible state (TASK_UNINTERRUPTIBLE or nr_uninterruptible). This state is used by code paths that want to avoid interruptions by signals, which includes tasks blocked on disk I/O and some locks. ... But don't Linux load averages sometimes go too high, more than can be explained by disk I/O? Yes, although my guess is that this is due to a new code path using TASK_UNINTERRUPTIBLE that didn't exist in 1993.
In Linux 0.99.14, there were 13 codepaths that directly set TASK_UNINTERRUPTIBLE or TASK_SWAPPING (the swapping state was later removed from Linux). Nowadays, in Linux 4.12, there are nearly 400 codepaths that set TASK_UNINTERRUPTIBLE, including some lock primitives. It's possible that one of these codepaths should not be included in the load averages...."
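Following the quoted article, one way to test the TASK_UNINTERRUPTIBLE theory on this workload is to sample the kernel stacks of any ZAG threads that are in the D state during a burst; a hedged sketch (needs root, the ZAG name match is taken from the description above):
```
# every half second, dump the kernel stack of ZAG threads in uninterruptible sleep
while true; do
    for tid in $(ps -eLo state,tid,comm --no-headers | awk '$1 ~ /^D/ && $3 ~ /ZAG/ {print $2}'); do
        echo "=== $(date +%T) tid $tid"
        cat /proc/$tid/stack 2>/dev/null
    done
    sleep 0.5
done
```
If the same lock or I/O path shows up repeatedly, that code path is what is inflating the load average without burning CPU.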
Mike S (2732 rep)
Jan 13, 2022, 09:58 PM • Last activity: Jan 14, 2022, 11:09 PM
0 votes
1 answers
1340 views
CPU load average + how to deal with processes in the D state
We can see on our RHEL 7.6 server (kernel version 3.10.0-957.el7.x86_64) that the following processes are in the D state (they run as the hdfs user). Note: the D state code means that the process is in uninterruptible sleep.
ps -eo s,user,cmd | grep ^[RD]
D hdfs du -sk /grid/sdj/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
D hdfs du -sk /grid/sdm/hadoop/hdfs/data/current/BP-1018134753-10.3.6.170-1530088122990
R root ps -eo s,user,cmd
Notes: the disks sdj and sdm are 3 TB in size, the "du -sk" also happens on other disks such as sdd, sdf, etc., and the disks use the ext4 file system. We suspect that the high CPU load average is caused by the "du -sk" runs that actually hit the disks, so I was thinking about what we can do regarding this behavior. One option is maybe to disable the "du -sk" verification in HDFS, but I have no clue how to do that. The second option is to work out what actually causes the D state. I'm not sure, but maybe upgrading the kernel version would help avoid the D state? Or something else (like disabling CPU threads), etc.?
More details:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
The CPU load average is around ~42-45 (for the 15-minute average).
Reference:
https://community.cloudera.com/t5/Support-Questions/Does-hadoop-run-dfs-du-automatically-when-a-new-job-starts/td-p/231297
https://community.cloudera.com/t5/Support-Questions/Can-hdfs-dfsadmin-and-hdfs-dsfs-du-be-taxing-on-my-cluster/m-p/182402
https://community.pivotal.io/s/article/Dealing-with-Processes-in-State-D---Uninterruptible-Sleep-Usually-IO?language=en_US
https://www.golinuxhub.com/2018/05/how-to-disable-or-enable-hyper/
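Before changing anything on the HDFS side, it may be worth confirming that the du scans are really what keeps those disks (and therefore the D states) busy; a hedged sketch:
```
# utilization and wait times of the two disks the du scans are walking
iostat -dx 5 sdj sdm

# kernel code path the du processes are sleeping in
for pid in $(pgrep -f 'du -sk /grid/sd'); do
    echo "=== PID $pid"
    cat /proc/$pid/stack
done
```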
yael (13936 rep)
Nov 28, 2021, 02:16 PM • Last activity: Nov 28, 2021, 02:43 PM
1 votes
0 answers
1098 views
Why does a docker container cause a huge CPU load average?
I want to discuss some strange behavior on our RHEL 7.6 server. We installed the Kafka exporter as a container on the server, the kafka-01 machine (the machine has 12 CPUs in total). The following yml file describes the Kafka exporter container configuration:
more docker.kafka-exporter.yml
---
version: '2.4'
services:
  kafka-exporter:
    mem_limit: "612m"
    image: kafka-exporter:v1.2.0
    restart: always
    network_mode: host
    container_name: kafka-exporter
    command: ["--kafka.server=kafka01.sys65.com:6667"]
    ports:
      - 9308:9308
    logging:
      driver: "json-file"
      options:
        max-size: "15m"
        max-file: "1"
When we start the container with docker-compose, as
docker-compose -f docker.kafka-exporter.yml up -d
we notice that the CPU load average jumps from ~2-3 to 30-40 after 1-2 hours, and only a restart of the machine returns the CPU load average to normal (around 1-2). The CPU load jumps again each time we start the docker compose (even stopping the docker compose does not decrease the CPU load average). Can someone give a hint as to what could be the reason for this strange behavior? In our case, is it useful to consider installing https://github.com/draios/sysdig for investigation?
Notes: We verified the CPU load average with the uptime Linux command. *Sometimes the machine becomes **frozen** or **hangs** so we can't access it, and only a reboot helps return the machine to normal.*
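A few starting points for narrowing this down, sketched under the assumption that the exporter binary inside the image is named kafka_exporter (adjust the pgrep pattern if not):
```
# container-level view of the exporter: CPU, memory, PIDs
docker stats --no-stream kafka-exporter

# per-thread CPU of the exporter process, sampled over a minute
pidstat -t -u -p "$(pgrep -f kafka_exporter | head -n 1)" 5 12

# anything piling up in uninterruptible sleep while the load climbs?
ps -eo state,pid,wchan:32,cmd | awk '$1 ~ /^D/'
```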
yael (13936 rep)
Nov 2, 2021, 06:50 AM
3 votes
5 answers
12929 views
Get per-core CPU load in shell script
I need to report the CPU load per core as a percentage from a shell script, but **I cannot run e.g. mpstat for one second**. Basically, I think the info top shows after pressing 1 is what I want, but I cannot configure top to show this in batch mode (at least I don't know how). I could create a ~/.toprc file with the configuration, but then I have to hope that the users do not mess with it. I looked at mpstat and parsing its output, but it supports only whole seconds as the interval. My script gets called via SNMP, and waiting 1 s for the response will generate a timeout, so this is not an option. Are there other ways to get the per-core CPU load? I read about parsing /proc/stat, but I think this is more of a last resort.
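Parsing /proc/stat does not have to be a last resort: two snapshots taken a fraction of a second apart already give usable per-core percentages, well inside an SNMP timeout. A hedged sketch with a 200 ms window (field handling is approximate, e.g. guest time is not separated out):
```
#!/bin/bash
# sample /proc/stat twice, 200 ms apart, and print busy% per logical CPU
s1=$(grep '^cpu[0-9]' /proc/stat); sleep 0.2; s2=$(grep '^cpu[0-9]' /proc/stat)

awk 'NR == FNR { for (i = 2; i <= NF; i++) tot1[$1] += $i; idle1[$1] = $5 + $6; next }
     {
       tot = 0; for (i = 2; i <= NF; i++) tot += $i
       idle = $5 + $6                                   # idle + iowait
       dt = tot - tot1[$1]; di = idle - idle1[$1]
       if (dt > 0) printf "%s %.1f%%\n", $1, 100 * (dt - di) / dt
     }' <(echo "$s1") <(echo "$s2")
```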
Jens (151 rep)
Sep 29, 2016, 11:56 AM • Last activity: Sep 14, 2021, 10:49 AM
0 votes
1 answers
66 views
What is the best way to understand the efficiency of my program, other than the load average?
Consider the following scenarios:
1. I have only one process running on the machine, and tools like top show 100% CPU usage, which is good: I'm using the CPU efficiently.
2. I have two processes, each taking 50% CPU. I'm still using the CPU efficiently, as the total hits 100%.
3. I have N (a relatively large number of) processes running on the machine. Since the CPU is busy, my process may not hit 100% CPU, which still makes sense because the processor is busy with the others.
4. Now let's say there is only one process on the machine and the CPU usage still doesn't hit 100%. Assume the cause is a bad program (too much I/O, or the program is simply doing nothing).
How do I detect case 4? The load average is not a good metric because it averages over different time windows. Is there any metric or method I can use to quantify how efficiently my program is using the CPU, both under no-load conditions and fully loaded conditions?
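One number that separates case 4 from the others is how many CPU-seconds the process consumes per second of wall-clock time, measured on the process itself rather than on the whole machine; a hedged sketch (the PID is a placeholder):
```
# "CPUs utilized" from perf stat: ~1.0 means one core kept fully busy, ~0 means mostly waiting
perf stat -p 1234 -- sleep 10

# or sample %CPU together with voluntary/involuntary context switches
pidstat -u -w -p 1234 1 10
```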
vijaychsk (1 rep)
Aug 12, 2021, 09:51 PM • Last activity: Aug 12, 2021, 10:17 PM
1 votes
2 answers
200 views
High program load; after the program's process is killed, Linux doesn't go back to the normal 0.5 load. Why?
I ran a program that pushed the CPU load to 39.99, obviously more than my 4-core CPU can handle. But why, when I killed the program (and it is definitely killed), doesn't the CPU load drop back to the 0.50 it was at before I started the program? Also, I noticed that the CPU load doesn't go down to 0.5 instantly after a program is killed; you need to wait for it to go down slowly. Why is that?
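The slow fall-off is expected: the three figures are exponentially damped moving averages that the kernel updates every few seconds, so after the last runnable task disappears the 1-minute value decays gradually instead of snapping back. A rough back-of-the-envelope of that decay (an approximation, not the kernel's exact fixed-point arithmetic):
```
# approximate 1-minute load decay once nothing is runnable: load(t) ~ load(0) * e^(-t/60)
awk 'BEGIN { l = 39.99; for (t = 0; t <= 300; t += 60)
             printf "after %3d s: %6.2f\n", t, l * exp(-t / 60) }'
```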
Okit Tfseven (13 rep)
Feb 9, 2021, 01:31 AM • Last activity: Feb 14, 2021, 04:08 PM
0 votes
3 answers
2033 views
Process with 1% CPU usage causing load average of 1.5
We recently observed a high load average of about 1.5 on our embedded system, even though pretty much all processes are supposedly sleeping (according to htop). The system in question is a dual-core Cortex-A9 running a realtime Linux kernel (4.14.126) built using buildroot. We are using initramfs for our root filesystem and there is no swap, so there is definitely **no disk I/O** during normal operation. After a bit of digging, we found out that the load is caused by a program called [swupdate](https://sbabic.github.io/swupdate/swupdate.html), which provides us with a convenient web interface for software updates (and we would very much like to continue using that). When I use time to estimate the average CPU usage of that application (by calculating *(user+sys)/real*), I get a value of only about 1%, which doesn't make much sense considering the load average of 1.5. I know that the load average also includes processes in the TASK_UNINTERRUPTIBLE state, which don't contribute to CPU usage. What I don't understand is why any of the threads/processes of that application would ever be in that state. To further analyze the situation I have captured a kernel trace using [lttng](https://lttng.org/), which shows that the only thing swupdate does is this (every 50ms): [trace screenshot] and this (every 100ms): [trace screenshot]. As you can see, there's a little bit of what appears to be socket-based IPC, and a select waiting for *something*. In the IPC case, one thread appears to be mostly blocking in nanosleep(), while the other is blocking in accept(), neither of which should consume any system resources as far as I'm aware. FYI: the time base for both screenshots is the same, and the IPC takes approx. 500-600µs in total (which, considering the interval of 50ms, fits quite nicely with the observed 1% CPU usage). So, what is causing the load here?
We recently observed a high load average of about 1.5 on our embedded system, even though pretty much all processes are supposedly sleeping (according to htop). The system in question is a dual-core Cortex-A9 running a realtime Linux kernel (4.14.126) built using buildroot. We are using initramfs for our root filesystem and there is no swap, so there is definitely **no disk I/O** during normal operation. After a bit of digging, we found out that the load is caused by a program called [swupdate](https://sbabic.github.io/swupdate/swupdate.html) , which provides us with a convenient web interface for software updates (and we would very much like to continue using that). When i use time to estimate the average cpu-usage of that application (by calculating *(user+sys)/real*), i get a value of only about 1%, which doesn't make much sense considering the load average of 1.5. I know that the load average also includes processes in the TASK_UNINTERRUPTIBLE state, which don't contribute to cpu usage. What i don't understand is why any of the threads/processes of that application would ever be in that state. To further analyze the situation i have captured a kernel trace using [lttng](https://lttng.org/) , which shows that the only thing swupdate does is this (every 50ms): enter image description here and this (every 100ms): enter image description here As you can see, there's a little bit of what appears to be socket-based IPC, and a select waiting for *something*. In the IPC case, one thread appears to be mostly blocking in nanosleep(), while the other is blocking in accept(), neither of which should consume any system resources as far as i'm aware. FYI: the time base for both screenshots is the same, and the IPC takes approx. 500-600µs in total (which, considering the interval of 50ms, fits quite nicely with the observed 1% CPU usage) So, what is causing the load here?
Felix G (111 rep)
Aug 11, 2020, 10:59 AM • Last activity: Feb 12, 2021, 08:15 PM
4 votes
1 answers
2638 views
Understand load average on multicore system
With only one microprocessor unit, the load average output by top can be understood simply: if it's above 1.0, there are jobs waiting. But if we have n cores on a multicore system with l*n logical cores (on my Intel CPU n=6 and l*n = 12, so the output from nproc is 12), should we divide the load average by the output of nproc and check whether that number is above 1 to understand whether there are (on average) jobs waiting, or is it better to use htop to understand if a parallel multicore system is getting too much average load? I think my method was wrong but the conclusion was right: when I saw in top that the load average was above 10, I checked with ps which process was expensive and found an overflow in a running program. But if that machine actually has nproc output > 10, then it would not really have been cause for investigation, had I known that. Do you agree?
Niklas Rosencrantz (4324 rep)
Dec 15, 2020, 12:26 PM • Last activity: Dec 15, 2020, 04:27 PM