Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
7
votes
1
answers
4888
views
Very High CPU Usage By IRQ #16
I recently noticed that one of my CPUs was sitting at around 85-90% usage while the system was otherwise idle, and according to `top` the usage was coming from interrupts. So, just like in this question, I used a combination of `dmesg` and periodically `cat`ing `/proc/interrupts` and found this:
CPU0 CPU1 CPU2 CPU3
0: 17 0 0 0 IR-IO-APIC 2-edge timer
1: 11548 0 2429 0 IR-IO-APIC 1-edge i8042
8: 0 0 0 1 IR-IO-APIC 8-edge rtc0
9: 7 16 0 0 IR-IO-APIC 9-fasteoi acpi
12: 14530 108887 0 0 IR-IO-APIC 12-edge i8042
16: 78464100 0 0 11702812 IR-IO-APIC 16-fasteoi idma64.0, i2c_designware.0, i801_smbus
120: 0 0 0 0 DMAR-MSI 0-edge dmar0
121: 0 0 0 0 DMAR-MSI 1-edge dmar1
As you can see, IRQ #16 is firing like crazy (every time the CPU wakes up from S3 it seems to start spamming a different CPU). I also found out that my touchpad uses the same IRQ, and if I2C mode is enabled (or *advanced* mode, according to my BIOS), it randomly stops working with the following messages (from `dmesg`):
[ 167.851139] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 167.851158] CPU: 2 PID: 3874 Comm: firefox Not tainted 4.15.3-300.fc27.x86_64 #1
[ 167.851160] Hardware name: Acer Aspire E5-575/Ironman_SK , BIOS V1.04 04/26/2016
[ 167.851162] Call Trace:
[ 167.851171] <IRQ>
[ 167.851185] dump_stack+0x5c/0x85
[ 167.851193] __report_bad_irq+0x30/0xc0
[ 167.851196] note_interrupt+0x235/0x280
[ 167.851198] handle_irq_event_percpu+0x51/0x70
[ 167.851201] handle_irq_event+0x27/0x50
[ 167.851204] handle_fasteoi_irq+0x6b/0x120
[ 167.851209] handle_irq+0xaf/0x120
[ 167.851214] do_IRQ+0x41/0xc0
[ 167.851219] common_interrupt+0xa2/0xa2
[ 167.851222] </IRQ>
[ 167.851224] RIP: 0010:_raw_spin_lock+0x10/0x20
[ 167.851226] RSP: 0000:ffffa85a857dfdd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[ 167.851230] RAX: 0000000000000000 RBX: ffff8d0a268930c8 RCX: 00003ffffffff000
[ 167.851231] RDX: 0000000000000001 RSI: 8000000000000025 RDI: ffffd21648d7ca70
[ 167.851232] RBP: ffffd2164892e100 R08: 0000000000000000 R09: 0000000000171800
[ 167.851233] R10: 0000000000271800 R11: 0000000000001000 R12: 0000000000000000
[ 167.851234] R13: 8000000224b84867 R14: ffffd21648d7ca70 R15: ffff8d0a35f29810
[ 167.851244] __handle_mm_fault+0xa4c/0x1290
[ 167.851249] handle_mm_fault+0xaa/0x1f0
[ 167.851255] __do_page_fault+0x25d/0x4e0
[ 167.851262] ? SyS_mmap_pgoff+0xfb/0x250
[ 167.851264] do_page_fault+0x32/0x110
[ 167.851267] ? page_fault+0x36/0x60
[ 167.851269] page_fault+0x4c/0x60
[ 167.851272] RIP: 0033:0x7ff86dc0b205
[ 167.851273] RSP: 002b:00007ffe6493e888 EFLAGS: 00010206
[ 167.851276] handlers:
[ 167.851291] [] idma64_irq [idma64]
[ 167.851296] [] i2c_dw_isr
[ 167.851302] [] i801_isr [i2c_i801]
[ 167.851304] Disabling IRQ #16
Is this a hardware issue? What can I do?
----------
I finally had a chance to dig into this further. By running `lspci -nnkv` I found two devices that use IRQ 16:
00:15.0 Signal processing controller : Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
Subsystem: Acer Incorporated [ALI] Device [1025:1094]
Flags: fast devsel, IRQ 16
Memory at a132b000 (64-bit, non-prefetchable) [size=4K]
Capabilities: Power Management version 3
Capabilities: Vendor Specific Information: Len=14
Kernel driver in use: intel-lpss
Kernel modules: intel_lpss_pci
and:
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
Subsystem: Acer Incorporated [ALI] Device [1025:1094]
Flags: medium devsel, IRQ 16
Memory at a132e000 (64-bit, non-prefetchable) [size=256]
I/O ports at 4040 [size=32]
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
The problem seems to go away if I unload the `intel_lpss_pci` module, i.e. `rmmod intel_lpss_pci`, but of course the touchpad then stops working. Still, I guess that's better than having a CPU always at 100%.
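The "periodically `cat`ing `/proc/interrupts`" step can be turned into a per-IRQ rate, which makes a storm on one line easier to spot than raw counters. This is a minimal sketch, assuming the usual `/proc/interrupts` layout (a header row of CPU names, then one row per IRQ); the helper names `parse_interrupts`, `irq_rates` and `sample_rates` are hypothetical.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Some rows (e.g. ERR, MIS) carry fewer counters than there are CPUs.
        counts = [int(f) for f in fields[:ncpu] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two parsed snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def sample_rates(interval=5.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return irq_rates(first, second, interval)
```

Calling `sample_rates()` once with the `intel_lpss_pci` module loaded and once after unloading it would show whether the rate on IRQ 16 actually drops.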
arielnmz
(559 rep)
Mar 1, 2018, 05:25 AM
• Last activity: Jun 26, 2025, 02:05 PM
3
votes
1
answers
1988
views
Remove the new kernel
I was working without any problems until one day the computer wouldn't start; it reported a problem with the X server. I connected to the computer over ssh and reinstalled `xserver-xorg` and `xserver-xorg-core`. The computer then started, but tapping the touchpad no longer clicked (it's not a configuration problem), so I copied the drivers from another Linux Mint installation into mine. At that point I noticed that I had kernel 4.2.0-32 installed, which my friend's computer does not have. I tried to uninstall it, but that was impossible, so I installed kernel 4.4.0-22. Now the touchpad works, but YouTube in Chrome showed black video. I changed a hardware acceleration setting and that is fine now. The computer now works without any problems except that it is very slow: I'm only using a browser and CPU usage is at 60%.
**So I want to go back to kernel 3.19.0-32 because it's the recommended one. The problem is that I can't uninstall the current kernel (4.4.0-22) because it's loaded. How can I boot another installed kernel so that I can delete the current one?**
*Here I can delete the old kernel because it's not loaded*
*Here I can't delete the new kernel because it's loaded (I want to delete it)*


Khalil Bz
(153 rep)
May 8, 2016, 11:00 AM
• Last activity: Jun 23, 2025, 04:03 PM
2
votes
1
answers
3240
views
How can I get the total CPU usage of a Linux machine with 1 or n CPU cores?
I am currently using the method below to extract the idle CPU value from the `top` command and subtract it from 100. Is this method correct, and is there a better way to achieve the same?
Also, my Linux VM is a stripped-down version and has only a few basic tools like `top`. Installing other tools is not an option, as the package manager has also been removed.
CPU_IDLE="$(top -bn2 | grep -F '%Cpu' | tail -n 4 | gawk '{print $8 $9}' | tr -s '\n\:\,[:alpha:]' ' '| gawk '{print $2}'),"
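An alternative to parsing `top` output is to read the aggregate `cpu` line of `/proc/stat` twice and compare the counters. The sketch below assumes a Python interpreter is present on the VM, which the question does not confirm; the function names are hypothetical. Note that it counts `iowait` as idle time, so the result can be lower than `100 - id` as reported by `top`.

```python
def read_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal
            # (guest time is already included in user, so it is skipped)
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(before, after):
    """Busy percentage across all cores between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)
```

In use, one would read `/proc/stat`, sleep for a second or so, read it again, and pass both results to `cpu_usage_percent`. Because the aggregate line already sums every core, the same code works for 1 or n cores.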
Bandi Sandeep
(21 rep)
Apr 12, 2017, 06:07 AM
• Last activity: May 28, 2025, 12:05 PM
0
votes
0
answers
111
views
MySQL Server keeps hitting 100% CPU
My server has 800+ days uptime, but for the past couple of weeks the **MySQL server** keeps hitting **100% CPU** and I have to restart the `mysqld` service.
Where do I start to find out where this issue is coming from? As all the sites on my server keep freezing until I restart. I don't want to keep restarting the service every time. Sometimes it is after a couple minutes, sometimes a couple hours, sometimes a couple days.
Nothing changed really. I only once updated the `my.cnf` file for the default character set and restarted recently, but I don't believe this is linked in any way.
**--- Server details ---**
Debian 10 Buster
MySQL Version: 14.14
Average users: 150 - 200 per day
vCPU's: 4
RAM: 4GB
**my.cnf**
No slow queries shown so far.
[mysqld]
collation_server = utf8mb4_unicode_ci
character_set_server = utf8mb4
sql-mode="NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_queries.log
log_queries_not_using_indexes = 'OFF'
long_query_time = 5
Z0q
(631 rep)
May 31, 2024, 03:56 PM
• Last activity: May 26, 2025, 05:19 AM
2
votes
1
answers
1940
views
Short periodic freezes every few seconds. Everything except the mouse
My computer freezes every 1 or 2 seconds for a short period: 1 or 2 seconds working, then 1 or 2 seconds not working.
Everything stops working except for the mouse.
----------
The first time I noticed the problem was when I tried to open a 1GB txt file with leafpad. The syslog (and other files) grew to 350MB with leafpad errors. I still don't really think that could be the cause, but the system has been slow since then.
I tried deleting those lines to make the files lighter but didn't work (ofc).
The line was a repetition of:
localhost leafpad: pango_tab_array_get_tab: assertion 'tab_index >= 0' failed
----------
**Gnome-shell debugging** (In the end I think the problem is not there)
I ran `top` to investigate and my first guess was gnome-shell.
I disabled all GNOME extensions and put `Hidden=True` on the GNOME tracker. I rebooted, of course, but the issue continues.
top - 11:37:47 up 16 min, 1 user, load average: 5.08, 4.53, 3.07
Tasks: 186 total, 1 running, 185 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.4 us, 13.6 sy, 0.0 ni, 78.8 id, 2.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 11894.0 total, 9255.9 free, 884.7 used, 1753.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 10597.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1467 root 20 0 3828560 258004 73056 S 16.9 2.1 3:00.91 gnome-she+
1627 root 20 0 384600 23668 17328 S 13.0 0.2 1:31.00 gsd-xsett+
1732 root 20 0 1190848 66960 31648 S 11.6 0.5 1:21.92 gnome-sof+
2371 root 20 0 239576 28532 22080 S 9.0 0.2 0:49.61 leafpad
2282 root 20 0 1397692 79500 38488 S 8.3 0.7 2:27.84 nautilus
1618 root 20 0 452484 40448 13752 S 7.6 0.3 1:01.97 packageki+
1643 root 20 0 384156 24452 17428 S 5.3 0.2 1:16.62 gsd-keybo+
1636 root 20 0 236512 22152 17128 S 3.0 0.2 1:16.76 gsd-clipb+
1269 root 20 0 343084 47552 32060 S 0.7 0.4 0:19.31 Xorg
9 root 20 0 0 0 0 I 0.3 0.0 0:01.07 rcu_sched
1176 message+ 20 0 18272 5276 3476 S 0.3 0.0 0:01.51 dbus-daem+
1640 root 20 0 550896 24776 19364 S 0.3 0.2 1:18.79 gsd-color
2850 root 20 0 527664 39564 28252 S 0.3 0.3 0:07.43 gnome-ter+
3048 root 20 0 15804 3484 3040 R 0.3 0.0 0:00.01 top
1 root 20 0 192548 9036 6632 S 0.0 0.1 0:02.95 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
I used the following to find out where the issue was, and it seems `openat` takes the majority of the time. What also gets my attention is the number of errors that call returns, and I guess that might be the problem.
strace -c -p 1467
strace: Process 1467 attached
^Cstrace: Process 1467 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
38.35 2.269925 65 34909 22415 openat
21.63 1.280485 1583 809 252 unlink
18.82 1.113966 4700 237 15 link
16.79 0.993957 4498 221 rename
0.96 0.056549 2 30633 21313 access
0.91 0.053897 3 20006 186 stat
0.47 0.027686 1 19059 read
0.42 0.024586 2 12498 close
0.33 0.019538 2 10852 fstat
0.28 0.016418 5 3083 munmap
0.21 0.012386 4 3099 mmap
0.18 0.010921 21 528 write
0.13 0.007561 1 7413 getuid
So I killed the gnome-shell process and the problem still remains. I don't really see what the problem may be, and I have a PC with a 4-core Intel i7 processor, so it shouldn't be under this much load.
----------
iostat -h
Linux 10/11/2018 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
24.4% 0.4% 21.4% 10.8% 0.0% 43.0%
tps kB_read/s kB_wrtn/s kB_read kB_wrtn Device
11.14 175.3k 0.0k 16.6M 0.0k sda
235.07 6.6M 2.9M 641.9M 286.4M sdb
769.76 789.5k 0.0k 75.0M 0.0k loop0
Agustin Barrachina
(241 rep)
Oct 8, 2018, 09:51 AM
• Last activity: May 13, 2025, 10:03 PM
2
votes
1
answers
10085
views
CPU reservation and affinity using taskset and isolcpus kernel parameter with JVM?
We need the JVM to reserve a set number of CPUs. Following my research, we can use `taskset` along with the kernel parameter `isolcpus=` so that no other process uses this CPU.
A few questions arise:
- does the process need to be started with `taskset`?
- does the reservation mean that the process can only run on that CPU, or can it expand to the other CPUs if there are resource problems?
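For illustration, here is a minimal sketch of what `taskset` does underneath: it sets the affinity mask through the `sched_setaffinity` system call, which Python exposes on Linux as `os.sched_setaffinity`. Per the `sched_setaffinity(2)` man page, the mask is inherited across `fork` and preserved across `execve`, and a task is not scheduled outside its mask. The CPU list format mirrors the one `isolcpus=` takes; `parse_cpu_list` and `pin_current_process` are hypothetical helper names.

```python
import os


def parse_cpu_list(spec):
    """Turn a list such as '0-3,8-15' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Linux only: restrict the calling process (pid 0 = self) to the given CPUs."""
    os.sched_setaffinity(0, parse_cpu_list(spec))
    return os.sched_getaffinity(0)
```

A launcher could call `pin_current_process(...)` and then exec the JVM, which would be equivalent to starting it under `taskset`. An already running process can also be changed by passing its PID instead of `0`.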
danidar
(201 rep)
Jul 26, 2018, 03:52 PM
• Last activity: Apr 30, 2025, 09:06 PM
1
votes
1
answers
128
views
How to measure actual CPU utilization in Linux for multi core applications?
I have a computation intensive process that I need to run multiple times on a multi-core processor but "top" isn't showing utilization or load in a useful way.
For example, imagine my task runs in 1 minute in a single thread on a single core of my six core, 12 thread, SMT CPU. If I start the same task six times using six threads, it still finishes in 1 minute and top shows the load average as 6.0 and the cpu(s) at 50% us and 50% id. In the top process list, each of the six processes is showing 100% CPU. If I do the same thing but start 12 threads, it finishes the 12 jobs in 2 minutes and top shows the load average as 12.0, cpu(s) at 100% us 0% id, with 12 processes each at 100% CPU.
Now, the 6-thread and 12-thread examples are both processing at the same fully loaded rate of 6 jobs per minute (one job every 1/6 of a minute), so why does top show the 6-thread case as 50% idle when it clearly isn't? Is there a better way of determining the actual load of the CPUs?
This was run on a Ryzen 5600X processor on Ubuntu 24.12.
Edit: top output for 12 tasks:
top - 08:35:37 up 54 days, 20:49, 3 users, load average: 12.20, 6.70, 2.80
Tasks: 346 total, 13 running, 332 sleeping, 0 stopped, 1 zombie
%Cpu(s): 98.2 us, 1.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 64221.7 total, 1572.7 free, 4983.4 used, 58684.1 buff/cache
MiB Swap: 8192.0 total, 7863.7 free, 328.3 used. 59238.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2249765 user 20 0 126952 64132 51200 R 100.0 0.1 3:48.87 sonicLiquidFoam
2249759 user 20 0 127060 64220 51200 R 100.0 0.1 3:48.93 sonicLiquidFoam
2249757 user 20 0 126624 64064 51328 R 100.0 0.1 3:49.32 sonicLiquidFoam
2249761 user 20 0 128276 64868 50688 R 100.0 0.1 3:47.65 sonicLiquidFoam
2249762 user 20 0 127652 63688 50432 R 100.0 0.1 3:49.13 sonicLiquidFoam
2249755 user 20 0 128844 66128 51200 R 100.0 0.1 3:46.06 sonicLiquidFoam
2249766 user 20 0 126576 63952 51328 R 100.0 0.1 3:47.87 sonicLiquidFoam
2249764 user 20 0 126612 63824 51072 R 99.0 0.1 3:48.59 sonicLiquidFoam
2249760 user 20 0 126888 63972 51072 R 98.7 0.1 3:45.06 sonicLiquidFoam
2249758 user 20 0 127500 64860 51200 R 97.7 0.1 3:48.64 sonicLiquidFoam
2249763 user 20 0 127916 64944 51072 R 97.0 0.1 3:39.58 sonicLiquidFoam
2249756 user 20 0 126828 63948 51072 R 96.0 0.1 3:48.77 sonicLiquidFoam
For 6 tasks:
top - 08:40:22 up 54 days, 20:53, 3 users, load average: 6.11, 6.67, 3.90
Tasks: 335 total, 7 running, 327 sleeping, 0 stopped, 1 zombie
%Cpu(s): 50.0 us, 1.0 sy, 0.0 ni, 49.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 64221.7 total, 1616.2 free, 4914.6 used, 58710.3 buff/cache
MiB Swap: 8192.0 total, 7863.7 free, 328.3 used. 59307.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2250032 user 20 0 127392 64676 51200 R 100.0 0.1 2:39.15 sonicLiquidFoam
2250027 user 20 0 126828 63096 50176 R 100.0 0.1 2:39.23 sonicLiquidFoam
2250028 user 20 0 127060 63260 50176 R 100.0 0.1 2:39.23 sonicLiquidFoam
2250029 user 20 0 128844 66124 51200 R 100.0 0.1 2:39.12 sonicLiquidFoam
2250030 user 20 0 128276 65508 51200 R 100.0 0.1 2:39.21 sonicLiquidFoam
2250031 user 20 0 126596 63808 51072 R 100.0 0.1 2:39.21 sonicLiquidFoam
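The numbers in the two `top` outputs are consistent with the summary line being an average over logical CPUs, where each SMT thread counts as one CPU. The arithmetic below is a sketch under the assumption that every task keeps exactly one logical CPU busy; it reproduces the percentages, not the reason the cores are saturated.

```python
LOGICAL_CPUS = 12  # 6 cores x 2 SMT threads, as described in the question


def summary_busy_percent(busy_logical_cpus, total_logical_cpus):
    """Busy share as a plain average over logical CPUs."""
    return 100.0 * busy_logical_cpus / total_logical_cpus


def jobs_per_minute(jobs, minutes):
    """Throughput actually achieved by a batch of jobs."""
    return jobs / minutes


print(summary_busy_percent(6, LOGICAL_CPUS))    # 6 tasks: half the logical CPUs
print(summary_busy_percent(12, LOGICAL_CPUS))   # 12 tasks: all logical CPUs
print(jobs_per_minute(6, 1), jobs_per_minute(12, 2))  # same throughput in both cases
```

The counter only records whether a hardware thread was running something, so it cannot tell that the sibling thread shares the same execution units.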
tkw954
(113 rep)
Apr 23, 2025, 07:35 PM
• Last activity: Apr 24, 2025, 02:43 PM
0
votes
0
answers
33
views
CPU affinity not following cpuset
When I run `taskset -p` on a process I get something like this back:
# taskset -p 1078
pid 1078's current affinity mask: 3f
And it keeps changing what it reports, sometimes it's 5f, other times df and so on.
For the same process I can see that it's allowed on all cores:
# cat /proc/1078/status | grep Cpus
Cpus_allowed: ff
Cpus_allowed_list: 0-7
And its cpuset in cgroups also allows 0-7:
# cat /dev/cpuset/cpus
0-7
If I try to set it to `ff` using taskset I still do not get `ff` back:
# taskset -p ff 1078
pid 1078's current affinity mask: 5f
pid 1078's new affinity mask: 5f
What mechanism is overriding the cpuset and taskset affinity? Any way I can force it to actually run on all cores?
This is on Android 13 and kernel 5.15.
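To see which cores are dropping out, the hexadecimal masks can be decoded bit by bit: bit n set means CPU n is in the mask. A small sketch, with hypothetical helper names:

```python
def mask_to_cpus(mask_hex):
    """Decode a taskset-style hex affinity mask into a list of CPU numbers."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def missing_cpus(mask_hex, allowed_hex="ff"):
    """CPUs present in the allowed mask but absent from the reported one."""
    return sorted(set(mask_to_cpus(allowed_hex)) - set(mask_to_cpus(mask_hex)))


for reported in ("3f", "5f", "df"):
    print(reported, mask_to_cpus(reported), "missing:", missing_cpus(reported))
```

For the values in the question, `3f` lacks CPUs 6 and 7, `5f` lacks 5 and 7, and `df` lacks only 5, so the CPUs that go missing are always among 5 to 7.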
Zitrax
(284 rep)
Apr 16, 2025, 11:47 AM
2
votes
1
answers
3009
views
High CPU Usage from systemd-udevd
I have a Dell Studio 1569 and just installed Linux onto it. I noticed that the CPU has been running high due to systemd-udevd. Going through different posts on the web, including [this one][1], I used `udevadm monitor` to help narrow down what was happening. The output (screenshot not preserved) showed the path `/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.6/2-1.6.2/2-1.6.2:1.0` (usb). Then, long story short, while I had `udevadm monitor` running, I pressed some keys on my keyboard and noticed that the path was the same for the keyboard (as seen in the picture above); the only difference was that the beginning of the line had `aKERNEL` in front of it instead of `KERNEL` or `UDEV`.
My next test: while I had `udevadm monitor` running, I took apart my laptop and disconnected the keyboard to see if those `bind/unbind` entries would stop. But they continued, which makes me think this is not the keyboard. Does anyone know what else it could be if it is not the keyboard?
Here is the output from `lsusb -t`:

Kayracer
(31 rep)
Jun 22, 2019, 02:43 AM
• Last activity: Apr 14, 2025, 10:04 PM
0
votes
0
answers
42
views
Kernel APIs to disable/enable CPU cores within a driver module
I am running Linux kernel version 6.8.0-57-generic (Ubuntu Jammy).
I want to try disabling and enabling CPU cores from a Linux kernel module (from kernel space). What is the right way to do this? I understand that cores can be disabled from userspace. However, I intend to do the same from a kernel module. Thanks in advance.
ss22
(13 rep)
Apr 13, 2025, 05:14 PM
1
votes
0
answers
47
views
ARM64 commands take seconds to finish
I'm on an ARM board running Linux. The hardware is a vehicle domain control board with a 6-core ARM Cortex-A78AE and some machine learning cores. I don't want to reboot it, because this might be a hardware or driver bug that is causing my performance loss issue.
As you can see,
What should I monitor to find out what happened?
root@hobot:~# uname -a
Linux hobot 6.1.94-rt33 #1 SMP PREEMPT_RT Fri Nov 8 15:11:35 CST 2024 aarch64 GNU/Linux
I don't know what happened to my OS today. I suddenly found that shell commands take too long to finish, although everything was fine a little while ago. For example:

`ls` takes over 5 seconds and uses 100% CPU on the 4th core.
I tried `strace` with `ls`. It gets stuck on nothing in particular.

Xingx1
(11 rep)
Feb 23, 2025, 06:22 AM
• Last activity: Feb 23, 2025, 07:45 AM
0
votes
2
answers
258
views
Process called lsof using too much CPU
I keep seeing
lsof -w -l +d /var/lib/php5
eating up my CPU.
I want to know what is triggering it and what it has to do with php5.
Aleksandar Pavić
(109 rep)
Dec 17, 2024, 09:50 AM
• Last activity: Jan 16, 2025, 03:21 PM
1
votes
1
answers
214
views
What exactly does %wait in pidstat mean?
Environment:
- Ubuntu 22.04
- sysstat version 12.2.0
- Number of logical CPUs: 16
`man pidstat` shows the following, but I would like to know more specifically about the denominator and numerator of %wait.
> Percentage of CPU spent by the task while waiting to run.
One time I was looking at the performance of each process in pidstat and found that there were times when the %wait of one process was quite high, such as 90%. At the same time, %usr was also high and sometimes exceeded 100%.
At that time, the total CPU utilization was close to 100%, and the total %CPU of all processes was close to 1600%, which is the number of CPU cores. However, the total %wait of all processes was more than 1000%. Since CPU resources are used only for %CPU, I assume that %wait is the time when CPU is not used, but what is this percentage?
In mpstat, for example, the total of %usr, %sys, %iowait, %idle, etc. would be about 100% of CPU resources. What exactly is the %wait in the case of pidstat?
If the process is simply not running, I think %wait would also be 0. It is also curious that during the times when %wait was high, disk i/o and net i/o were not high when examined by dstat and other means.
From reading the sysstat code, it seems that %wait, like %usr and %sys, uses as its numerator the increase in the cumulative wait time since the previous sample, and as its denominator the length of the interval between the two samples. I am not sure what kind of time is counted as wait time.
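The ratio described above can be reproduced by hand. The sketch below assumes that the wait counter is the second field of `/proc/<pid>/schedstat`, which the kernel's scheduler statistics documentation describes as time spent waiting on a run queue, in nanoseconds. That pidstat reads this exact field is my assumption, not something the question confirms. The function names are hypothetical.

```python
def parse_schedstat(text):
    """Return (run_ns, wait_ns, timeslices) from /proc/<pid>/schedstat content."""
    run_ns, wait_ns, timeslices = (int(v) for v in text.split()[:3])
    return run_ns, wait_ns, timeslices


def wait_percent(wait_before_ns, wait_after_ns, interval_s):
    """Run-queue wait accumulated during the interval, as a share of wall-clock time."""
    return 100.0 * (wait_after_ns - wait_before_ns) / (interval_s * 1e9)
```

Under this reading the denominator is elapsed wall-clock time for that one task, not total CPU capacity. Summed over many runnable tasks, the values would then have no 1600% ceiling, and a task that is runnable but not currently on a CPU would accumulate wait time without any disk or network I/O being involved.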
LAPK
(11 rep)
Jan 15, 2025, 11:13 PM
• Last activity: Jan 16, 2025, 12:15 AM
0
votes
0
answers
29
views
Linux kernel cgroup v2 CFS - cpu throttled_usec accounting?
In Linux kernel cgroup v2's CFS scheduler, how is cpu.stat `throttled_usec` accounted when a cgroup with multiple threads gets throttled during a single quota period? Specifically, is `throttled_usec` tracked as the total wall-clock time that the cgroup was throttled as a whole, or is it a sum of the throttled times of all individual threads?
Kernel Version: "5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 11:41:53 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux" Distro: Oracle Linux 9.x
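One way to probe this empirically is to sample `cpu.stat` twice and compare the growth of `throttled_usec` with the wall-clock time between the samples. The sketch below only parses the file and computes the deltas; it does not settle the accounting question. The file path a caller would pass, e.g. `/sys/fs/cgroup/<group>/cpu.stat`, and the helper names are assumptions.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines of a cgroup v2 cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_delta(before, after):
    """Change in the throttling counters between two parsed samples."""
    keys = ("nr_periods", "nr_throttled", "throttled_usec")
    return {k: after[k] - before[k] for k in keys if k in before and k in after}
```

If the `throttled_usec` delta over a sampling window ever exceeds the window's wall-clock length in microseconds, the counter cannot be a single wall-clock measurement for the group as a whole.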
ALZ
(961 rep)
Jan 11, 2025, 12:35 PM
13
votes
5
answers
57255
views
setroubleshootd excessive cpu and memory usage
I have a fresh CentOS 7 install and I see `setroubleshootd` with high CPU usage. How can I fix this? What is this process doing?
stiv
(1691 rep)
Mar 29, 2019, 08:41 PM
• Last activity: Dec 9, 2024, 11:31 PM
0
votes
0
answers
33
views
What is the meaning of columns in the table displayed by cpupower-monitor?
Running `sudo cpupower monitor` on Ubuntu 24.04 I'm getting the table below. A side question: why do logical CPUs of the same physical core show different frequencies?
| Nehalem || Mperf || RAPL || Idle_Stats
CPU| C3 | C6 | PC3 | PC6 || C0 | Cx | Freq || pack | core | unco || POLL | C1_A | C2_A | C3_A
0| 0.00| 0.46| 0.00| 0.00|| 11.71| 88.29| 3060||37496059|27532217| 73852|| 0.01| 8.94| 79.93| 0.00
1| 0.00| 0.46| 0.00| 0.00|| 6.56| 93.44| 2790||37496059|27532217| 73852|| 0.00| 2.56| 53.19| 37.97
2| 0.00| 4.67| 0.00| 0.00|| 9.14| 90.86| 3194||37496059|27532217| 73852|| 0.01| 4.43| 58.09| 28.77
3| 0.00| 4.67| 0.00| 0.00|| 6.24| 93.76| 3239||37496059|27532217| 73852|| 0.00| 1.69| 34.62| 57.66
4| 0.00| 0.00| 0.00| 0.00|| 0.29| 99.71| 4461||37496059|27532217| 73852|| 0.00| 0.00| 0.00| 99.97
5| 0.00| 0.00| 0.00| 0.00|| 29.52| 70.48| 4457||37496059|27532217| 73852|| 0.00| 64.48| 0.00| 6.48
6| 0.00| 0.00| 0.00| 0.00|| 47.92| 52.08| 4123||37496059|27532217| 73852|| 0.00| 0.95| 0.31| 51.40
7| 0.00| 0.00| 0.00| 0.00|| 0.00|100.00| 3753||37496059|27532217| 73852|| 0.00| 0.00| 0.00| 99.99
8| 0.00| 25.03| 0.00| 0.00|| 3.27| 96.73| 2807||37496059|27532217| 73852|| 0.01| 4.13| 54.72| 38.03
9| 0.00| 62.12| 0.00| 0.00|| 2.32| 97.68| 2897||37496059|27532217| 73852|| 0.00| 1.29| 30.79| 65.69
10| 0.00| 77.85| 0.00| 0.00|| 2.19| 97.81| 3064||37496059|27532217| 73852|| 0.00| 0.94| 18.42| 78.51
11| 0.00| 0.00| 0.00| 0.00|| 14.84| 85.16| 2497||37496059|27532217| 73852|| 0.01| 57.88| 27.45| 0.38
12| 0.00| 70.55| 0.00| 0.00|| 3.27| 96.73| 2399||37496059|27532217| 73852|| 0.00| 1.33| 30.14| 65.41
13| 0.00| 54.45| 0.00| 0.00|| 4.02| 95.98| 2213||37496059|27532217| 73852|| 0.00| 1.06| 43.49| 51.62
14| 0.00| 67.90| 0.00| 0.00|| 3.36| 96.64| 2334||37496059|27532217| 73852|| 0.00| 1.34| 30.91| 64.54
15| 0.00| 72.39| 0.00| 0.00|| 2.41| 97.59| 2167||37496059|27532217| 73852|| 0.00| 1.25| 26.74| 69.73
What is the meaning of the columns in this table? The `cpupower` manual (https://linux.die.net/man/1/cpupower-monitor) doesn't have that information. I'm assuming that `C3`, `C6`, etc. are percentages of time spent in a given CPU C-state.
Also, I'm running a program with two threads pinned to CPU #5 and #6. Cores #2 and #3, containing hyperthreaded CPUs 4-7, are isolated with `GRUB_CMDLINE_LINUX="nohz=on nohz_full=4-7 rcu_nocbs=4-7 isolcpus=4-7 irqaffinity=0-3,8-15"`. The table above confirms load on CPU #5 and #6, but also consistently shows some marginal load on CPU #4. It doesn't correspond to the Ubuntu System Monitor display, which consistently shows 0% load on CPU #4, listed as CPU5 here:

Paul Jurczak
(151 rep)
Nov 23, 2024, 03:38 AM
• Last activity: Nov 23, 2024, 06:46 AM
1
votes
1
answers
158
views
100% CPU on 4 of 8 cores on Oracle Linux
I have a desktop computer (Intel i4770) running Oracle Linux 7.9 with kernel 4.1.12-61. I usually keep it off and only turn it on on the rare occasions when I need to test something. A month or so ago, I turned it on and noticed that the fans were on max speed - I checked top and found that setroubleshoot was at 100% so I killed the process. The process kept coming back and I kept killing it but ultimately, it didn't matter much because my testing was done and I turned the computer off again. (Yes, I always shut it down the right way.)
Now trying to get to the root of the problem, setroubleshoot is no longer showing 100% in top. In fact, nothing is even close to 100%. Running htop, I can get details about the CPUs and 4 of the 8 cores are permanently 100%. From the time the computer lets me log in to when I shut it down. But there's nothing in the list of processes even above 5.2%.
When I run perf on each core with `perf top -C 1 --sort comm`, I can see that cores zero through 3 are all 100% kernel.
Here is the perf report from running `perf record -a -F 999 -- sleep 10`. I don't know if the failure to find useful symbols is indicative of the problem I'm chasing, if it is a different issue that I'll need help figuring out, or if it is something that should be ignored.
On the desktop of this computer, I noticed a bunch of SELinux errors. They all appear to be saying that there was an attempt to execute something that should not have been allowed.
And just to confirm that htop was right about what it was reporting, here's the report from the System Monitor.
Booting into a prior version of the kernel didn't help. And booting into the "rescue" kernel didn't help either.
I tried updating the kernel but that didn't help. I ran a software update and that didn't help either. Note that I hadn't done any updates or installed any new software immediately prior to this problem starting. This install had been stable for years when I needed it.
I also tried installing the same OS over again on a new external drive. That worked. No issues on that drive. But when I boot to that drive and then choose the kernel that is on the main drive, the problem returns. That all seems to prove that the kernel isn't the issue but the system main drive has something wrong.
I'm at a loss for how to debug further. I can't figure out what changed and why so I don't know how to even start fixing it. Any help about where to look and what to check would be appreciated!
___ Edit 1: ___
The number in the list above, 1399, is the PID for the process but that process isn't visible either through
Output from `ps -efl|sort -rk14|head`:
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S gdm 2085 2041 2 80 0 - 905731 - 17:42 ? 00:00:02 /usr/bin/gnome-shell
4 S root 1 0 1 80 0 - 54811 - 17:41 ? 00:00:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
4 S root 837 1 1 80 0 - 22671 - 17:42 ? 00:00:01 /sbin/rngd -f
1 S root 405 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs_mru_cache]
1 S root 407 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-data/sda1]
1 S root 408 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-conv/sda1]
1 S root 409 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-cil/sda1]
1 S root 406 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-buf/sda1]
1 S root 404 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfsalloc]
output from dmesg | grep libsystem
[ 3.344918] audit: type=1400 audit(1731105719.768:4): avc: denied { execute } for pid=496 comm="systemd-journal" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.351928] audit: type=1400 audit(1731105719.775:6): avc: denied { execute } for pid=502 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.351929] audit: type=1400 audit(1731105719.775:5): avc: denied { execute } for pid=503 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.374010] audit: type=1400 audit(1731105719.797:7): avc: denied { execute } for pid=513 comm="systemd-tmpfile" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_tmpfiles_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.386383] audit: type=1400 audit(1731105719.810:8): avc: denied { execute } for pid=525 comm="systemd-sysctl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.397457] audit: type=1400 audit(1731105719.821:9): avc: denied { execute } for pid=536 comm="hostname" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:hostname_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.673352] audit: type=1400 audit(1731105720.097:10): avc: denied { execute } for pid=657 comm="alsactl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:alsa_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
___ Edit 2: ___
I installed kernel debug info and downgraded perf to version 3 since apparently there is a bug with the perf version for OL7.9 . However there are still symbols that can't be found.

htop
or ps
. As soon as I did a kill -9 1399
, the CPU usage immediately dropped to zero. That's nice because at least the problem process is now dead. And I know how to kill it, even though I don't see it in the normal process lists.
But the fundamental question remains - where is this process coming from and how do I stop it from starting in the first place!?
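One way to look for other processes that, like PID 1399, exist but do not show up in `ps` or `htop` is to compare what a directory listing of `/proc` reports with what can be opened directly by PID. The following is a minimal sketch, not a definitive detector. It assumes the hiding works by filtering directory listings; anything that also intercepts direct opens, or that hooks the Python interpreter itself, would defeat it. Processes that start or exit during the scan can produce spurious results. The function names are hypothetical. Thread IDs are reachable under `/proc/<tid>` without being listed, so the probe only accepts entries whose `Tgid` equals the PID.

```python
import os


def listed_pids(proc_root="/proc"):
    """PIDs that a plain directory listing of /proc reports."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def is_process_leader(pid, proc_root="/proc"):
    """True if /proc/<pid>/status can be opened directly and describes a
    process (Tgid == pid) rather than a single thread of another process."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as status:
            for line in status:
                if line.startswith("Tgid:"):
                    return int(line.split()[1]) == pid
    except (OSError, ValueError, IndexError):
        return False
    return False


def hidden_pids(listed, max_pid, probe):
    """PIDs in 1..max_pid that answer the probe but are absent from the listing."""
    return [pid for pid in range(1, max_pid + 1)
            if pid not in listed and probe(pid)]


if __name__ == "__main__":
    # pid_max is the upper bound the kernel uses when assigning PIDs.
    with open("/proc/sys/kernel/pid_max") as f:
        max_pid = int(f.read())
    for pid in hidden_pids(listed_pids(), max_pid, is_process_leader):
        print("not in /proc listing but directly reachable:", pid)
```

Any PID this prints is worth inspecting by hand, for example through `/proc/<pid>/exe` and `/proc/<pid>/cmdline`, before drawing conclusions.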
ktbos
(111 rep)
Nov 8, 2024, 07:24 PM
• Last activity: Nov 11, 2024, 05:30 PM
0
votes
1
answers
118
views
Why is the Linux CPU stalling when having multithreaded memory writes?
HW Specs:
- CPU: 64 Cores, 128 Threads, AMD Ryzen Threadripper Pro 5995WX
- RAM: 512 GB, Manufacturer unknown, will try to provide if needed
Linux Specs:
- OS: Ubuntu 22.04.4 LTS
- Linux Kernel: 5.15.0-119-generic
I'm trying to get model training to work with PyTorch on a Linux server, where I have observed performance degrading by a factor of ~10 after letting a resource-intensive training task run for a couple of minutes (training on 4 GPUs, each with a multithreaded DataLoader).
Trying to isolate the root cause, I have come up with a minimal Python test that reproduces the issue by continuously writing 1 GB of data to RAM. Running this with 32 threads in parallel (the CPU has 128 threads available), the CPU stalls after 0%) until it is given a cooldown time of approx. 1 min. I have run the test on another server (48 CPU threads, 160 GB RAM) for 10 minutes without any problems (on that server, multi-GPU training also runs without any performance degradation).
In contrast to the self-implemented memory write test, I have also tried a benchmark using `sysbench`, writing 10 TB of data with up to 96 threads without any problem. This is where I don't really understand the difference: does this task only write the data into some sort of buffer without really allocating any RAM? I ran the test with the following command:
sysbench --threads=96 --time=0 --memory-block-size=128K --memory-total-size=10T --report-interval=1 --memory-oper=write memory run
The main observable difference between sysbench and my Python test script was in htop: sysbench had all threads running as normal-priority user threads (green bars), while my Python script caused a large portion of the time to be kernel time (red bars), which in my understanding is caused by a lot of waiting.
My question now is, does this diagnostic give some indication on why the system is stalling? Might there be a hardware issue with RAM or could this be an issue with the OS? Or what further tests could I do to isolate the root cause?
---
Edit:
In the following you can find the minimal python script:
import time
import numpy as np
import threading

# 1 GiB of zeros shared by all threads
data = np.zeros((1024, 1024, 1024, 1), dtype=np.uint8)

def allocate_memory():
    while True:
        start_time = time.time()
        # Creates a new 1 GiB result array on every iteration
        _ = data * 0
        end_time = time.time()
        print(f"Time: {end_time - start_time:.3f} s")
        print(data.shape)

def run_in_threads(num_threads):
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=allocate_memory)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    num_threads = 32
    run_in_threads(num_threads)
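To put numbers on the red bars seen in htop, it may help to measure how much user time, kernel (system) time, and how many page faults a single call costs. The expression `data * 0` creates a new 1 GiB result array on every iteration, so one hypothesis worth testing is that the kernel time comes from repeatedly obtaining and releasing that memory. The sketch below uses only the standard `resource` module. The helper name `rusage_delta` and the 64 MiB stand-in workload are assumptions made for illustration.

```python
import resource


def rusage_delta(workload):
    """Run workload() once and return how much user CPU time, system (kernel)
    CPU time and how many minor page faults the whole process accumulated."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_s": after.ru_utime - before.ru_utime,
        "system_s": after.ru_stime - before.ru_stime,
        "minor_faults": after.ru_minflt - before.ru_minflt,
    }


if __name__ == "__main__":
    # Stand-in workload: allocate and fill 64 MiB (an arbitrary size).
    stats = rusage_delta(lambda: bytearray(b"\x01") * (64 * 1024 * 1024))
    print(stats)
```

With the script above, the same helper could be called as `rusage_delta(lambda: data * 0)` in a single thread. `RUSAGE_SELF` covers all threads of the process, so measuring while other threads are running would mix their cost into the result.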
m4fr1699
(1 rep)
Nov 5, 2024, 08:27 AM
• Last activity: Nov 7, 2024, 08:58 AM
1
votes
1
answers
329
views
Btop - What does the LAV value mean?
I have a KVM guest and inside it I am running a VNC server. I connect to it through an SSH tunnel using the TigerVNC viewer. Everything works well except for one issue. Using Chrome (using X server), when I scroll vertically on a website it is a bit laggy. I checked the CPU and it looks good, I never see it at 100%... at most 50%. Memory is good too... I usually have about 30 GB of RAM free.
However, I see the first LAV value is 1.88 on btop. What does that mean exactly? Does that mean 100 percent of the cpu is being used and 88 percent of processes are waiting?
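Assuming LAV is btop's abbreviation for the load average (the 1, 5 and 15 minute values also shown by `uptime`), the number is not a percentage. It is an average count of tasks that are running or waiting to run (on Linux, tasks in uninterruptible sleep are counted as well), so it is usually read relative to the number of CPUs. A small sketch of that comparison, with a hypothetical helper name:

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Load average divided by the number of CPUs available to the system."""
    return load_average / cpu_count


if __name__ == "__main__":
    one_min, five_min, fifteen_min = os.getloadavg()
    cpus = os.cpu_count() or 1  # cpu_count() can return None
    print(f"1-minute load {one_min:.2f} on {cpus} CPUs "
          f"= {load_per_cpu(one_min, cpus):.2f} per CPU")
```

As an illustration only, a load of 1.88 on a guest with 4 virtual CPUs would come out at about 0.47 per CPU, while the same value on a single-CPU guest would mean that, on average, more tasks wanted to run than the CPU could serve.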

dman
(569 rep)
Oct 14, 2024, 05:04 PM
• Last activity: Oct 14, 2024, 05:18 PM
-1
votes
1
answers
37
views
(Solaris) Ram CPU monitoring script is grabbing incorrect cpu ram utilization values
#!/bin/bash
host=$(hostname)
email="abc.@xyz.com" # Change to your desired email
subject="Attention!!! Health check Failed on $host"
echo $(date)
# CPU use threshold
cpu_threshold=0
# Memory idle threshold
mem_threshold=0
#--- CPU
cpu_usage () {
    # Get CPU idle percentage
    cpu_idle=$(prstat -Z 1 1 | awk 'NR==2 {print $8}' | tr -d '%')
    # Check if cpu_idle is a valid number
    if ! [[ "$cpu_idle" =~ ^[0-9]+$ ]]; then
        echo "Error: Invalid CPU idle value: $cpu_idle"
        cpu_use=0
    else
        cpu_use=$((100 - cpu_idle))
    fi
    cpu_flag=0
    echo "CPU utilization: $cpu_use%"
    if [ "$cpu_use" -gt "$cpu_threshold" ]; then
        echo "CPU warning!!!"
        cpu_flag=1
    else
        echo "CPU ok!!!"
    fi
}
#--- Memory
mem_usage () {
    mem_total=$(kstat -m | grep "physmem" | awk '{print $2}') # Total memory in bytes
    mem_free=$(kstat -m | grep "freemem" | awk '{print $2}') # Free memory in bytes
    if [[ -z "$mem_total" || "$mem_total" -eq 0 ]]; then
        echo "Failed to retrieve total memory."
        mem_total=0
        mem_free=0
    fi
    # Convert bytes to GB
    mem_total_gb=$((mem_total / 1024 / 1024))
    mem_free_gb=$((mem_free / 1024 / 1024))
    echo "Total memory: $mem_total_gb GB"
    echo "Free memory: $mem_free_gb GB"
    if [ "$mem_total" -gt 0 ]; then
        per_mem=$(( ((mem_total - mem_free)) * 100 / mem_total ))
    else
        per_mem=0
    fi
    echo "Memory space remaining: $mem_free_gb GB"
    mem_flag=0
    if [ "$per_mem" -gt "$mem_threshold" ]; then
        echo "Memory warning!!!"
        mem_flag=1
    else
        echo "Memory ok!!!"
    fi
}
out() {
    if [ "$cpu_flag" -eq 1 ] && [ "$mem_flag" -eq 1 ]; then
        printf "
Hello Team,
Please check RAM and CPU utilization on $host.
Current RAM Percentage: $per_mem%%
Current CPU Percentage: $cpu_use%%
Thanks " > /tmp/health.txt
    elif [ "$cpu_flag" -eq 1 ]; then
        printf "
Hello Team,
Please check CPU utilization on $host.
Current CPU Percentage: $cpu_use%%
Thanks " > /tmp/health.txt
    elif [ "$mem_flag" -eq 1 ]; then
        printf "
Hello Team,
Please check RAM utilization on $host.
Current RAM Percentage: $per_mem%%
Thanks " > /tmp/health.txt
    fi
}
mail() {
    if [ "$cpu_flag" -eq 1 ] || [ "$mem_flag" -eq 1 ]; then
        /usr/sbin/sendmail -t <
It gives the following output. Despite using kstat, it throws this error. What am I doing wrong here?
Fri Oct 11 06:51:21 CDT 2024
Error: Invalid CPU idle value: 17:25:22
CPU utilization: 0%
CPU ok!!!
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
[ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
[ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
[ module:instance:name:statistic ... ]
[ interval [ count ] ]
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
[ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
[ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
[ module:instance:name:statistic ... ]
[ interval [ count ] ]
Failed to retrieve total memory.
Total memory: 0 GB
Free memory: 0 GB
Memory space remaining: 0 GB
Memory ok!!!
The CPU utilization is getting grabbed correctly now, only the memory usage is not reported correctly. I am not sure which module to use with kstat command, so I used prtconf as well but that didn't capture it either.
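The usage message appears because `-m` expects a module name after it. On Solaris the memory counters are commonly read with `kstat -p unix:0:system_pages:physmem` and `kstat -p unix:0:system_pages:freemem`, which print one `module:instance:name:statistic<TAB>value` line each, with the value given in pages rather than bytes. The sketch below shows the arithmetic on such lines. The sample values and the 8192-byte page size are hypothetical (the `pagesize` command reports the real page size), and the helper names are made up for illustration.

```python
def parse_kstat_value(line):
    """Return the integer value from one `kstat -p` line of the form
    'module:instance:name:statistic<TAB>value'."""
    _key, value = line.split(None, 1)
    return int(value)


def pages_to_gib(pages, page_size_bytes):
    """Convert a page count to GiB."""
    return pages * page_size_bytes / 1024 ** 3


def used_percent(physmem_pages, freemem_pages):
    """Integer percentage of physical memory that is not free."""
    return (physmem_pages - freemem_pages) * 100 // physmem_pages


if __name__ == "__main__":
    # Hypothetical sample lines, not output captured from the system in question.
    physmem = parse_kstat_value("unix:0:system_pages:physmem\t4194304")
    freemem = parse_kstat_value("unix:0:system_pages:freemem\t1048576")
    page_size = 8192  # assumed value; check with the `pagesize` command
    print(f"Total memory: {pages_to_gib(physmem, page_size):.1f} GiB")
    print(f"Free memory: {pages_to_gib(freemem, page_size):.1f} GiB")
    print(f"Used: {used_percent(physmem, freemem)}%")
```

The same page-to-bytes conversion would be needed in the shell script before dividing; note also that dividing a byte count by 1024 twice yields MB, not GB.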
Navdeep Singh
(37 rep)
Oct 11, 2024, 11:59 AM
• Last activity: Oct 11, 2024, 01:02 PM