Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

7 votes
1 answers
4888 views
Very High CPU Usage By IRQ #16
I recently noticed that one of my CPUs was sitting at around 85-90% usage even when the machine was idle, and according to top the usage was coming from interrupts. So, just like in this question, I used a combination of dmesg and periodically running cat on /proc/interrupts and found this:
           CPU0       CPU1       CPU2       CPU3
  0:         17          0          0          0  IR-IO-APIC    2-edge      timer
  1:      11548          0       2429          0  IR-IO-APIC    1-edge      i8042
  8:          0          0          0          1  IR-IO-APIC    8-edge      rtc0
  9:          7         16          0          0  IR-IO-APIC    9-fasteoi   acpi
 12:      14530     108887          0          0  IR-IO-APIC   12-edge      i8042
 16:   78464100          0          0   11702812  IR-IO-APIC   16-fasteoi   idma64.0, i2c_designware.0, i801_smbus
120:          0          0          0          0  DMAR-MSI    0-edge      dmar0
121:          0          0          0          0  DMAR-MSI    1-edge      dmar1
As you can see, IRQ #16 is firing like crazy (every time the machine wakes up from S3 it seems to start spamming a different CPU). I also found out that my touchpad uses the same IRQ, and if I2C mode is enabled (*advanced* mode, according to my BIOS), it randomly stops working with the following messages from dmesg:
[ 167.851139] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 167.851158] CPU: 2 PID: 3874 Comm: firefox Not tainted 4.15.3-300.fc27.x86_64 #1
[ 167.851160] Hardware name: Acer Aspire E5-575/Ironman_SK , BIOS V1.04 04/26/2016
[ 167.851162] Call Trace:
[ 167.851171]
[ 167.851185] dump_stack+0x5c/0x85
[ 167.851193] __report_bad_irq+0x30/0xc0
[ 167.851196] note_interrupt+0x235/0x280
[ 167.851198] handle_irq_event_percpu+0x51/0x70
[ 167.851201] handle_irq_event+0x27/0x50
[ 167.851204] handle_fasteoi_irq+0x6b/0x120
[ 167.851209] handle_irq+0xaf/0x120
[ 167.851214] do_IRQ+0x41/0xc0
[ 167.851219] common_interrupt+0xa2/0xa2
[ 167.851222]
[ 167.851224] RIP: 0010:_raw_spin_lock+0x10/0x20
[ 167.851226] RSP: 0000:ffffa85a857dfdd0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[ 167.851230] RAX: 0000000000000000 RBX: ffff8d0a268930c8 RCX: 00003ffffffff000
[ 167.851231] RDX: 0000000000000001 RSI: 8000000000000025 RDI: ffffd21648d7ca70
[ 167.851232] RBP: ffffd2164892e100 R08: 0000000000000000 R09: 0000000000171800
[ 167.851233] R10: 0000000000271800 R11: 0000000000001000 R12: 0000000000000000
[ 167.851234] R13: 8000000224b84867 R14: ffffd21648d7ca70 R15: ffff8d0a35f29810
[ 167.851244] __handle_mm_fault+0xa4c/0x1290
[ 167.851249] handle_mm_fault+0xaa/0x1f0
[ 167.851255] __do_page_fault+0x25d/0x4e0
[ 167.851262] ? SyS_mmap_pgoff+0xfb/0x250
[ 167.851264] do_page_fault+0x32/0x110
[ 167.851267] ? page_fault+0x36/0x60
[ 167.851269] page_fault+0x4c/0x60
[ 167.851272] RIP: 0033:0x7ff86dc0b205
[ 167.851273] RSP: 002b:00007ffe6493e888 EFLAGS: 00010206
[ 167.851276] handlers:
[ 167.851291] [] idma64_irq [idma64]
[ 167.851296] [] i2c_dw_isr
[ 167.851302] [] i801_isr [i2c_i801]
[ 167.851304] Disabling IRQ #16
Is this a hardware issue? What can I do?
----------
I finally had a chance to dig into this further. By running lspci -nnkv I found two devices that use IRQ 16:
00:15.0 Signal processing controller : Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
    Subsystem: Acer Incorporated [ALI] Device [1025:1094]
    Flags: fast devsel, IRQ 16
    Memory at a132b000 (64-bit, non-prefetchable) [size=4K]
    Capabilities: Power Management version 3
    Capabilities: Vendor Specific Information: Len=14
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
and:
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
    Subsystem: Acer Incorporated [ALI] Device [1025:1094]
    Flags: medium devsel, IRQ 16
    Memory at a132e000 (64-bit, non-prefetchable) [size=256]
    I/O ports at 4040 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c_i801
The problem seems to go away if I unload the intel_lpss_pci module (i.e. rmmod intel_lpss_pci), but of course the touchpad then stops working. I guess that's still better than having a CPU always at 100%.
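To see which IRQ line is actually growing, two snapshots of /proc/interrupts can be diffed. The following is a minimal Python sketch of that idea, not something from the question: the helper names (`parse_interrupts`, `busiest_irq`) and the numbers in the second snapshot are made up for illustration, and it assumes the usual layout of one header row of CPU names followed by `label: count count ...` rows.

```python
# Sketch: find the IRQ whose counter grew the most between two snapshots
# of /proc/interrupts. Helper names are illustrative, not a standard API.

def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        # The first ncpu fields are per-CPU counters; short rows such as
        # "ERR: 0" simply have fewer of them.
        counts = [int(f) for f in rest.split()[:ncpu] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def busiest_irq(before, after):
    """Return (irq_label, delta) for the largest increase between snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    deltas = {irq: new[irq] - old.get(irq, 0) for irq in new}
    irq = max(deltas, key=deltas.get)
    return irq, deltas[irq]


# First snapshot: two rows taken from the question.
SNAP1 = """\
           CPU0       CPU1       CPU2       CPU3
  1:      11548          0       2429          0  IR-IO-APIC    1-edge      i8042
 16:   78464100          0          0   11702812  IR-IO-APIC   16-fasteoi   idma64.0, i2c_designware.0, i801_smbus
"""
# Second snapshot: hypothetical values a moment later.
SNAP2 = """\
           CPU0       CPU1       CPU2       CPU3
  1:      11560          0       2429          0  IR-IO-APIC    1-edge      i8042
 16:   78464100          0          0   11902812  IR-IO-APIC   16-fasteoi   idma64.0, i2c_designware.0, i801_smbus
"""

print(busiest_irq(SNAP1, SNAP2))
```

On a real machine the two strings would come from reading /proc/interrupts twice with a short sleep in between. Dividing the delta by the sleep time gives an interrupts-per-second figure for the shared line.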
arielnmz (559 rep)
Mar 1, 2018, 05:25 AM • Last activity: Jun 26, 2025, 02:05 PM
3 votes
1 answers
1988 views
Remove the new kernel
Everything was working without any problem until one day the computer didn't start and reported a problem with the X server. I connected to the computer through ssh and reinstalled xserver-xorg and xserver-xorg-core. After that the computer started again, but tapping the touchpad didn't click (it's not a configuration problem), so I copied the drivers from another Linux Mint machine onto mine. At that point I noticed that I had kernel 4.2.0-32 installed, which is not on my friend's computer. I tried to uninstall it, but that was impossible, so I installed kernel 4.4.0-22. Now the touchpad works, but YouTube in Chrome showed black video. I changed a setting related to hardware acceleration and now that is fine. The computer now works without any problem, except that it is very slow: I'm only running a browser and it uses 60% of my CPU. **So I want to go back to kernel 3.19.0-32 because it's the recommended one. The problem is that I can't uninstall the current kernel because it's loaded, so I want to know how I can boot another installed kernel in order to delete the current one. (Current kernel: 4.4.0-22)** *Here I can delete the old kernel because it's not loaded:* Image *Here I can't delete the new kernel because it's loaded (I want to delete it):* Image
Khalil Bz (153 rep)
May 8, 2016, 11:00 AM • Last activity: Jun 23, 2025, 04:03 PM
2 votes
1 answers
3240 views
How can I get the total CPU usage of a Linux machine with 1 or n CPU cores?
I am currently using the method below to extract the idle CPU value from the top command and subtract it from 100. Is this method correct, and is there a better way to achieve the same thing? Also, my Linux VM is a stripped-down version and has only a few basic tools like top. Installing other tools is not an option, as the package manager has also been removed.
CPU_IDLE="$(top -bn2 | grep -F '%Cpu' | tail -n 4 | gawk '{print $8 $9}' | tr -s '\n\:\,[:alpha:]' ' '| gawk '{print $2}'),"
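An alternative that avoids parsing top is to sample the aggregate `cpu` line of /proc/stat twice and compare the counters. The sketch below shows the arithmetic in Python purely for illustration. It assumes Python is available, which the question does not say, and the function names and sample snapshots are invented. Counting iowait as idle time is a choice made here, not the only reasonable one.

```python
# Sketch: overall CPU usage from two snapshots of the aggregate "cpu" line
# in /proc/stat. That line already sums every core, so the same code works
# for 1 or n CPUs.

def cpu_times(stat_text):
    """Return (total_ticks, idle_ticks) from the text of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Fields: user nice system idle iowait irq softirq steal
            # (guest time is already included in user, so it is skipped).
            vals = [int(v) for v in line.split()[1:9]]
            idle = vals[3] + (vals[4] if len(vals) > 4 else 0)
            return sum(vals), idle
    raise ValueError("no aggregate 'cpu' line found")


def usage_percent(before, after):
    """Percentage of non-idle time between the two snapshots."""
    total0, idle0 = cpu_times(before)
    total1, idle1 = cpu_times(after)
    dtotal = total1 - total0
    if dtotal <= 0:
        return 0.0
    return 100.0 * (dtotal - (idle1 - idle0)) / dtotal


# Hypothetical snapshots taken a moment apart.
T0 = "cpu  100 0 100 800 0 0 0 0 0 0\ncpu0 100 0 100 800 0 0 0 0 0 0\n"
T1 = "cpu  150 0 150 1700 0 0 0 0 0 0\ncpu0 150 0 150 1700 0 0 0 0 0 0\n"

print(usage_percent(T0, T1))
```

The same subtraction can be done with any tool that can read /proc/stat twice, so the approach does not depend on Python being installed.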
Bandi Sandeep (21 rep)
Apr 12, 2017, 06:07 AM • Last activity: May 28, 2025, 12:05 PM
0 votes
0 answers
111 views
MySQL Server keeps hitting 100% CPU
My server has 800+ days of uptime, but for the past couple of weeks the **MySQL server** keeps hitting **100% CPU** and I have to restart the mysqld service. Where do I start to find out where this issue is coming from? All the sites on my server freeze until I restart, and I don't want to keep restarting the service every time. Sometimes it happens after a couple of minutes, sometimes after a couple of hours, sometimes after a couple of days. Nothing really changed. I did recently update the my.cnf file once to set the default character set and restarted, but I don't believe this is linked in any way. **--- Server details ---**
Debian 10 Buster
MySQL Version: 14.14
Average users: 150 - 200 per day
vCPU's: 4
RAM: 4GB
**my.cnf** No slow queries shown so far.
[mysqld]
collation_server = utf8mb4_unicode_ci
character_set_server = utf8mb4
sql-mode="NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_queries.log
log_queries_not_using_indexes = 'OFF'
long_query_time = 5
Z0q (631 rep)
May 31, 2024, 03:56 PM • Last activity: May 26, 2025, 05:19 AM
2 votes
1 answers
1940 views
Short periodic freezes every few seconds. Everything except the mouse
My computer freezes every 1 or 2 seconds for a short period: 1 or 2 seconds working, then 1 or 2 seconds not working. Everything stops except for the mouse.
----------
The first time I noticed the problem was when I tried to open a 1GB txt file with leafpad. The syslog (and other files) grew to 350MB with leafpad errors. I still don't really think that could be the cause, but the system has been slow ever since. I tried deleting those lines to make the files smaller, but that didn't help (of course). The line was a repetition of:
localhost leafpad: pango_tab_array_get_tab: assertion 'tab_index >= 0' failed
----------
**Gnome-shell debugging** (In the end I think the problem is not there)
I ran top to look for the problem, and my first guess was gnome-shell. I disabled all GNOME extensions and set Hidden=True on the GNOME tracker. I rebooted, of course, but the issue continues.
top - 11:37:47 up 16 min, 1 user, load average: 5.08, 4.53, 3.07
Tasks: 186 total, 1 running, 185 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.4 us, 13.6 sy, 0.0 ni, 78.8 id, 2.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 11894.0 total, 9255.9 free, 884.7 used, 1753.4 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 10597.3 avail Mem
 PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
1467 root      20   0 3828560 258004 73056 S 16.9  2.1 3:00.91 gnome-she+
1627 root      20   0  384600  23668 17328 S 13.0  0.2 1:31.00 gsd-xsett+
1732 root      20   0 1190848  66960 31648 S 11.6  0.5 1:21.92 gnome-sof+
2371 root      20   0  239576  28532 22080 S  9.0  0.2 0:49.61 leafpad
2282 root      20   0 1397692  79500 38488 S  8.3  0.7 2:27.84 nautilus
1618 root      20   0  452484  40448 13752 S  7.6  0.3 1:01.97 packageki+
1643 root      20   0  384156  24452 17428 S  5.3  0.2 1:16.62 gsd-keybo+
1636 root      20   0  236512  22152 17128 S  3.0  0.2 1:16.76 gsd-clipb+
1269 root      20   0  343084  47552 32060 S  0.7  0.4 0:19.31 Xorg
   9 root      20   0       0      0     0 I  0.3  0.0 0:01.07 rcu_sched
1176 message+  20   0   18272   5276  3476 S  0.3  0.0 0:01.51 dbus-daem+
1640 root      20   0  550896  24776 19364 S  0.3  0.2 1:18.79 gsd-color
2850 root      20   0  527664  39564 28252 S  0.3  0.3 0:07.43 gnome-ter+
3048 root      20   0   15804   3484  3040 R  0.3  0.0 0:00.01 top
   1 root      20   0  192548   9036  6632 S  0.0  0.1 0:02.95 systemd
   2 root      20   0       0      0     0 S  0.0  0.0 0:00.00 kthreadd
   3 root       0 -20       0      0     0 I  0.0  0.0 0:00.00 rcu_gp
I used the following to find where the issue was, and it seems openat takes the majority of the time. What also gets my attention is the number of errors that call returns, and I guess that might be the problem.
strace -c -p 1467
strace: Process 1467 attached
^Cstrace: Process 1467 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 38.35    2.269925          65     34909     22415 openat
 21.63    1.280485        1583       809       252 unlink
 18.82    1.113966        4700       237        15 link
 16.79    0.993957        4498       221           rename
  0.96    0.056549           2     30633     21313 access
  0.91    0.053897           3     20006       186 stat
  0.47    0.027686           1     19059           read
  0.42    0.024586           2     12498           close
  0.33    0.019538           2     10852           fstat
  0.28    0.016418           5      3083           munmap
  0.21    0.012386           4      3099           mmap
  0.18    0.010921          21       528           write
  0.13    0.007561           1      7413           getuid
So I killed the gnome-shell process, but the problem remains. I don't really see what the problem could be, and I have a PC with a 4-core Intel i7 processor, so it shouldn't be under that much load.
----------
iostat -h
Linux 10/11/2018 _x86_64_ (4 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.4%    0.4%   21.4%   10.8%    0.0%   43.0%
   tps kB_read/s kB_wrtn/s kB_read kB_wrtn Device
 11.14    175.3k      0.0k   16.6M    0.0k sda
235.07      6.6M      2.9M  641.9M  286.4M sdb
769.76    789.5k      0.0k   75.0M    0.0k loop0
Agustin Barrachina (241 rep)
Oct 8, 2018, 09:51 AM • Last activity: May 13, 2025, 10:03 PM
2 votes
1 answers
10085 views
CPU reservation and affinity using taskset and isolcpus kernel parameter with JVM?
We need the JVM to reserve a set number of CPUs. From my research, we can use taskset along with the kernel parameter isolcpus= so that no other process uses this CPU. A few questions arise:
- Does the process need to be started with taskset?
- Does the reservation mean that the process can only run on that CPU, or can it expand to the other CPUs if there are resource problems?
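For context on the two questions: CPUs listed in isolcpus are taken out of the scheduler's general balancing, so a process only ends up on them when its affinity is set explicitly, either at launch with `taskset -c` or later with `taskset -p`. An affinity mask is also a hard limit, so a task pinned this way does not spill over to other CPUs. The sketch below only builds the launch command. The CPU numbers 2 and 3 and the `app.jar` name are hypothetical, and the helper names are invented for this example.

```python
# Sketch: build a "taskset -c <cpus> <command...>" launch line for a JVM.
# The CPU list and the jar name below are placeholders, not values from
# the question.
import shlex


def format_cpu_list(cpus):
    """Turn an iterable of CPU numbers into taskset's comma-separated form."""
    return ",".join(str(c) for c in sorted(set(cpus)))


def taskset_command(cpus, argv):
    """Return the argument vector that starts argv pinned to the given CPUs."""
    return ["taskset", "-c", format_cpu_list(cpus)] + list(argv)


cmd = taskset_command([3, 2], ["java", "-jar", "app.jar"])
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` unchanged. Threads the JVM creates afterwards inherit the same affinity mask.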
danidar (201 rep)
Jul 26, 2018, 03:52 PM • Last activity: Apr 30, 2025, 09:06 PM
1 votes
1 answers
128 views
How to measure actual CPU utilization in Linux for multi core applications?
I have a computation-intensive process that I need to run multiple times on a multi-core processor, but "top" isn't showing utilization or load in a useful way. For example, imagine my task runs in 1 minute in a single thread on a single core of my six-core, 12-thread SMT CPU. If I start the same task six times using six threads, it still finishes in 1 minute, and top shows the load average as 6.0 and the cpu(s) at 50% us and 50% id. In the top process list, each of the six processes shows 100% CPU. If I do the same thing but start 12 threads, it finishes the 12 jobs in 2 minutes, and top shows the load average as 12.0 and cpu(s) at 100% us, 0% id, with 12 processes each at 100% CPU. Now, the 6-thread and 12-thread examples are both processing at the same fully loaded rate of 6 jobs per minute, so why does top show the 6-thread case as 50% idle when clearly it isn't? Is there a better way of determining the actual load of the CPUs? This was run on a Ryzen 5600X processor on Ubuntu 24.12. Edit: top output for 12 tasks:
top - 08:35:37 up 54 days, 20:49,  3 users,  load average: 12.20, 6.70, 2.80
Tasks: 346 total,  13 running, 332 sleeping,   0 stopped,   1 zombie
%Cpu(s): 98.2 us,  1.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st 
MiB Mem :  64221.7 total,   1572.7 free,   4983.4 used,  58684.1 buff/cache     
MiB Swap:   8192.0 total,   7863.7 free,    328.3 used.  59238.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                
2249765 user    20   0  126952  64132  51200 R 100.0   0.1   3:48.87 sonicLiquidFoam                                        
2249759 user    20   0  127060  64220  51200 R 100.0   0.1   3:48.93 sonicLiquidFoam                                        
2249757 user    20   0  126624  64064  51328 R 100.0   0.1   3:49.32 sonicLiquidFoam                                        
2249761 user    20   0  128276  64868  50688 R 100.0   0.1   3:47.65 sonicLiquidFoam                                        
2249762 user    20   0  127652  63688  50432 R 100.0   0.1   3:49.13 sonicLiquidFoam                                        
2249755 user    20   0  128844  66128  51200 R 100.0   0.1   3:46.06 sonicLiquidFoam                                        
2249766 user    20   0  126576  63952  51328 R 100.0   0.1   3:47.87 sonicLiquidFoam                                        
2249764 user    20   0  126612  63824  51072 R  99.0   0.1   3:48.59 sonicLiquidFoam                                        
2249760 user    20   0  126888  63972  51072 R  98.7   0.1   3:45.06 sonicLiquidFoam                                        
2249758 user    20   0  127500  64860  51200 R  97.7   0.1   3:48.64 sonicLiquidFoam                                        
2249763 user    20   0  127916  64944  51072 R  97.0   0.1   3:39.58 sonicLiquidFoam                                        
2249756 user    20   0  126828  63948  51072 R  96.0   0.1   3:48.77 sonicLiquidFoam
For 6 tasks:
top - 08:40:22 up 54 days, 20:53,  3 users,  load average: 6.11, 6.67, 3.90
Tasks: 335 total,   7 running, 327 sleeping,   0 stopped,   1 zombie
%Cpu(s): 50.0 us,  1.0 sy,  0.0 ni, 49.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st 
MiB Mem :  64221.7 total,   1616.2 free,   4914.6 used,  58710.3 buff/cache     
MiB Swap:   8192.0 total,   7863.7 free,    328.3 used.  59307.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                
2250032 user    20   0  127392  64676  51200 R 100.0   0.1   2:39.15 sonicLiquidFoam                                        
2250027 user    20   0  126828  63096  50176 R 100.0   0.1   2:39.23 sonicLiquidFoam                                        
2250028 user    20   0  127060  63260  50176 R 100.0   0.1   2:39.23 sonicLiquidFoam                                        
2250029 user    20   0  128844  66124  51200 R 100.0   0.1   2:39.12 sonicLiquidFoam                                        
2250030 user    20   0  128276  65508  51200 R 100.0   0.1   2:39.21 sonicLiquidFoam                                        
2250031 user    20   0  126596  63808  51072 R 100.0   0.1   2:39.21 sonicLiquidFoam
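The 50% figure follows from how the summary line is averaged: top's %Cpu(s) is taken over all logical CPUs (12 here), not over physical cores (6). The short sketch below reproduces that arithmetic. The second function is a simplification that assumes the scheduler places one task on each physical core before doubling up on SMT siblings. Both function names are invented for this example.

```python
# Sketch: why 6 busy tasks on a 6-core / 12-thread CPU read as 50% in top.

def top_summary_busy(busy_tasks, logical_cpus):
    """Busy share as averaged over logical CPUs (what %Cpu(s) reports)."""
    return 100.0 * min(busy_tasks, logical_cpus) / logical_cpus


def physical_cores_busy(busy_tasks, physical_cores):
    """Share of physical cores running at least one task.

    Assumes tasks are spread one per core before SMT siblings are used.
    """
    return 100.0 * min(busy_tasks, physical_cores) / physical_cores


for tasks in (6, 12):
    print(tasks, top_summary_busy(tasks, 12), physical_cores_busy(tasks, 6))
```

With 6 tasks the logical-CPU average is 50% while every physical core is occupied, which matches the timings in the question: the extra 6 hardware threads add little throughput for this workload.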
tkw954 (113 rep)
Apr 23, 2025, 07:35 PM • Last activity: Apr 24, 2025, 02:43 PM
0 votes
0 answers
33 views
CPU affinity not following cpuset
When I run taskset -p of a process I am getting something like this back:
# taskset -p 1078
pid 1078's current affinity mask: 3f
And it keeps changing what it reports: sometimes it's 5f, other times df, and so on. For the same process I can see that it's allowed on all cores:
# cat /proc/1078/status | grep Cpus
Cpus_allowed:   ff
Cpus_allowed_list:      0-7
And its cpuset in cgroups also allows 0-7:
# cat /dev/cpuset/cpus 
0-7
If I try to set it to ff using taskset I still do not get ff back:
# taskset -p ff 1078
pid 1078's current affinity mask: 5f
pid 1078's new affinity mask: 5f
What mechanism is overriding the cpuset and taskset affinity? Any way I can force it to actually run on all cores? This is on Android 13 and kernel 5.15.
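To make the reported masks easier to compare, each hex value can be expanded into the CPU numbers it covers. This small Python sketch does that for the masks quoted above. The function names are invented for the example, and it says nothing about which mechanism is changing the mask.

```python
# Sketch: decode taskset-style hex affinity masks into CPU numbers and show
# which of the allowed CPUs are missing from each mask.

def mask_to_cpus(mask_hex):
    """Return the sorted list of CPU numbers whose bit is set in the mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def missing_cpus(mask_hex, allowed_hex):
    """CPUs present in the allowed mask but absent from the current mask."""
    current = set(mask_to_cpus(mask_hex))
    return [cpu for cpu in mask_to_cpus(allowed_hex) if cpu not in current]


for mask in ("3f", "5f", "df"):
    print(mask, mask_to_cpus(mask), "missing:", missing_cpus(mask, "ff"))
```

Decoded this way, 3f drops CPUs 6 and 7, 5f drops CPUs 5 and 7, and df drops only CPU 5, while CPUs 0 to 4 stay in the mask in every sample.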
Zitrax (284 rep)
Apr 16, 2025, 11:47 AM
2 votes
1 answers
3009 views
High CPU Usage from systemd-udevd
I have a Dell Studio 1569 and just installed Linux on it. I noticed that the CPU has been running high due to systemd-udevd. Going through different posts on the web, including this one, I used "udevadm monitor" to help narrow down what was happening, and here is the output: udevadm monitor output I first assumed a USB device, so I plugged and unplugged devices on all ports, but soon discovered they did not have the same path as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.6/2-1.6.2/2-1.6.2:1.0 (usb). Long story short, while udevadm monitor was running, I pressed some keys on my keyboard and noticed that the path was the same for the keyboard (as seen in the picture above); the only difference was that the beginning of the line had aKERNEL in front of it instead of KERNEL or UDEV. My next test, while udevadm monitor was running, was to take my laptop apart and disconnect the keyboard to see if those bind/unbind entries would stop. They continued, which makes me think this is not the keyboard. Does anyone know what else it could be, if it is not the keyboard? Here is the output from lsusb -t: lsusb -t output EDIT: In case anyone else runs into an issue similar to mine, disabling Bluetooth in the BIOS seems to fix the issue. Refer to this post.
Kayracer (31 rep)
Jun 22, 2019, 02:43 AM • Last activity: Apr 14, 2025, 10:04 PM
0 votes
0 answers
42 views
Kernel APIs to disable/enable CPU cores within a driver module
I have Linux kernel version 6.8.0-57-generic (Ubuntu Jammy).
I want to try disabling and enabling CPU cores from a Linux kernel module (from kernel space). What is the right way to do this? I understand the cores can be disabled from userspace. However, I intend to do the same from a kernel module. Thanks in advance.
ss22 (13 rep)
Apr 13, 2025, 05:14 PM
1 votes
0 answers
47 views
ARM64 commands take seconds to finish
I'm on an ARM board running Linux. The hardware is a vehicle domain control board with a 6-core ARM Cortex-A78AE and some machine learning cores. I don't want to reboot it, because the cause of my performance problem might be a hardware or driver bug.
root@hobot:~# uname -a
Linux hobot 6.1.94-rt33 #1 SMP PREEMPT_RT Fri Nov  8 15:11:35 CST 2024 aarch64 GNU/Linux
I don't know what happened to my OS today. I suddenly found that shell commands take too long to finish, although everything was fine a little while ago. For example: enter image description here As you can see, ls takes over 5 seconds and uses 100% CPU on the 4th core. I tried strace with ls. It doesn't get stuck on anything in particular. enter image description here What should I monitor to find out what happened?
Xingx1 (11 rep)
Feb 23, 2025, 06:22 AM • Last activity: Feb 23, 2025, 07:45 AM
0 votes
2 answers
258 views
Process called lsof using too much CPU
I keep having
lsof -w -l +d /var/lib/php5
eating up my CPU. I want to know what is triggering it and what it has to do with php5 ...
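One way to find what launches a process is to follow its parent chain through the `PPid` field of /proc/<pid>/status while it is still running. The Python sketch below does that. The function names, the PIDs, and the process names in the fake data are all hypothetical and chosen only to make the example self-checking. They are not taken from the system in the question.

```python
# Sketch: walk a process's ancestry using the Name and PPid fields of
# /proc/<pid>/status. The reader function is injectable so the logic can
# be exercised without a live /proc.

def parse_status(text):
    """Return the 'Key: value' pairs of a /proc/<pid>/status file as a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def read_proc_status(pid):
    with open(f"/proc/{pid}/status") as fh:
        return fh.read()


def ancestry(pid, read_status=read_proc_status):
    """Return [(pid, name), ...] from the given process up to PID 1."""
    chain = []
    while pid > 0:
        info = parse_status(read_status(pid))
        chain.append((pid, info.get("Name", "?")))
        pid = int(info.get("PPid", "0"))
    return chain


# Hypothetical process tree standing in for a live system.
FAKE = {
    300: "Name:\tlsof\nPPid:\t200\n",
    200: "Name:\tsh\nPPid:\t100\n",
    100: "Name:\tcron\nPPid:\t1\n",
    1: "Name:\tsystemd\nPPid:\t0\n",
}

print(ancestry(300, FAKE.__getitem__))
```

Because the lsof process may be short-lived, the PID has to be captured while it is running, for example from the same top or ps output that shows the CPU usage.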
Aleksandar Pavić (109 rep)
Dec 17, 2024, 09:50 AM • Last activity: Jan 16, 2025, 03:21 PM
1 votes
1 answers
214 views
What exactly does %wait in pidstat mean?
Environment: Ubuntu 22.04, sysstat version 12.2.0, 16 logical CPUs.
man pidstat says the following, but I would like to know more specifically what the numerator and denominator of %wait are.
> Percentage of CPU spent by the task while waiting to run.
Once, while looking at per-process performance in pidstat, I found that there were times when the %wait of one process was quite high, such as 90%. At the same time, %usr was also high and sometimes exceeded 100%. At that time, the total CPU utilization was close to 100%, and the total %CPU of all processes was close to 1600%, which corresponds to the number of CPU cores. However, the total %wait of all processes was more than 1000%. Since CPU resources are used only for %CPU, I assume that %wait is time when the CPU is not used, but what is this percentage relative to? In mpstat, for example, the total of %usr, %sys, %iowait, %idle, etc. adds up to about 100% of CPU resources. What exactly is %wait in the case of pidstat? If the process is simply not running, I think %wait would also be 0. It is also curious that during the times when %wait was high, disk I/O and network I/O were not high when examined with dstat and other tools. From reading the sysstat code, it seems that %wait, like %usr and %sys, uses as its numerator the increase in a cumulative wait-time counter between two points in time, and as its denominator the length of the period between those two points. I am not sure what kind of time is counted as wait.
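To make the numerator and denominator concrete, here is a sketch of the calculation as described above. It assumes, and this is an assumption rather than something confirmed in the question, that the cumulative counter is the second field of /proc/<pid>/schedstat, which the kernel documents as time spent waiting on a run queue in nanoseconds. The function names and the sample values are invented.

```python
# Sketch: a %wait-style figure from two samples of /proc/<pid>/schedstat.
# Assumption: the second field (run-queue wait, ns) is the counter used.

def run_delay_ns(schedstat_text):
    """Return the run-queue wait time in ns (second schedstat field)."""
    on_cpu_ns, run_delay, timeslices = schedstat_text.split()[:3]
    return int(run_delay)


def wait_percent(before, after, interval_s):
    """Growth of run-queue wait time as a percentage of the interval."""
    delta_ns = run_delay_ns(after) - run_delay_ns(before)
    return 100.0 * delta_ns / (interval_s * 1e9)


# Hypothetical samples taken 1 second apart.
S0 = "5000000000 1000000000 100"
S1 = "5100000000 1900000000 120"

print(wait_percent(S0, S1, 1.0))
```

Under this reading the denominator is the sampling interval, not total CPU capacity. Every runnable task accrues its own waiting time independently, so the sum of %wait over all processes is not bounded by the number of CPUs, and high values do not require any disk or network I/O.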
LAPK (11 rep)
Jan 15, 2025, 11:13 PM • Last activity: Jan 16, 2025, 12:15 AM
0 votes
0 answers
29 views
Linux kernel cgroup v2 CFS - cpu throttled_usec accounting?
In Linux kernel cgroup v2’s CFS scheduler, how is cpu.stat throttled_usec accounted when a cgroup with multiple threads gets throttled during a single quota period? Specifically, is throttled_usec tracked as the total wall-clock time that the cgroup was throttled as a whole, or is it a sum of the throttled times of all individual threads? Kernel Version: "5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 11:41:53 PDT 2023 x86_64 x86_64 x86_64 GNU/Linux" Distro: Oracle Linux 9.x
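One way to probe this empirically is to sample the cgroup's cpu.stat twice while a multi-threaded workload is being throttled and compare the growth of `throttled_usec` with the elapsed wall-clock time. A ratio that stays above 1 would rule out a plain wall-clock interpretation. The sketch below only does the bookkeeping. The sample values are invented, the function names are made up for this example, and it does not by itself establish how the kernel accounts the time.

```python
# Sketch: compare throttled_usec growth against elapsed wall-clock time
# using two samples of a cgroup v2 cpu.stat file.

def parse_cpu_stat(text):
    """Return the 'key value' lines of cpu.stat as a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(before, after, wall_usec):
    """throttled_usec growth divided by the wall-clock time between samples."""
    delta = (parse_cpu_stat(after)["throttled_usec"]
             - parse_cpu_stat(before)["throttled_usec"])
    return delta / wall_usec


# Hypothetical samples taken 1 second (1,000,000 usec) apart.
C0 = "usage_usec 5000000\nnr_periods 100\nnr_throttled 40\nthrottled_usec 1000\n"
C1 = "usage_usec 5400000\nnr_periods 110\nnr_throttled 50\nthrottled_usec 2501000\n"

print(throttled_ratio(C0, C1, 1_000_000))
```

Running the same comparison with one busy thread and then with several busy threads in the cgroup would show whether the counter scales with the amount of concurrent throttling.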
ALZ (961 rep)
Jan 11, 2025, 12:35 PM
13 votes
5 answers
57255 views
setroubleshootd excessive cpu and memory usage
I have Centos 7 fresh install and I see setroubleshootd with high CPU usage. How can I fix this? What is this process doing?
stiv (1691 rep)
Mar 29, 2019, 08:41 PM • Last activity: Dec 9, 2024, 11:31 PM
0 votes
0 answers
33 views
What is the meaning of columns in the table displayed by cpupower-monitor?
Running sudo cpupower monitor on Ubuntu 24.04 I'm getting:
| Nehalem                   || Mperf              || RAPL               || Idle_Stats
 CPU| C3   | C6   | PC3  | PC6   || C0   | Cx   | Freq  || pack | core | unco  || POLL | C1_A | C2_A | C3_A
   0|  0.00|  0.46|  0.00|  0.00|| 11.71| 88.29|  3060||37496059|27532217| 73852||  0.01|  8.94| 79.93|  0.00
   1|  0.00|  0.46|  0.00|  0.00||  6.56| 93.44|  2790||37496059|27532217| 73852||  0.00|  2.56| 53.19| 37.97
   2|  0.00|  4.67|  0.00|  0.00||  9.14| 90.86|  3194||37496059|27532217| 73852||  0.01|  4.43| 58.09| 28.77
   3|  0.00|  4.67|  0.00|  0.00||  6.24| 93.76|  3239||37496059|27532217| 73852||  0.00|  1.69| 34.62| 57.66
   4|  0.00|  0.00|  0.00|  0.00||  0.29| 99.71|  4461||37496059|27532217| 73852||  0.00|  0.00|  0.00| 99.97
   5|  0.00|  0.00|  0.00|  0.00|| 29.52| 70.48|  4457||37496059|27532217| 73852||  0.00| 64.48|  0.00|  6.48
   6|  0.00|  0.00|  0.00|  0.00|| 47.92| 52.08|  4123||37496059|27532217| 73852||  0.00|  0.95|  0.31| 51.40
   7|  0.00|  0.00|  0.00|  0.00||  0.00|100.00|  3753||37496059|27532217| 73852||  0.00|  0.00|  0.00| 99.99
   8|  0.00| 25.03|  0.00|  0.00||  3.27| 96.73|  2807||37496059|27532217| 73852||  0.01|  4.13| 54.72| 38.03
   9|  0.00| 62.12|  0.00|  0.00||  2.32| 97.68|  2897||37496059|27532217| 73852||  0.00|  1.29| 30.79| 65.69
  10|  0.00| 77.85|  0.00|  0.00||  2.19| 97.81|  3064||37496059|27532217| 73852||  0.00|  0.94| 18.42| 78.51
  11|  0.00|  0.00|  0.00|  0.00|| 14.84| 85.16|  2497||37496059|27532217| 73852||  0.01| 57.88| 27.45|  0.38
  12|  0.00| 70.55|  0.00|  0.00||  3.27| 96.73|  2399||37496059|27532217| 73852||  0.00|  1.33| 30.14| 65.41
  13|  0.00| 54.45|  0.00|  0.00||  4.02| 95.98|  2213||37496059|27532217| 73852||  0.00|  1.06| 43.49| 51.62
  14|  0.00| 67.90|  0.00|  0.00||  3.36| 96.64|  2334||37496059|27532217| 73852||  0.00|  1.34| 30.91| 64.54
  15|  0.00| 72.39|  0.00|  0.00||  2.41| 97.59|  2167||37496059|27532217| 73852||  0.00|  1.25| 26.74| 69.73
What is the meaning of the columns in this table? The cpupower manual (https://linux.die.net/man/1/cpupower-monitor) doesn't have that information. I'm assuming that C3, C6, etc. are percentages of time spent in a given CPU C-state. Also, I'm running a program with two threads pinned to CPU #5 and #6. Cores #2 and #3, which contain the hyperthreaded CPUs 4-7, are isolated with GRUB_CMDLINE_LINUX="nohz=on nohz_full=4-7 rcu_nocbs=4-7 isolcpus=4-7 irqaffinity=0-3,8-15". The table above confirms the load on CPU #5 and #6, but it also consistently shows some marginal load on CPU #4. That doesn't match Ubuntu System Monitor, which consistently shows 0% load on CPU #4 (listed as CPU5 there): enter image description here Additionally, why do logical CPUs of the same physical core show different frequencies?
Paul Jurczak (151 rep)
Nov 23, 2024, 03:38 AM • Last activity: Nov 23, 2024, 06:46 AM
1 votes
1 answers
158 views
100% CPU on 4 of 8 cores on Oracle Linux
I have a desktop computer (Intel i4770) running Oracle Linux 7.9 with kernel 4.1.12-61. I usually keep it off and only turn it on on the rare occasions when I need to test something. A month or so ago, I turned it on and noticed that the fans were at max speed. I checked top and found that setroubleshoot was at 100%, so I killed the process. The process kept coming back and I kept killing it, but ultimately it didn't matter much because my testing was done and I turned the computer off again. (Yes, I always shut it down the right way.)

Now that I am trying to get to the root of the problem, setroubleshoot is no longer showing 100% in top. In fact, nothing is even close to 100%. Running htop, I can get details about the CPUs, and 4 of the 8 cores are permanently at 100%, from the time the computer lets me log in to when I shut it down. But there's nothing in the list of processes even above 5.2%.

[Screenshot: htop showing 4 * 100% with no processes that would contribute to that]

When I run perf on each core with perf top -C 1 --sort comm, I can see that cores zero through 3 are all 100% kernel.

[Screenshot: perf for single CPU]

Here is the perf report from running perf record -a -F 999 -- sleep 10. I don't know if the failure to find useful symbols is indicative of the problem I'm chasing, if it is a different issue that I'll need help figuring out, or if it is something that should be ignored.

[Screenshot: perf-report]

On the desktop of this computer, I noticed a bunch of SELinux errors. They all appear to be saying that there was an attempt to execute something that should not have been allowed.

[Screenshot: SELinux errors]

And just to confirm that htop was right about what it was reporting, here's the report from the System Monitor.

[Screenshot: System Monitor display showing 8 cores with 4 at 100%]

Booting into a prior version of the kernel didn't help, and booting into the "rescue" kernel didn't help either. I tried updating the kernel, but that didn't help. I ran a software update and that didn't help either.

Note that I hadn't done any updates or installed any new software immediately prior to this problem starting. This install had been stable for years whenever I needed it.

I also tried installing the same OS over again on a new external drive. That worked, with no issues on that drive. But when I boot to that drive and then choose the kernel that is on the main drive, the problem returns. That all seems to prove that the kernel isn't the issue, but that something is wrong on the main system drive.

I'm at a loss for how to debug further. I can't figure out what changed and why, so I don't know how to even start fixing it. Any help about where to look and what to check would be appreciated!

Edit 1:

Output from ps -efl|sort -rk14|head:

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S gdm 2085 2041 2 80 0 - 905731 - 17:42 ? 00:00:02 /usr/bin/gnome-shell
4 S root 1 0 1 80 0 - 54811 - 17:41 ? 00:00:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
4 S root 837 1 1 80 0 - 22671 - 17:42 ? 00:00:01 /sbin/rngd -f
1 S root 405 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs_mru_cache]
1 S root 407 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-data/sda1]
1 S root 408 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-conv/sda1]
1 S root 409 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-cil/sda1]
1 S root 406 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfs-buf/sda1]
1 S root 404 2 0 60 -20 - 0 - 17:41 ? 00:00:00 [xfsalloc]

Output from dmesg | grep libsystem:

[ 3.344918] audit: type=1400 audit(1731105719.768:4): avc: denied { execute } for pid=496 comm="systemd-journal" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.351928] audit: type=1400 audit(1731105719.775:6): avc: denied { execute } for pid=502 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.351929] audit: type=1400 audit(1731105719.775:5): avc: denied { execute } for pid=503 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.374010] audit: type=1400 audit(1731105719.797:7): avc: denied { execute } for pid=513 comm="systemd-tmpfile" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_tmpfiles_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.386383] audit: type=1400 audit(1731105719.810:8): avc: denied { execute } for pid=525 comm="systemd-sysctl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.397457] audit: type=1400 audit(1731105719.821:9): avc: denied { execute } for pid=536 comm="hostname" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:hostname_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[ 3.673352] audit: type=1400 audit(1731105720.097:10): avc: denied { execute } for pid=657 comm="alsactl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:alsa_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0

Edit 2:

I installed kernel debug info and downgraded perf to version 3, since apparently there is a bug with the perf version for OL7.9. However, there are still symbols that can't be found.

[Screenshot: perf-report2]

The number in the list above, 1399, is the PID of the process, but that process isn't visible through either htop or ps. As soon as I did a kill -9 1399, the CPU usage immediately dropped to zero. That's nice because at least the problem process is now dead, and I know how to kill it even though I don't see it in the normal process lists. But the fundamental question remains: where is this process coming from, and how do I stop it from starting in the first place?
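Since kill -9 1399 worked even though neither htop nor ps listed the process, one way to look for similar processes is to compare the PIDs that appear when /proc is enumerated with the PIDs that respond when addressed directly. The following is a minimal sketch of that comparison, not a confirmed diagnosis of this system. The function names are hypothetical, and the approach assumes the hiding happens at the directory-listing level; it will not help if direct lookups are hidden as well.

```python
import os


def listed_pids(proc_root="/proc"):
    """Return the PIDs visible when the directory is enumerated."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def thread_group_id(pid, proc_root="/proc"):
    """Return the Tgid recorded in <proc_root>/<pid>/status, or None."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as fh:
            for line in fh:
                if line.startswith("Tgid:"):
                    return int(line.split()[1])
    except OSError:
        pass  # the entry does not exist, vanished, or cannot be read
    return None


def probed_pids(max_pid, proc_root="/proc"):
    """Return PIDs whose entry exists when addressed directly.

    Thread IDs are also reachable as /proc/<tid> without being listed,
    so only thread-group leaders (Tgid == PID) are kept.
    """
    return {
        pid
        for pid in range(1, max_pid + 1)
        if thread_group_id(pid, proc_root) == pid
    }


def hidden_pids(listed, probed):
    """PIDs that answer a direct lookup but are missing from the listing."""
    return sorted(set(probed) - set(listed))
```

On Linux, the upper bound for the probe can be read from /proc/sys/kernel/pid_max, and the check is then hidden_pids(listed_pids(), probed_pids(max_pid)). Processes that start or exit between the two scans show up as false positives, so any PID reported this way is only a candidate; /proc/<pid>/exe and /proc/<pid>/cmdline are natural places to look next, ideally as root.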
ktbos (111 rep)
Nov 8, 2024, 07:24 PM • Last activity: Nov 11, 2024, 05:30 PM
0 votes
1 answers
118 views
Why is the Linux CPU stalling when having multithreaded memory writes?
HW Specs:
- CPU: 64 Cores, 128 Threads, AMD Ryzen Threadripper Pro 5995WX
- RAM: 512 GB, Manufacturer unknown, will try to provide if needed

Linux Specs:
- OS: Ubuntu 22.04.4 LTS
- Linux Kernel: 5.15.0-119-generic

I'm trying to get model training to work with PyTorch on a Linux server, where I have observed a performance degradation by a factor of ~10 after letting a resource-intensive training task run for a couple of minutes (training on 4 GPUs, each with a multithreaded DataLoader).

Trying to isolate the root cause, I have come up with a minimal test in Python that reproduces the issue by continuously writing 1 GB of data to RAM. Running this with 32 threads in parallel (the CPU has 128 threads available), the CPU stalls after 0%) until giving it some cooldown time of approx. 1 min.

I have run the test on another server (48 CPU threads, 160 GB RAM) for 10 minutes without any problems (on this server, multi-GPU training also runs without any performance degradation).

In contrast to the self-implemented memory write test, I have also tried a benchmark using sysbench, writing 10 TB of data with up to 96 threads without any problem. This is where I don't really understand the difference: does this task write the data only into some sort of buffer without really allocating any RAM? I ran the test with the following command:

sysbench --threads=96 --time=0 --memory-block-size=128K --memory-total-size=10T --report-interval=1 --memory-oper=write memory run

The main observable difference between sysbench and my Python test script was in htop: sysbench had all threads running as normal-priority user threads (green bars), while my Python script caused a large portion of kernel time (red bars), which in my understanding is caused by a lot of waiting.

My question now is: does this diagnostic give some indication of why the system is stalling? Might there be a hardware issue with the RAM, or could this be an issue with the OS? Or what further tests could I do to isolate the root cause?

Edit: The minimal Python script follows:
import time
import numpy as np
import threading

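# 1 GiB of zeros (1024 * 1024 * 1024 bytes of uint8), shared by all threads
data = np.zeros((1024, 1024, 1024, 1), dtype=np.uint8)

def allocate_memory():
    # Loop forever; each pass writes a newly allocated 1 GiB result array.
    while True:
        start_time = time.time()
        _ = data * 0
        end_time = time.time()
        print(f"Time: {end_time - start_time:.3f} s")

    # Never reached, because the loop above does not terminate.
    print(data.shape)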

def run_in_threads(num_threads):
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=allocate_memory)
        thread.start()
        threads.append(thread)
    
    for thread in threads:
        thread.join()

if __name__ == "__main__":
    num_threads = 32
    run_in_threads(num_threads)
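One difference from the sysbench run that may be worth checking: `data * 0` builds a new 1 GiB result array on every iteration, so each loop pass also requests fresh memory in addition to writing it. A minimal sketch for separating that cost from the write itself is to time the same multiplication once with a fresh result and once into a buffer allocated up front. The helper names are hypothetical and the array is passed in as a parameter, so this is a diagnostic aid under those assumptions, not an explanation of the stall.

```python
import time

import numpy as np


def time_fresh_result(data, repeats=5):
    """Time `data * 0`, which allocates a new result array on every pass."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        _ = data * 0
        timings.append(time.perf_counter() - start)
    return timings


def time_preallocated_result(data, repeats=5):
    """Time the same multiplication written into one reusable buffer."""
    out = np.empty_like(data)  # allocated once, outside the timed region
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        np.multiply(data, 0, out=out)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling both with the 1 GiB array from the script, inside the same 32 threads, and comparing the timings together with the kernel-time share in htop would show whether the red bars follow the allocation or the write.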
m4fr1699 (1 rep)
Nov 5, 2024, 08:27 AM • Last activity: Nov 7, 2024, 08:58 AM
1 votes
1 answers
329 views
Btop - What does the LAV value mean?
I have a KVM guest running a VNC server, which I connect to through an SSH tunnel using the TigerVNC viewer. Everything works well except for one issue: in Chrome (using the X server), vertical scrolling on a website is a bit laggy. I checked the CPU and it looks good; I never see it at 100%, at most 50%. Memory is good too: I usually have about 30 GB of RAM free. However, the first LAV value in btop is 1.88. What does that mean exactly? Does it mean 100 percent of the CPU is being used and 88 percent of processes are waiting?
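Assuming LAV is btop's label for the load average (an assumption, since the screenshot is not available here), the number is not a percentage. On Linux, the load average is the average number of tasks that were running, waiting for a CPU, or in uninterruptible sleep over the last 1, 5, and 15 minutes. It is usually judged against the number of CPUs, as in this minimal sketch; the helper name and the CPU count of 4 are hypothetical.

```python
def load_per_cpu(load_average, cpu_count):
    """Average number of running or waiting tasks per CPU."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


if __name__ == "__main__":
    # 1.88 is the value from the question; 4 CPUs is a made-up guest size.
    print(f"{load_per_cpu(1.88, 4):.2f}")  # prints 0.47
```

With those made-up 4 CPUs, a load of 1.88 would mean that on average fewer than half of the CPUs had a task to run. For the live values, Python's os.getloadavg() and os.cpu_count() return the three load averages and the CPU count on Unix-like systems.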
dman (569 rep)
Oct 14, 2024, 05:04 PM • Last activity: Oct 14, 2024, 05:18 PM
-1 votes
1 answers
37 views
(Solaris) Ram CPU monitoring script is grabbing incorrect cpu ram utilization values
#!/bin/bash

host=$(hostname)
email="abc.@xyz.com"  # Change to your desired email
subject="Attention!!! Health check Failed on $host"
echo $(date)

# CPU use threshold
cpu_threshold=0

# Memory idle threshold
mem_threshold=0

#--- CPU
cpu_usage () {
    # Get CPU idle percentage
    cpu_idle=$(prstat -Z 1 1 | awk 'NR==2 {print $8}' | tr -d '%')
    
    # Check if cpu_idle is a valid number
    if ! [[ "$cpu_idle" =~ ^[0-9]+$ ]]; then
        echo "Error: Invalid CPU idle value: $cpu_idle"
        cpu_use=0
    else
        cpu_use=$((100 - cpu_idle))
    fi

    cpu_flag=0
    echo "CPU utilization: $cpu_use%"
    if [ "$cpu_use" -gt "$cpu_threshold" ]; then
        echo "CPU warning!!!"
        cpu_flag=1
    else
        echo "CPU ok!!!"
    fi
}

#--- Memory
mem_usage () {
    mem_total=$(kstat -m | grep "physmem" | awk '{print $2}')  # Total memory in bytes
    mem_free=$(kstat -m | grep "freemem" | awk '{print $2}')   # Free memory in bytes

    if [[ -z "$mem_total" || "$mem_total" -eq 0 ]]; then
        echo "Failed to retrieve total memory."
        mem_total=0
        mem_free=0
    fi

    # Convert bytes to GB
    mem_total_gb=$((mem_total / 1024 / 1024))
    mem_free_gb=$((mem_free / 1024 / 1024))

    echo "Total memory: $mem_total_gb GB"
    echo "Free memory: $mem_free_gb GB"

    if [ "$mem_total" -gt 0 ]; then
        per_mem=$(( ((mem_total - mem_free)) * 100 / mem_total ))
    else
        per_mem=0
    fi

    echo "Memory space remaining: $mem_free_gb GB"
    mem_flag=0
    if [ "$per_mem" -gt "$mem_threshold" ]; then
        echo "Memory warning!!!"
        mem_flag=1
    else
        echo "Memory ok!!!"
    fi
}

out() {
    if [ "$cpu_flag" -eq 1 ] && [ "$mem_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check RAM and CPU utilization on $host.
Current RAM Percentage: $per_mem%%
Current CPU Percentage: $cpu_use%%

Thanks " > /tmp/health.txt
    elif [ "$cpu_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check CPU utilization on $host.
Current CPU Percentage: $cpu_use%%

Thanks " > /tmp/health.txt
    elif [ "$mem_flag" -eq 1 ]; then 
        printf "
Hello Team,

Please check RAM utilization on $host.
Current RAM Percentage: $per_mem%%

Thanks " > /tmp/health.txt
    fi
}

mail() {
    if [ "$cpu_flag" -eq 1 ] || [ "$mem_flag" -eq 1 ]; then
        /usr/sbin/sendmail -t <
It gives the following output. Despite using kstat, it throws this error. What am I doing wrong here?
Fri Oct 11 06:51:21 CDT 2024
Error: Invalid CPU idle value: 17:25:22
CPU utilization: 0%
CPU ok!!!
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
      [ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ module:instance:name:statistic ... ]
      [ interval [ count ] ]
Usage:
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ -m module ] [ -i instance ] [ -n name ] [ -s statistic ]
      [ interval [ count ] ]
kstat [ -qlp ] [ -T d|u ] [ -c class ]
      [ module:instance:name:statistic ... ]
      [ interval [ count ] ]
Failed to retrieve total memory.
Total memory: 0 GB
Free memory: 0 GB
Memory space remaining: 0 GB
Memory ok!!!
The CPU utilization is now being grabbed correctly; only the memory usage is not reported correctly. I am not sure which module to use with the kstat command, so I tried prtconf as well, but that didn't capture it either.
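The usage message in the output above appears because `-m` expects a module name after it. To my knowledge, Solaris exposes the page counters as `unix:0:system_pages:physmem` and `unix:0:system_pages:freemem`, and `kstat -p` prints each as a name and a value separated by whitespace, with the values counted in pages rather than bytes. Treat those statistic names, the page size, and the sample lines below as assumptions to verify on the target host; the sketch only shows the parsing and the arithmetic, with hypothetical helper names.

```python
def parse_kstat_line(line):
    """Split one `kstat -p` line into (statistic name, integer value)."""
    name, value = line.split()
    return name, int(value)


def pages_to_gib(pages, page_size):
    """Convert a page count to GiB for the given page size in bytes."""
    return pages * page_size / 1024 ** 3


def memory_used_percent(physmem_pages, freemem_pages):
    """Integer percentage of physical memory that is not free."""
    if physmem_pages <= 0:
        raise ValueError("physmem must be positive")
    return (physmem_pages - freemem_pages) * 100 // physmem_pages


if __name__ == "__main__":
    # Sample lines with made-up values, not output from the asker's host.
    sample = [
        "unix:0:system_pages:physmem\t4194304",
        "unix:0:system_pages:freemem\t1048576",
    ]
    stats = dict(parse_kstat_line(line) for line in sample)
    physmem = stats["unix:0:system_pages:physmem"]
    freemem = stats["unix:0:system_pages:freemem"]
    page_size = 8192  # assumed; the `pagesize` command reports the real one
    print(pages_to_gib(physmem, page_size))       # prints 32.0
    print(memory_used_percent(physmem, freemem))  # prints 75
```

The same arithmetic carries over to the shell script once the two values are read per statistic instead of through a bare `kstat -m`.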
Navdeep Singh (37 rep)
Oct 11, 2024, 11:59 AM • Last activity: Oct 11, 2024, 01:02 PM