Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
4
votes
3
answers
3079
views
Check what process is spiking the average load with atop
While trying to find the culprit of a high load average on a system during the night (which does not seem to be related to logrotate), I installed `atop` to write a raw file at a specific interval. While reading the file, it seems the process list stands still. Can I somehow go back and forth between the samples to see what sticks out, and further sort by any column (like CPU usage)?
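A minimal sketch of how this is typically done with `atop`'s interactive keys when replaying a raw file; the key bindings below are from recent `atop` versions, so confirm them in `man atop` for the version that wrote the file:

```
# replay the raw file written overnight
atop -r /var/log/atop/atop_20130903

# inside the pager:
#   t          step forward to the next sample
#   T          step back to the previous sample
#   b          branch to a specific time (e.g. 03:00)
#   C / M / D  sort the process list by CPU, memory or disk activity
```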
user135361
(193 rep)
Sep 3, 2013, 01:31 PM
• Last activity: Jun 4, 2025, 01:06 AM
2
votes
2
answers
5249
views
atop peak memory usage from log
I have `atop` logs stored on a daily basis with a 10 min interval, and I can read them with `atop -r`, but how can I find the peak memory usage in this log?
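A hedged sketch of two ways to scan a day's raw file for memory peaks; the parseable-output field layout differs between `atop` versions, so check the "PARSEABLE OUTPUT" section of `man atop` before relying on column positions:

```
# per-interval memory and swap report from a raw file
atopsar -r /var/log/atop/atop_20170511 -m

# or emit machine-parseable MEM lines and post-process them yourself
atop -r /var/log/atop/atop_20170511 -P MEM | grep '^MEM'
```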
mrgloom
(123 rep)
May 11, 2017, 08:41 AM
• Last activity: Sep 5, 2024, 01:03 PM
0
votes
1
answers
185
views
atop is not logging daily
I installed `atop` on `Rocky 9` to monitor CPU spike issues. On 2/8/2024 (2 days ago) I enabled and started `atop`:

```
systemctl enable atop
systemctl start atop
```

Thus, a file was created in `/var/log/atop` named `atop_20240802`.

`journalctl -u atop` output:

```
Aug 02 23:52:01 systemd[1]: Starting Atop advanced performance monitor...
Aug 02 23:52:01 systemd[1]: Started Atop advanced performance monitor.
```

I updated the config file `/usr/share/atop/atop.daily`:

```
#!/usr/bin/sh

LOGOPTS=""                         # default options
LOGINTERVAL=3600                   # default interval in seconds
LOGGENERATIONS=7                   # default number of days
LOGPATH=/var/log/atop              # default log location

# allow administrator to overrule the variables
# defined above
#
DEFAULTSFILE=/etc/sysconfig/atop   # possibility to overrule vars
```

`ls -ld /var/log/atop/` output:

```
drwxr-xr-x 2 root root 53 Aug 2 23:53 /var/log/atop/
```

Daily logs were not created for yesterday and today. How can I fix this?
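A hedged checklist, assuming the Rocky 9 packaging drives the daily restart through a systemd timer (as the Fedora-family packaging does); the unit names may differ on your build:

```
# is a rotation timer installed and active?
systemctl list-timers 'atop*'

# if atop-rotate.timer exists but is not enabled, the daily file never rolls over
systemctl enable --now atop-rotate.timer

# the timer restarts atop, which makes atop.daily start a new atop_YYYYMMDD file
systemctl status atop-rotate.service atop.service
```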
user25934652
(1 rep)
Aug 4, 2024, 03:49 AM
• Last activity: Aug 20, 2024, 11:54 AM
2
votes
1
answers
480
views
What are "shrss" / "shswp" in atop?
From the `atop` manual (official):

> **MEM** - Memory occupation.
>
> This line contains the total amount of physical memory (`tot`), [...] the resident size of shared memory including tmpfs (`shmem`), the resident size of shared memory (`shrss`) the amount of shared memory that is currently swapped (`shswp`) [...]

The description of `shrss` / `shswp` is too vague for me. What does it actually cover?
```
MEM | tot 7.7G | free 3.6G | cache 2.1G | dirty 0.3M | buff 258.1M | slab 306.4M | slrec 182.7M | shmem 257.5M | shrss 12.7M | shswp 0.0M | vmbal 0.0M | | hptot 0.0M | hpuse 0.0M |
```

Or:

```
tot    7.7G
free   3.6G
cache  2.1G
dirty  0.3M
buff   258.1M
slab   306.4M
slrec  182.7M
shmem  257.5M
shrss  12.7M
shswp  0.0M
vmbal  0.0M
hptot  0.0M
hpuse  0.0M
```
---
```
$ rpm -q atop
atop-2.4.0-1.fc30.x86_64
```
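Not an authoritative mapping, but one way to poke at counters that look related: the System V shared-memory statistics the kernel exposes (and which `ipcs` can print). Whether these are exactly what `atop` reports as `shrss`/`shswp` would need to be confirmed against atop's source (photosyst.c):

```
# summary of System V shared memory usage ("pages resident" / "pages swapped")
ipcs -mu

# tmpfs-backed shared memory, which atop reports separately as shmem
grep -i '^Shmem' /proc/meminfo
```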
sourcejedi
(53232 rep)
Jun 29, 2019, 03:36 PM
• Last activity: Aug 15, 2024, 09:59 PM
6
votes
5
answers
25305
views
How to configure atop sampling interval & log path?
I installed `atop` version 1.26 on an Ubuntu 12.04 server. By default, its sampling interval is 10 minutes. How can I change it to 1 minute?

Also, how can I set the log path? Currently the logs are at `/var/log/`, but I notice that `atop` added a folder `/var/log/atop/` (empty). How can I set the logs to be saved in this folder?
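A hedged sketch for the Debian/Ubuntu packaging of that era, which reads its settings from `/etc/default/atop`; the exact variable names may differ, so compare with what `/etc/init.d/atop` actually sources on your server:

```
# /etc/default/atop (settings read by the init script)
INTERVAL=60                 # sample every 60 seconds instead of 600
LOGPATH="/var/log/atop"     # write the daily atop_YYYYMMDD files here

# then restart the daemon so it re-reads the defaults
sudo service atop restart
```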
Raptor
(315 rep)
Apr 13, 2016, 03:10 AM
• Last activity: Aug 15, 2024, 07:26 PM
9
votes
2
answers
5675
views
Can I extract the full command line from an atop 1.23 data file?
My production servers have `atop` version 1.23, and I have a data file written by it from which I need to extract the full command line of a process.

Unfortunately:

* this version of `atop` won't display data beyond the 80th column of the display
* horizontal scrolling was only introduced in 1.27
* newer versions refuse to read data files written by older versions
* the file itself is compressed, so a simple `strings` search won't work

Is there any way I can recover the full command line from my data file?
Flup
(8335 rep)
May 14, 2013, 10:54 AM
• Last activity: May 13, 2024, 12:12 PM
0
votes
1
answers
96
views
Persistent atop across reboots
[`atop` shows total CPU time since the last reboot](https://superuser.com/q/1816687). How can I have it persist across reboots?
wdkmaaeo
(1 rep)
Nov 16, 2023, 07:04 PM
• Last activity: Nov 16, 2023, 08:26 PM
2
votes
2
answers
12607
views
how auto-rotate atop logs
We can see that `atop` logs are created each day, and they take a lot of space:
ls -l /var/log/atop/
total 1634632
-rw-r--r-- 1 root root 127992086 Aug 30 01:49 atop_20180829
-rw-r--r-- 1 root root 262277153 Aug 31 00:00 atop_20180830
-rw-r--r-- 1 root root 321592670 Sep 1 00:00 atop_20180831
-rw-r--r-- 1 root root 330041977 Sep 2 00:00 atop_20180901
-rw-r--r-- 1 root root 269040388 Sep 3 00:00 atop_20180902
-rw-r--r-- 1 root root 274807097 Sep 4 00:00 atop_20180903
-rw-r--r-- 1 root root 85426960 Sep 4 06:03 atop_20180904
-rw-r--r-- 1 root root 0 Sep 4 06:03 daily.log
How can I limit the atop logs, for example, to only 5 logs (the last 5 days)?
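A hedged sketch: either lower the retention variable the packaging already provides (often `LOGGENERATIONS`, as seen in the `atop.daily` script quoted in another question above), or prune old files yourself from cron:

```
# delete daily raw files older than 5 days; run once a day from root's crontab
find /var/log/atop -name 'atop_*' -mtime +5 -delete
```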
yael
(13936 rep)
Sep 4, 2018, 06:06 AM
• Last activity: Feb 14, 2022, 09:51 AM
8
votes
9
answers
17568
views
Change atop log interval from 10 minutes to 1 minute
I have a post-mortem `atop` log on an Ubuntu machine, which I'm able to view with `atop -r`.

The problem is that the log files are shown in 10 minute intervals with `t` and `Shift+t`, while I would prefer 1 minute intervals. When the current time is changed with `b`, it is quantized to the nearest 10 minute point.

I have a line in `/etc/init.d/atop`, but I'm not sure if it affects anything here:

```
INTERVAL=60
```

How can I make `atop` logs browsable with 1 minute accuracy with `t` and `Shift+t`?

Is the information between these 10 minute intervals already lost?

What are the possible workarounds if this is not possible?
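The samples between the recorded 10-minute points are not in the file, so finer granularity has to be arranged at recording time. A hedged sketch, assuming you can run a second recorder alongside the packaged one (same `-w` syntax atop itself uses):

```
# record a separate raw file with 60-second samples; browse it later with atop -r
atop -w /var/log/atop/atop_custom_$(date +%Y%m%d) 60
```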
Estus Flask
(214 rep)
Feb 23, 2017, 07:41 PM
• Last activity: Oct 6, 2021, 05:07 PM
1
votes
0
answers
193
views
When using atop as a systemd service it only runs and logs to file once
I am on **CentOS 7** and using **atop v2.6.0**.

If I run atop manually with the following command, everything runs as expected and the log is written to every 10 seconds:

```
/usr/bin/atop -w /var/log/atop/atop_20210320 10
```

However, when running atop as a service with systemd, the log is only written to once when the service starts and never updated.

I can confirm via `ps -ef` that the service has initiated the command and it's running:

```
root 2320 1 0 13:34 ? 00:00:00 /usr/bin/atop -w /var/log/atop/atop_20210320 10
```

Any help would be much appreciated. Thank you.
Pel
(11 rep)
Mar 20, 2021, 06:56 PM
3
votes
1
answers
3396
views
atop shows `swout` (swapping) when I have gigabytes of free memory. Why?
Why did `atop` show that I was swapping out over 20,000 pages - over 80 megabytes - when I had gigabytes of free memory?

I have not noticed a performance problem with this. I simply wish to take the opportunity to increase my knowledge :-).

`atop` refreshes every ten seconds. Each refresh shows the activity since the last refresh.
MEM | tot 7.7G | free 3.7G | cache 608.3M | buff 19.1M | slab 264.6M |
SWP | tot 2.0G | free 1.4G | | vmcom 13.4G | vmlim 5.8G |
PAG | scan 167264 | steal 109112 | stall 0 | swin 0 | swout 23834 |
"swout" is non-zero and coloured in red ^
Kernel meminfo:
$ head -n 5 /proc/meminfo
MemTotal: 8042664 kB
MemFree: 3563088 kB
MemAvailable: 3922092 kB
Buffers: 20484 kB
Cached: 775308 kB
Kernel version:
$ uname -r
5.0.16-200.fc29.x86_64
---
1. It is not clear that this would be affected by `vm.swappiness`. That setting balances cache reclaim vs. swapping. However, there is plenty of *free* memory, so why would we need to reclaim any memory in the first place?
2. As you can see, this is a small system. It does not use NUMA. I checked in `/proc/zoneinfo` and I only have one node, "Node 0". So this is not [caused by NUMA](https://serverfault.com/a/510220/133475).
3. Related questions and answers mention the idea of "opportunistic swapping", "when the system has nothing better to do", "which may provide a benefit if there's a memory shortage later", etc. I do not find these ideas credible, because they contradict the kernel documentation. See https://unix.stackexchange.com/questions/533739/does-linux-perform-opportunistic-swapping-or-is-it-a-myth/533741#533741
4. There are no limits set on RAM usage using `systemd.resources` features. I.e. I think all `systemd` units have their RAM usage limit set to "infinity".
$ systemctl show '*' | \
grep -E '(Memory|Swap).*(Max|Limit|High)' | \
grep -v infinity
$
5. Edit: I suspect this is related to transparent huge pages. I note that virtual machines use transparent huge pages to allocate the guest memory efficiently. They are the only user program that uses huge pages [on my system](https://unix.stackexchange.com/questions/495816/which-distributions-enable-transparent-huge-pages-for-all-applications).

There is a similar-looking question: https://unix.stackexchange.com/questions/530314/can-kswapd-be-active-if-free-memory-well-exceeds-pages-high-watermark . It is asking about RHEL 6, which enables huge pages for all applications.
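A hedged way to inspect the transparent-huge-page angle while trying to reproduce, not a diagnosis; these are standard kernel interfaces, and the `pgrep` PID selection is only illustrative:

```
# system-wide THP policy and current THP usage
cat /sys/kernel/mm/transparent_hugepage/enabled
grep -E 'AnonHugePages|ShmemHugePages' /proc/meminfo

# THP usage of the qemu process backing the VM (illustrative PID selection)
grep AnonHugePages /proc/$(pgrep -f qemu | head -n1)/smaps_rollup
```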
I am not sure exactly how to reproduce this result.

It happened when starting a VM. I use libvirt to run VMs. As per the default, VM disk reads are cached using the host page cache. (Cache mode: "Hypervisor default" means "Writeback".)

I tried to stop the VM, `FADVISE_DONTNEED` the image file, and try again. But the same thing did not happen.

Then I tried again with a different VM, and it happened briefly. I captured `vmstat`. I think `atop` showed a different, higher figure for "swout", but I did not capture it.
$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 770168 5034300 28024 936256 0 2 21 86 60 101 22 9 68 1 0
0 0 770168 5033852 28048 935904 0 0 0 8 372 633 1 1 97 0 0
1 0 770168 4974048 28252 948152 3 0 1137 194 1859 2559 11 7 79 3 0
0 1 770168 4557968 28308 1037512 0 0 8974 45 3595 6218 16 6 57 21 0
6 3 770168 4317800 28324 1111408 0 0 7200 609 6820 6793 12 5 38 44 0
0 4 770168 4125100 28348 1182624 0 0 6900 269 5054 3674 74 3 8 15 0
0 5 770168 3898200 27840 1259768 2 0 9421 630 4694 5465 11 6 11 71 0
1 3 770168 3760316 27672 1300540 0 0 9294 897 3308 4135 5 4 28 63 0
0 1 770168 3531332 27032 1356236 0 0 10532 155 3140 4949 8 5 63 25 0
0 0 783772 3381556 27032 1320296 0 1390 7320 4210 4450 5112 17 5 43 35 0
0 0 783772 3446284 27056 1335116 0 0 239 441 995 1782 4 2 92 2 0
0 0 783772 3459688 27076 1335372 0 0 3 410 728 1037 2 2 95 1 0
I also checked for a cgroup memory limit on the VM, on the off-chance that libvirt had bypassed `systemd`, and inflicted swapping on itself by mistake:
$ cd /sys/fs/cgroup/memory/machine.slice/machine-qemu\x2d5\x2ddebian9.scope
$ find -type d # there were no sub-directories here
$ grep -H . *limit_in_bytes
memory.kmem.limit_in_bytes:9223372036854771712
memory.kmem.tcp.limit_in_bytes:9223372036854771712
memory.limit_in_bytes:9223372036854771712
memory.memsw.limit_in_bytes:9223372036854771712
memory.soft_limit_in_bytes:9223372036854771712
$ cd ../..
$ find -name "*limit_in_bytes" -exec grep -H -v 9223372036854771712 \{\} \;
$
sourcejedi
(53232 rep)
May 26, 2019, 02:36 PM
• Last activity: Feb 15, 2021, 09:53 PM
3
votes
2
answers
6094
views
How to completely stop atop from creating logs for good?
I've installed `atop` on an OrangePi/RaspberryPi with only 40MB for `/var/log`.

It quickly filled my partition to 100%, so I then purged it (`apt purge atop`) to remove it and its configuration files.

* the `atop` binary doesn't exist
* its file in `/etc/init.d` doesn't exist
* there is no systemd service called `atop`

Nevertheless, after a reboot, the `/var/log/atop` folder comes back full with the old atop files.

**How can I completely remove it for good?**
Edit:
$ ls /var/log/atop/
atop_20200822 atop_20200826 atop_20200830 atop_20200903 atop_20200907 atop_20200911 atop_20200915 atop_20200919
atop_20200823 atop_20200827 atop_20200831 atop_20200904 atop_20200908 atop_20200912 atop_20200916 atop_20200920
atop_20200824 atop_20200828 atop_20200901 atop_20200905 atop_20200909 atop_20200913 atop_20200917 atop_20200921
atop_20200825 atop_20200829 atop_20200902 atop_20200906 atop_20200910 atop_20200914 atop_20200918 atop_20200922
I removed atop on 2020/09/22, the date of the last file. And it regenerates the same folder with the same files again and again after each reboot.
### atop folders and files:
# find . -type d -name *atop*
./usr/share/doc/libatopology2
./var/log.hdd/atop
./var/log/atop
./var/cache/atop.d
# find . -type f -name *atop*
./usr/lib/arm-linux-gnueabihf/libatopology.so.2.0.0
./var/log.hdd/atop/atop_20200913
...
./var/log.hdd/atop/atop_20200906
./var/log/atop/atop_20200830
...
./var/log/atop/atop_20200826
./var/cache/atop.d/atop.acct
./var/lib/dpkg/info/libatopology2:armhf.symbols
./var/lib/dpkg/info/libatopology2:armhf.md5sums
./var/lib/dpkg/info/libatopology2:armhf.list
./var/lib/dpkg/info/libatopology2:armhf.triggers
./var/lib/dpkg/info/libatopology2:armhf.shlibs
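The `/var/log.hdd/atop` directory in the listing above is the likely culprit on Armbian-style OrangePi images: `/var/log` is a RAM-backed copy that gets re-populated from `/var/log.hdd` at boot (log2ram/folder2ram). A hedged sketch, assuming that layout:

```
# remove both the RAM copy and the on-disk master copy
sudo rm -rf /var/log/atop /var/log.hdd/atop

# also clear the leftover process-accounting cache atop created
sudo rm -rf /var/cache/atop.d
```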
無名前
(729 rep)
Oct 3, 2020, 01:59 AM
• Last activity: Oct 3, 2020, 03:31 AM
9
votes
1
answers
2898
views
`dd` is running at full speed, but I only see 20% disk utilization. Why?
1. `sudo dd if=/dev/sda of=/dev/null bs=1M iflag=direct`
2. `atopsar -d 5   # in a second terminal`
3. `top            # in a third terminal`

Results from `atopsar`:
19:18:32 disk busy read/s KB/read writ/s KB/writ avque avserv _dsk_
...
19:16:50 sda 18% 156.5 1024.0 0.0 0.0 5.0 1.15 ms
19:16:55 sda 18% 156.3 1024.0 0.0 0.0 4.9 1.15 ms
...
Why is disk utilization ("busy") reported as much less than 100%?

According to `top`, the `dd` process only uses 3% of a CPU or less. `top` also provides an overall report of hardware and software interrupt (`hi` and `si`) usage of the system's CPUs, which shows as less than 1%. I have four CPUs (2 cores with 2 threads each).

`/dev/sda` is a SATA HDD. It is not an SSD; it is not even a hybrid SSHD drive. It cannot read faster than about 150 megabytes per second :-). So that part of the results makes sense: 156 read/s * 1024 KB/read = 156 MB/s.

The kernel version is `5.0.9-200.fc29.x86_64` (Fedora Workstation 29). The IO scheduler is `mq-deadline`. Since kernel version 5.0, Fedora uses the multi-queue block layer. Because the single queue block layer has been removed :-).

I believe the disk utilization figure in `atopsar -d` and `atop` is calculated from one of the [kernel iostat fields](https://github.com/torvalds/linux/blob/v5.0/Documentation/iostats.txt). The linked doc mentions "field 10 -- # of milliseconds spent doing I/Os". There is a more detailed definition as well, although I am not sure that the functions it mentions still exist in the multi-queue block layer. As far as I can tell, both `atopsar -d` and `atop` use [common code](https://github.com/Atoptool/atop/blob/v2.4.0/photosyst.c#L992) to read this field 10. (I believe this field is also used by `sar -d` / `iostat -x` / `mxiostat.py`.)
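A hedged sketch of approximating the "busy" figure directly from field 10, to confirm what the tools are reporting; the device name and interval are only examples:

```
# field 10 of the kernel iostats ("milliseconds spent doing I/Os") is the
# 13th whitespace-separated column of /proc/diskstats; sample it twice and
# divide by the wall-clock interval to get utilization
t1=$(awk '$3 == "sda" {print $13}' /proc/diskstats)
sleep 5
t2=$(awk '$3 == "sda" {print $13}' /proc/diskstats)
echo "busy: $(( (t2 - t1) / 50 ))%"   # 5000 ms interval -> divide by 50 for percent
```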
## Additional tests
Variant 2: Changing to `bs=512k`, but keeping `iflag=direct`.

```
dd if=/dev/sda of=/dev/null bs=512k iflag=direct
```
19:18:32 disk busy read/s KB/read writ/s KB/writ avque avserv _dsk_
...
19:18:00 sda 35% 314.0 512.0 0.0 0.0 2.1 1.12 ms
19:18:05 sda 35% 313.6 512.0 0.2 4.0 2.1 1.11 ms
Variant 3: Using `bs=1M`, but removing `iflag=direct`. `dd` uses about 10% CPU, and 35% disk.

```
dd if=/dev/sda of=/dev/null bs=1M
```
19:18:32 disk busy read/s KB/read writ/s KB/writ avque avserv _dsk_
...
19:21:47 sda 35% 242.3 660.2 0.0 0.0 5.4 1.44 ms
19:21:52 sda 31% 232.3 667.8 0.0 0.0 9.5 1.33 ms
# How to reproduce these results - essential details
## Beware of the last test, i.e. running `dd` *without* `iflag=direct`

It is a bit of a hog. I saw it freeze the system (mouse cursor) for ten seconds or longer. Even when I had swap disabled. (The test fills your RAM with [buff/cache](https://unix.stackexchange.com/questions/440558/30-of-ram-is-buffers-what-is-it). It is filling the inactive LRU list. I think the turnover evicts inactive cache pages relatively quickly. At the same time, the disk is busy with sequential reads, so it takes longer when you need to page something in. How bad this gets probably depends on whether the kernel ends up also turning over the active LRU list, or shrinking it too much. I.e. how well the current ["mash of a number of different algorithms with a number of modifications for catching corner cases and various optimisations"](https://unix.stackexchange.com/questions/281974/what-page-replacement-algorithms-are-used-in-linux-kernel-for-os-file-cache/282008) is working in your case).
## The exact results of the first test are difficult to reproduce.
Sometimes, `KB/read` shows as `512` instead of `1024`. In this case, the other results look more like the results from `bs=512k`. Including that it shows a disk utilization around 35%, instead of around 20%. My question stands in either case.

If you would like to understand this behaviour, it is described here: https://unix.stackexchange.com/questions/529529/why-is-the-size-of-my-io-requests-being-limited-to-about-512k
sourcejedi
(53232 rep)
May 4, 2019, 07:10 PM
• Last activity: Jul 11, 2020, 04:00 PM
0
votes
1
answers
1629
views
`systemctl status atop` shows it has failed, but there is no error message
I installed `atop`, and then immediately upgraded from Fedora 29 to Fedora 30.

Now `atop` failed to start during boot. But there is no error message. It only shows that `atop` exited with code 7. Why do I not see an error message?
$ rpm -q atop
atop-2.4.0-1.fc30.x86_64
$ journalctl -u atop
-- Logs begin at Wed 2018-02-28 17:23:24 GMT, end at Tue 2019-04-30 22:41:24 BST. --
Apr 30 17:27:05 fedora29-vm systemd: Started Atop advanced performance monitor.
Apr 30 18:34:31 fedora29-vm systemd: Stopping Atop advanced performance monitor...
Apr 30 18:34:34 fedora29-vm systemd: Stopped Atop advanced performance monitor.
-- Reboot --
Apr 30 19:48:56 fedora29-vm systemd: Started Atop advanced performance monitor.
Apr 30 19:48:58 fedora29-vm systemd: atop.service: Main process exited, code=exited, status=7/NOTRUNNING
Apr 30 19:48:58 fedora29-vm systemd: atop.service: Failed with result 'exit-code'.
sourcejedi
(53232 rep)
Apr 30, 2019, 09:48 PM
• Last activity: Aug 13, 2019, 08:20 AM
0
votes
1
answers
615
views
atop died with exit status 53
I was running `atop`. It was working, but later it died with exit status 53.

```
$ atop
$ echo $?
53
```

Is this a bug in `atop`? Or is my system failing?
## Pre-emptive extra details :-)
I checked in the kernel log (`dmesg`). I found nothing *specifically* alarming, but there were some errors. See below.

This laptop has been used reliably for months now. The kernel is not "tainted", therefore there have been no kernel WARN or OOPS messages.
$ cat /proc/sys/kernel/tainted
0
$ rpm -q atop
atop-2.3.0-10.fc28.x86_64
$ uname -r # My kernel version
5.1.6-200.fc29.x86_64
`dmesg` reminded me that I recently woke the computer from sleep, and that I started `atop` before it went to sleep.

`dmesg` also happens to show USB and FAT errors, since I lost connection with a poorly-connected USB stick at some point during this.

There were some hardware messages during resume from suspend. But I have not had any user-visible problem with sleep/wake in the past several months.

There are also some earlier `dmesg` errors.
---
[40654.082396] PM: suspend exit
[40654.205336] Bluetooth: hci0: Intel Bluetooth firmware file: intel/ibt-hw-37.8.10-fw-1.10.3.11.e.bseq
[40654.517627] Bluetooth: hci0: unexpected event for opcode 0xfc2f
[40654.534651] Bluetooth: hci0: Intel firmware patch completed and activated
[40655.519592] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
TDH
TDT
next_to_use
next_to_clean
buffer_info[next_to_clean]:
time_stamp
next_to_watch
jiffies
next_to_watch.status
MAC Status
PHY Status
PHY 1000BASE-T Status
PHY Extended Status
PCI Status
[40656.239991] Process accounting resumed
[40656.318355] e1000e: eno1 NIC Link is Down
...
[41013.654127] usb 2-2: USB disconnect, device number 43
[41013.901953] usb 2-2: new full-speed USB device number 46 using xhci_hcd
[41020.270316] FAT-fs (sdb4): Directory bread(block 22704) failed
[41020.270322] FAT-fs (sdb4): Directory bread(block 22705) failed
[41020.270325] FAT-fs (sdb4): Directory bread(block 22706) failed
[41020.270328] FAT-fs (sdb4): Directory bread(block 22707) failed
[41020.270331] FAT-fs (sdb4): Directory bread(block 22708) failed
[41020.270333] FAT-fs (sdb4): Directory bread(block 22709) failed
[41020.270336] FAT-fs (sdb4): Directory bread(block 22710) failed
[41020.270339] FAT-fs (sdb4): Directory bread(block 22711) failed
[41020.270354] FAT-fs (sdb4): Directory bread(block 22704) failed
[41020.270357] FAT-fs (sdb4): Directory bread(block 22705) failed
[41029.558085] usb 2-2: device descriptor read/64, error -110
[41029.780057] usb 2-2: device descriptor read/64, error -71
[41029.996102] usb 2-2: new full-speed USB device number 47 using xhci_hcd
[41030.110057] usb 2-2: device descriptor read/64, error -71
[41030.332052] usb 2-2: device descriptor read/64, error -71
[41030.434046] usb usb2-port2: attempt power cycle
[41031.061973] usb 2-2: new full-speed USB device number 48 using xhci_hcd
[41031.062086] usb 2-2: Device not responding to setup address.
[41031.266212] usb 2-2: Device not responding to setup address.
[41031.474101] usb 2-2: device not accepting address 48, error -71
[41031.588100] usb 2-2: new full-speed USB device number 49 using xhci_hcd
[41031.588294] usb 2-2: Device not responding to setup address.
[41031.794271] usb 2-2: Device not responding to setup address.
[41032.002095] usb 2-2: device not accepting address 49, error -71
[41032.002202] usb usb2-port2: unable to enumerate USB device
[41070.004040] usb 2-1: new high-speed USB device number 50 using xhci_hcd
[41075.638042] usb 2-1: device descriptor read/64, error -110
[41075.877302] usb 2-1: New USB device found, idVendor=0718, idProduct=063d, bcdDevice= 1.00
[41075.877309] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[41075.877313] usb 2-1: Product: Nano Pro
[41075.877316] usb 2-1: Manufacturer: Imation
[41075.877319] usb 2-1: SerialNumber: 0703335B0AF92D86
[41075.881530] usb-storage 2-1:1.0: USB Mass Storage device detected
[41075.882899] scsi host3: usb-storage 2-1:1.0
[41077.004052] scsi 3:0:0:0: Direct-Access Imation Nano Pro PMAP PQ: 0 ANSI: 4
[41077.005015] sd 3:0:0:0: Attached scsi generic sg1 type 0
[41078.366583] sd 3:0:0:0: [sdc] 15116736 512-byte logical blocks: (7.74 GB/7.21 GiB)
[41078.366732] sd 3:0:0:0: [sdc] Write Protect is off
[41078.366735] sd 3:0:0:0: [sdc] Mode Sense: 23 00 00 00
[41078.366935] sd 3:0:0:0: [sdc] No Caching mode page found
[41078.366939] sd 3:0:0:0: [sdc] Assuming drive cache: write through
[41078.389637] sdc: sdc1 sdc2 sdc3 sdc4
[41078.392305] sd 3:0:0:0: [sdc] Attached SCSI removable disk
Errors from earlier:
$ journalctl -o short-monotonic -k -p err
[ 14.953318] alan-laptop kernel: i801_smbus 0000:00:1f.3: Failed to enable SMBus PCI device (-16)
[ 15.919042] alan-laptop kernel: uvcvideo: Failed to query (GET_INFO) UVC control 3 on unit 1: -32 (exp. 1).
[ 15.974276] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[ 19.812805] alan-laptop kernel: db_root: cannot open: /etc/target
[ 35.219443] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
...
[19625.693967] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[20606.292276] alan-laptop kernel: ACPI Error: Thread 3649118016 cannot release Mutex [PATM] acquired by thread 2284756800 (20190215/exmutex-382)
[20606.292282] alan-laptop kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20190215/psparse-53>
[20609.491788] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[22006.616549] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[28469.301081] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[28472.604260] alan-laptop kernel: Bluetooth: hci0: command 0x0c56 tx timeout
[30297.940165] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
[30355.700904] alan-laptop kernel: mmc0: error -110 doing runtime resume
[30355.939525] alan-laptop kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
sourcejedi
(53232 rep)
Jun 20, 2019, 03:17 PM
• Last activity: Jun 20, 2019, 03:25 PM
3
votes
1
answers
3720
views
IO wait time is higher than disk utilization. Isn't this impossible?
I am trying to improve my understanding, following this (so far) unanswered question: https://unix.stackexchange.com/questions/516433/possible-limiting-factor-during-upgrade-of-fedora-vm-not-disk-or-cpu-or-networ

I ran the following test load, which took 200 seconds to complete.

```
sudo perf trace -s time perf stat dnf -y --releasever=30 --installroot=$HOME/fedora-30 --disablerepo='*' --enablerepo=fedora --enablerepo=updates install systemd passwd dnf fedora-release vim-minimal
```
I am running this on a fairly default, straightforward install of Fedora Workstation 29. It is not a VM. The kernel version is 5.0.9-200.fc29.x86_64. The IO scheduler is `mq-deadline`.

I use LVM and the `ext4` filesystem. I am not using any encryption on my disk or filesystem. I do not have any network filesystem mounted at all, so I am not reading or writing a network filesystem.

I have 4 "CPUs": 2 cores with 2 threads each.

I have only one disk, `/dev/sda`, which is a SATA HDD. The HDD supports NCQ: `cat /sys/class/block/sda/device/queue_depth` shows `32`.
`vmstat 5` showed that non-idle CPU time *sometimes* rose to about one CPU, i.e. idle was as low as 75%.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
...
1 1 3720 1600980 392948 3669196 0 0 14 30634 8769 1876 4 3 78 15 0
1 1 3720 1600144 393128 3669884 0 0 0 3460 1406 1468 0 1 80 18 0
0 1 3720 1599424 393356 3670532 0 0 0 6830 1416 1480 0 1 73 25 0
0 1 3720 1598448 393700 3671108 0 0 0 7766 1420 1362 0 1 78 20 0
0 1 3720 1597296 393940 3671616 0 0 0 5401 1374 1367 1 2 87 11 0
0 1 3720 1596332 394312 3672416 0 0 0 1162 1647 1558 1 2 85 13 0
3 0 3720 1652064 394564 3673540 0 0 0 17897 15406 1784 1 3 74 23 0
0 0 3720 1972876 394600 3541064 0 0 0 522 8028 18327 3 3 84 10 0
1 0 3720 1974152 394600 3541108 0 0 0 9 422 879 2 1 97 0 0
0 0 3720 1974136 394600 3541120 0 0 0 0 204 455 0 0 99 0 0
(end of test)
And the "IO wait" time (the `wa` field under `cpu` in `vmstat`) rose as high as 25%. I think this means 100% of one CPU. But `atopsar -d 5` showed the utilization of my disk did not directly match this. It was much less than 100%:
22:46:44 disk busy read/s KB/read writ/s KB/writ avque avserv _dsk_
...
22:49:34 sda 5% 0.4 4.0 69.5 413.0 36.9 0.68 ms
22:49:39 sda 8% 0.2 60.0 120.6 30.6 18.7 0.66 ms
22:49:44 sda 8% 0.0 0.0 136.2 16.7 20.4 0.61 ms
22:49:49 sda 10% 0.0 0.0 157.1 44.2 21.4 0.65 ms
22:49:54 sda 9% 0.0 0.0 196.4 39.3 48.0 0.47 ms
22:49:59 sda 9% 0.0 0.0 148.9 36.6 32.6 0.62 ms
22:50:04 sda 10% 0.0 0.0 137.3 130.6 37.2 0.70 ms
22:50:09 sda 11% 0.0 0.0 199.6 5.4 13.5 0.55 ms
22:50:14 sda 2% 0.0 0.0 50.2 4.5 11.8 0.32 ms
22:50:19 sda 0% 0.0 0.0 0.8 11.0 13.3 0.75 ms
(end of test)
How can "IO wait" time be higher than disk utilization?
> Following is the definition taken from the sar manpage:
>
> %iowait:
>
> Percentage of time that the CPU or CPUs were idle during which the
> system had an outstanding disk I/O request.
>
> Therefore, %iowait means that from the CPU point of view, no tasks
> were runnable, but at least one I/O was in progress. iowait is simply
> a form of idle time when nothing could be scheduled. The value may or
> may not be useful in indicating a performance problem, but it does
> tell the user that the system is idle and could have taken more work.
>
> https://support.hpe.com/hpsc/doc/public/display?docId=c02783994
"IO wait" is tricksy to define on multi-CPU systems. See https://unix.stackexchange.com/questions/410628/how-does-a-cpu-know-there-is-io-pending . But even if you think I was wrong to multiply the above "IO wait" figure by 4, it would still be higher than the disk utilization figure!
I expect the disk utilization figure in `atopsar -d` (and equally in `atop` / `sar -d` / `iostat -x` / `mxiostat.py`) is calculated from one of the [kernel iostat fields](https://github.com/torvalds/linux/blob/v5.0/Documentation/iostats.txt). The linked doc mentions "Field 10 -- # of milliseconds spent doing I/Os". There is a more detailed definition as well, although I am not sure that the functions it mentions still exist in the current multi-queue block layer.
---
Thanks to the `perf` in the test command, I can also report that dnf's `fdatasync()` calls were responsible for 81 out of 200 seconds of elapsed time. This evidence suggests to me that the "IO wait" figure is giving a more accurate impression than the disk utilization figure.
199.440461084 seconds time elapsed
60.043226000 seconds user
11.858057000 seconds sys
60.07user 12.17system 3:19.84elapsed 36%CPU (0avgtext+0avgdata 437844maxresident)k
496inputs+2562448outputs (25major+390722minor)pagefaults 0swaps
Summary of events:
...
dnf (6177), 2438150 events, 76.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- --------- --------- --------- --------- ------
fdatasync 3157 81436.062 0.160 25.795 465.329 1.92%
wait4 51 43911.444 0.017 861.009 32770.388 74.93%
poll 45188 6052.070 0.001 0.134 96.764 5.58%
read 341567 2718.531 0.001 0.008 1372.100 50.63%
write 120402 1616.062 0.002 0.013 155.142 9.61%
getpid 50012 755.423 0.001 0.015 207.506 32.86%
...
sourcejedi
(53232 rep)
May 2, 2019, 10:25 PM
• Last activity: May 3, 2019, 03:14 PM
1
votes
0
answers
32
views
Possible limiting factor during upgrade of Fedora VM - not disk or cpu or network?
I'm upgrading a VM from Fedora 29 to Fedora 30. I triggered the update from inside GNOME Software. It downloaded the upgrade, and asked to reboot. It started installing...
`atop` shows the `qemu` process using about 15% of one CPU core, and my one hard drive at about 15% utilization.

What other resource(s) could the VM possibly be waiting for the other ~70% of the time? Is it just [sleepy](https://en.wikipedia.org/wiki/Sleep_(system_call))?? Is there some other way to explain or find this out?
The upgrade showed the normal progress indicator, and completed without errors.
The VM is in libvirt (virt-manager), using the default KVM backend on x86-64. I configured the VM disk using cache mode "none". There's a 2x2 matrix of caching options; "none" means O_DIRECT, but not O_SYNC. (My intent was to avoid [filling the host kernel's writeback cache](https://unix.stackexchange.com/questions/474119/system-lags-when-doing-large-r-w-operations-on-external-disks/474171) ; at least it seems to have left the host system so much more responsive than the last time I tried this). "IO Mode" is set to "Hypervisor default". The virtual disk controller is using VirtIO. libvirt is from Fedora 29; it is version 4.7.0 and qemu is version 3.0.0.
sourcejedi
(53232 rep)
Apr 30, 2019, 06:41 PM
• Last activity: Apr 30, 2019, 10:33 PM
1
votes
1
answers
3726
views
How to read several days log files when use `atopsar`?
My system is CentOS 7.4 with `atop-2.3.0-8.el7.x86_64`.

I already have 3 days of logs in `/var/log/atop`, such as `atop_20190312 atop_20190313 atop_20190314`. I want to find out the peak period of CPU consumption in those 3 days.

Based on this post, I tried:

```
atopsar -r /var/log/atop/ -A -R 3 -c | head
```

but it failed. How can I read several days of atop logs as one?
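A hedged sketch: `atopsar -r` expects a single raw file rather than a directory, so one simple approach is to run it once per daily file and compare the outputs (same `-c` flag as in the question):

```
for f in /var/log/atop/atop_2019031{2,3,4}; do
    echo "== $f =="
    atopsar -r "$f" -c
done
```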
kittygirl
(111 rep)
Mar 14, 2019, 09:51 AM
• Last activity: Mar 14, 2019, 05:56 PM
7
votes
1
answers
3915
views
Writeback cache (`dirty`) seems to be limited to even less than dirty_background_ratio. What is it being limited by? How is this limit calculated?
I have been testing Linux `4.18.16-200.fc28.x86_64`. My system has 7.7G total RAM, according to `free -h`.

I have default values for the `vm.dirty*` sysctls. `dirty_background_ratio` is 10, and `dirty_ratio` is 20. Based on everything I've read, I expect Linux to begin writeout of dirty cache when it reaches 10% of RAM: 0.77G. And buffered write() calls should block when dirty cache reaches 20% of RAM: 1.54G.

I ran `dd if=/dev/zero of=~/test bs=1M count=2000` and watched the `dirty` field in `atop`. While the `dd` command was running, the `dirty` value settled at around 0.5G. This is significantly less than the dirty background threshold (0.77G)! How can this be? What am I missing?
`dirty_expire_centisecs` is 3000, so I don't think that can be the cause. I even tried lowering `dirty_expire_centisecs` to 100, and `dirty_writeback_centisecs` to 10, to see if that was limiting `dirty`. This did not change the result.

I initially wrote these observations as part of this investigation: https://unix.stackexchange.com/questions/480399/why-were-usb-stick-stall-problems-reported-in-2013-why-wasnt-this-problem-al/
---
I understand that half-way between the two thresholds - 15% = 1.155G - write() calls start being throttled (delayed) on a curve. But no delay is added when underneath this ceiling; the processes generating dirty pages are allowed "free run".
As I understand it, the throttling aims to keep the dirty cache somewhere at or above 15%, and prevent hitting the 20% hard limit. It does not provide a guarantee for every situation. But I'm testing a simple case with one `dd` command; I think it should simply ratelimit the write() calls to match the writeout speed achieved by the device.
(There is not a simple guarantee because there are some complex exceptions. For example, the throttle code limits the delay it will impose to a maximum of 200ms. But not if the target ratelimit for the process is less than one page per second; in that case it will apply a strict ratelimit.)
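One hedged way to see the thresholds the kernel is actually applying: the counters below are maintained in `/proc/vmstat` and are expressed in pages. Note the ratios are applied to *dirtyable* memory rather than `MemTotal`, which may account for part of the gap:

```
# current dirty pages plus the kernel's computed thresholds (all in pages)
grep -E '^nr_dirty(_background_threshold|_threshold)? ' /proc/vmstat

# convert the background threshold to bytes (assuming 4 KiB pages)
echo $(( $(awk '/^nr_dirty_background_threshold /{print $2}' /proc/vmstat) * 4096 )) bytes
```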
* [Documentation/sysctl/vm.txt](https://github.com/torvalds/linux/blob/v4.18/Documentation/sysctl/vm.txt) -- Linux v4.18
* [No-I/O dirty throttling](https://lwn.net/Articles/456904/) -- 2011 LWN.net.
* > (dirty_background_ratio + dirty_ratio)/2 dirty data in total ... is an amount of dirty data when we start to throttle processes -- [Jan Kara, 2013](https://lore.kernel.org/lkml/20131029203050.GE9568@quack.suse.cz/)
* > Users will notice that the applications will get throttled once crossing the global (background + dirty)/2=15% threshold, and then balanced around 17.5%. Before patch, the behavior is to just throttle it at 20% dirtyable memory -- commit 143dfe8611a6, "[writeback: IO-less balance_dirty_pages()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/page-writeback.c?id=143dfe8611a63030ce0c79419dc362f7838be557)"
* > The memory-management subsystem will, by default, try to limit dirty pages to a maximum of 15% of the memory on the system. There is a "magical function" called balance_dirty_pages() that will, if need be, throttle processes dirtying a lot of pages in order to match the rate at which pages are being dirtied and the rate at which they can be cleaned." -- [Writeback and control groups](https://lwn.net/Articles/648292/) , 2015 LWN.net.
* [balance_dirty_pages()](https://elixir.bootlin.com/linux/v4.18.16/source/mm/page-writeback.c#L1554) in Linux 4.18.16.
sourcejedi
(53232 rep)
Nov 7, 2018, 11:27 PM
• Last activity: Feb 21, 2019, 12:36 PM
2
votes
1
answers
722
views
Why does atop open a raw socket?
I installed `atop` on Debian 9. It runs as a monitoring daemon.

Why is it listening on a raw socket? Raw sockets are used to generate arbitrary IPv4 packets, or to read all packets for a given IP sub-protocol! But I don't think my `atop` and its logs show any information from reading packets. I don't even use `netatop` - and that would require a kernel module, which is not included in Debian. And I would be extremely surprised if any of the `atop` features involve *sending* raw IP packets.
$ sudo netstat -l --raw -ep
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
raw 0 0 0.0.0.0:255 0.0.0.0:* 7 root 2427667 7353/atop
$ sudo ss -l --raw -p
State Recv-Q Send-Q Local Address:Port Peer Address:Port
UNCONN 0 0 *:ipproto-255 *:* users:(("atop",pid=7353,fd=4))
sourcejedi
(53232 rep)
Nov 13, 2018, 11:54 AM
• Last activity: Nov 13, 2018, 08:31 PM