
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

4 votes
1 answer
3554 views
Finding which specific Python process was killed by Linux OOM killer
I'm trying to figure out which specific Python process or executable was killed by the Linux OOM killer. In /var/log/messages I get this:

Aug 18 03:19:11 169 kernel: [ 7747] 0 7748 3226957 2875051 5692 0 0 python

(notice it just says name = "python") and this:

Aug 18 03:19:11 169 kernel: Killed process 7748 (python) total-vm:12907828kB, anon-rss:11500204kB, file-rss:0kB

(again notice it just says the process was "python") Ideally a log file for the process would have the PID somewhere. But suppose the logs wrapped (or suppose the process doesn't log the PID anywhere). Does Linux provide a way to figure out the full command that was executed for the process? It would be nice to configure Linux's OOM killer to display the full name in the process table, such as:

/usr/bin/python /usr/lib/python2.7/site-packages/foo.pyc

Or maybe at the time of the OOM error Linux stores off some of the process details somewhere? I.e. copy the processes from /proc to X? (wishful thinking) ### NOTE: ### This question is very similar to this question: https://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer But it fell short of what I'm trying to figure out.
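One workaround along the lines of the asker's "copy the processes from /proc" wish is to periodically snapshot PID-to-command-line mappings, so a later OOM log entry (which only shows the 15-character comm, here "python") can be matched back to a full command. A minimal sketch; the log path and scheduling are illustrative, not from the original post:

```shell
# snapshot_cmdlines FILE: append "PID full-command-line" for every live process.
# Run it periodically (e.g. from cron); after an OOM kill, grep the file for
# the PID the kernel printed ("Killed process 7748 ...").
snapshot_cmdlines() {
    for p in /proc/[0-9]*; do
        # cmdline is NUL-separated; turn the NULs into spaces
        cmd=$(tr '\0' ' ' < "$p/cmdline" 2>/dev/null)
        [ -n "$cmd" ] && printf '%s %s\n' "${p#/proc/}" "$cmd"
    done >> "$1"
}
# snapshot_cmdlines /var/log/pid-cmdlines.log   # path is illustrative
```

Kernel-exited processes that started and died between snapshots will of course be missed, so a short interval helps.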
jersey bean (553 rep)
Aug 18, 2017, 09:32 PM • Last activity: Jul 20, 2025, 12:08 PM
4 votes
2 answers
183 views
How to find out a process's proportional use of system-wide Committed_AS memory on Linux?
On Linux, it's possible to [disable overcommitting memory](https://unix.stackexchange.com/questions/797835/disabling-overcommitting-memory-seems-to-cause-allocs-to-fail-too-early-what-co/797836#797836) which makes it behave like Windows, in that malloc() will fail once all physical memory is used up. As explained [in this insightful and good answer](https://unix.stackexchange.com/a/797888/104885), in that mode the Committed_AS statistic shown in /proc/meminfo becomes the relevant value for used-up memory, rather than any of the other metrics like calculating it based on MemFree and so on. **Here's my question:** When running in that mode, how do I find out a process's proportional use of the system-wide Committed_AS total value on Linux? Is there an easy way to do so? **As for more background info**, I've been using this mode for some days now. It's useful, for example, to test how software I work on would behave on Windows when hitting the memory limit. However, I ran into the practical issue that when I run out of memory, it's hard to find the biggest offenders. It seems that no common system monitor tool shows how much memory a process has actually committed, since in my understanding the usual resident memory, shared memory, and so on only cover memory *actually written to* (which I think is smaller than committed memory). Hence, it becomes difficult to judge which program actually committed the most memory and may be worth terminating when I run out. Seeing the committed memory might also help identify programs that accidentally use fork() in situations where they should perhaps be using vfork().
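The kernel does not export a per-process share of Committed_AS directly, but a rough proxy can be computed from /proc/PID/smaps: summing the Size of all writable, private mappings approximates what the process has committed. Treat this as a sketch, not definitive accounting — it overcounts some cases (e.g. MAP_NORESERVE mappings) and ignores others:

```shell
# Approximate a process's committed memory by summing the Size of all
# writable, private mappings in its smaps. Needs read access to the PID.
committed_estimate_kb() {  # usage: committed_estimate_kb PID
    awk '/^[0-9a-f].*-/ { w = (substr($2,2,1) == "w" && substr($2,4,1) == "p") }
         /^Size:/ && w  { sum += $2 }
         END            { print sum + 0 }' "/proc/$1/smaps"
}
# for p in $(pgrep -u "$USER"); do
#     printf '%10s kB  pid %s\n' "$(committed_estimate_kb "$p")" "$p"
# done | sort -n     # largest estimated committers last
```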
E. K. (153 rep)
Jul 16, 2025, 08:15 AM • Last activity: Jul 17, 2025, 06:33 PM
1 vote
1 answer
107 views
Disabling overcommitting memory seems to cause allocs to fail too early, what could be the reason?
I tested out echo 2 > /proc/sys/vm/overcommit_memory, which I know isn't a commonly used or recommended mode, but for various reasons it could be beneficial for some of my workloads. However, when I tested this out on a desktop system with 15.6GiB RAM, with barely a quarter of memory used, most programs would already start crashing or erroring, and Brave would fail to open tabs:
$ dmesg
...
[24551.333140] __vm_enough_memory: pid: 19014, comm: brave, bytes: 268435456 not enough memory for the allocation
[24551.417579] __vm_enough_memory: pid: 19022, comm: brave, bytes: 268435456 not enough memory for the allocation
[24552.506934] __vm_enough_memory: pid: 19033, comm: brave, bytes: 268435456 not enough memory for the allocation
$ ./smem -tkw
Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory          4.0G       3.5G     519.5M 
userspace memory               3.4G       1.3G       2.1G 
free memory                    8.2G       8.2G          0 
----------------------------------------------------------
                              15.6G      13.0G       2.7G
I understand that with memory overcommit disabled, use of fork() instead of vfork(), which many Linux programs suboptimally do, can cause issues once the forking process has a lot of memory allocated. But it seems like this isn't the case here, since 1. the affected processes seem to use at most a few hundred megabytes of memory, 2. the allocation listed in dmesg as failing is way smaller than what's listed as free, and 3. the overall system memory doesn't seem to be even a quarter filled up. Some more system info:
# /sbin/sysctl vm.overcommit_ratio vm.overcommit_kbytes vm.admin_reserve_kbytes vm.user_reserve_kbytes
vm.overcommit_ratio = 50
vm.overcommit_kbytes = 0
vm.admin_reserve_kbytes = 8192
vm.user_reserve_kbytes = 131072
I'm therefore wondering what the cause here is. Is there some obvious reason for this, perhaps some misconfiguration on my part that could be improved? **Update:** so, in part it seems to have been the overcommit_ratio that @StephenKitt helped me find, which needed adjustment like this:
echo 2 > /proc/sys/vm/overcommit_memory
echo 100 > /proc/sys/vm/overcommit_ratio
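With vm.overcommit_kbytes = 0 and no huge pages reserved, the limit being hit works out as CommitLimit = MemTotal × overcommit_ratio / 100 + SwapTotal. Plugging in this machine's numbers (MemTotal 16384932 kB and SwapTotal 2097148 kB, from the meminfo in Update 2) shows why the default ratio of 50 bit so early:

```shell
# CommitLimit (kB) = MemTotal * overcommit_ratio / 100 + SwapTotal
# (valid when vm.overcommit_kbytes is 0 and no huge pages are reserved)
commit_limit_kb() {  # usage: commit_limit_kb MEMTOTAL_KB SWAPTOTAL_KB RATIO
    echo $(( $1 * $3 / 100 + $2 ))
}
commit_limit_kb 16384932 2097148 50    # default ratio: 10289614 kB, only ~9.8 GiB commitable
commit_limit_kb 16384932 2097148 100   # ratio 100:     18482080 kB, matches CommitLimit below
```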
But now I seem to be running into another wall. I first thought it would be the fork() vs vfork() issue, but instead it seems to happen once app memory usage reaches the dynamic kernel memory: [screenshot of memory usage omitted] I'm guessing it may not be intended that the kernel keeps sitting on this dynamic memory of more than 6GiB forever without it being usable. Does anybody have an idea why it behaves like that with overcommitting disabled? Perhaps I'm missing something here. **Update 2:** Here's more information collected when hitting this weird condition again, where the dynamic kernel memory won't get out of the way:
[32915.298484] __vm_enough_memory: pid: 24347, comm: brave, bytes: 268435456 not enough memory for the allocation
[32916.293690] __vm_enough_memory: pid: 24355, comm: brave, bytes: 268435456 not enough memory for the allocation
# exit
~/Develop/smem $ ./smem -tkw
Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory          7.8G       7.4G     384.0M 
userspace memory               5.2G       1.5G       3.7G 
free memory                    2.7G       2.7G          0 
----------------------------------------------------------
                              15.6G      11.5G       4.1G 
~/Develop/smem $ cat /proc/sys/vm/overcommit_ratio
100
~/Develop/smem $ cat /proc/sys/vm/overcommit_memory
2
~/Develop/smem $ cat /proc/meminfo 
MemTotal:       16384932 kB
MemFree:         2803496 kB
MemAvailable:   10297132 kB
Buffers:            1796 kB
Cached:          8749580 kB
SwapCached:            0 kB
Active:          7032032 kB
Inactive:        4760088 kB
Active(anon):    4698776 kB
Inactive(anon):        0 kB
Active(file):    2333256 kB
Inactive(file):  4760088 kB
Unevictable:      825908 kB
Mlocked:            1192 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               252 kB
Writeback:             0 kB
AnonPages:       3866720 kB
Mapped:          1520696 kB
Shmem:           1658104 kB
KReclaimable:     570808 kB
Slab:             743788 kB
SReclaimable:     570808 kB
SUnreclaim:       172980 kB
KernelStack:       18720 kB
PageTables:        53772 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18482080 kB
Committed_AS:   17610184 kB
VmallocTotal:   261087232 kB
VmallocUsed:       86372 kB
VmallocChunk:          0 kB
Percpu:              864 kB
CmaTotal:          65536 kB
CmaFree:             608 kB
E. K. (153 rep)
Jul 11, 2025, 04:59 AM • Last activity: Jul 13, 2025, 08:26 AM
0 votes
0 answers
62 views
Firefox died with "__vm_enough_memory: not enough memory for the allocation" despite several GB of free RAM and swap
I am working on a laptop with 16 GB of non-upgradable RAM on Linux Mint 22, kernel 6.8.0-63-generic #66-Ubuntu. I created a 32 GB swap partition on my NVMe drive with a high swappiness value [note1] and disabled overcommit (vm.overcommit_memory=2). This usually works fine, even with several very hungry processes running (Firefox, Thunderbird, Zotero, VSCode, Element). Linux keeps around 5 GB of RAM for cache and 5-10 GB in swap. Slowdown when switching programs is minimal due to the fast NVMe swap. Now Firefox just died and the dmesg log is full of lines like this:
[77152.624112] __vm_enough_memory: pid: 4107, comm: IPC Launch, not enough memory for the allocation
[77152.624979] __vm_enough_memory: pid: 4612, comm: WebExtensions, not enough memory for the allocation
[77152.624986] __vm_enough_memory: pid: 4612, comm: WebExtensions, not enough memory for the allocation
[77156.091496] __vm_enough_memory: pid: 4107, comm: IPC Launch, not enough memory for the allocation
[77227.513600] __vm_enough_memory: pid: 203606, comm: Sandbox Forked, not enough memory for the allocation
[77227.514500] __vm_enough_memory: pid: 4612, comm: WebExtensions, not enough memory for the allocation
[77227.514509] __vm_enough_memory: pid: 4612, comm: WebExtensions, not enough memory for the allocation
[77227.529083] __vm_enough_memory: pid: 4007, comm: firefox-bin, not enough memory for the allocation
Sandbox Forked: segfault at 0 ip 00007d4850861bd9 sp 00007d482b7fc1f0 error 6 in libxul.so[7d484dcdb000+6bb3000] likely on CPU 7 (core 3, socket 0)
[77227.562043] Code: 48 89 01 31 c0 b9 ca 02 00 00 48 89 08 e8 3f 25 02 04 48 8d 05 f2 66 82 fb 48 8b 0d 79 8a 53 04 48 89 01 31 c0 b9 ae 02 00 00  89 08 e8 1f 25 02 04 e8 3a c1 02 04 48 8d 35 6e d9 8a fb 48 8d
However, there were about 4 GB of RAM and more than 26 GB of swap available. **Does anyone have any idea how this could happen? I doubt that Firefox tried to allocate 30 GB at once and exhausted both RAM and swap at the same time.** Here's a screenshot from System Monitor, which happened to be running, where you can see how much was free and how much it dropped when Firefox died: [screenshot from System Monitor after the Firefox crash] --- Just for completeness, here is my kernel vm configuration and contents of /proc/meminfo:
$ sudo sysctl vm
vm.admin_reserve_kbytes = 8192
vm.compact_unevictable_allowed = 1
vm.compaction_proactiveness = 20
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 1500
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 1500
vm.dirtytime_expire_seconds = 43200
vm.extfrag_threshold = 500
vm.hugetlb_optimize_vmemmap = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256	256	32	0	0
vm.max_map_count = 1048576
vm.memfd_noexec = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 65536
vm.mmap_rnd_bits = 32
vm.mmap_rnd_compat_bits = 16
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.numa_stat = 1
vm.numa_zonelist_order = Node
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.page-cluster = 0
vm.page_lock_unfairness = 5
vm.panic_on_oom = 0
vm.percpu_pagelist_high_fraction = 0
vm.stat_interval = 1
vm.swappiness = 180
vm.unprivileged_userfaultfd = 0
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.watermark_boost_factor = 0
vm.watermark_scale_factor = 125
vm.zone_reclaim_mode = 0

$ cat /proc/meminfo 
MemTotal:       16024252 kB
MemFree:         2941612 kB
MemAvailable:    5504264 kB
Buffers:          346236 kB
Cached:          4639316 kB
SwapCached:       979904 kB
Active:          7280640 kB
Inactive:        2468760 kB
Active(anon):    6426732 kB
Inactive(anon):   529828 kB
Active(file):     853908 kB
Inactive(file):  1938932 kB
Unevictable:     1809680 kB
Mlocked:             592 kB
SwapTotal:      33554428 kB
SwapFree:       29745344 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:              3088 kB
Writeback:             0 kB
AnonPages:       6473408 kB
Mapped:           996316 kB
Shmem:           2192712 kB
KReclaimable:     839736 kB
Slab:            1131764 kB
SReclaimable:     839736 kB
SUnreclaim:       292028 kB
KernelStack:       33728 kB
PageTables:       100528 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    41566552 kB
Committed_AS:   35867360 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      114112 kB
VmallocChunk:          0 kB
Percpu:             9280 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:  1144832 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      408748 kB
DirectMap2M:    11821056 kB
DirectMap1G:     4194304 kB
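A quick sanity check on the meminfo above supports the asker's doubt: at the moment of the snapshot there was still several GiB of commit headroom, so a modest allocation should not have failed at that instant (the failures may of course have occurred at an earlier, tighter moment):

```shell
# Commit headroom from the /proc/meminfo above
headroom_kb=$(( 41566552 - 35867360 ))   # CommitLimit - Committed_AS
echo "$headroom_kb kB"                   # 5699192 kB, roughly 5.4 GiB still grantable
```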
Fritz (748 rep)
Jul 11, 2025, 09:38 AM
0 votes
2 answers
374 views
How to use vm.overcommit_memory=1 without getting system hung?
I am using vm.overcommit_memory=1 on my linux system, which has been helpful to allow starting multiple applications that otherwise wouldn't even start with the default value of 0. However, sometimes my system just freezes, and it seems the OOM killer is unable to do anything to prevent this situation. I have some swap, which also got consumed. I've also noticed some instances where the system is unresponsive and even the magic SysRq keys don't work. Sorry, no logs are available at this time to include here. In general, is there any configuration or tunable that can get the OOM killer to kill the highest memory-consuming process(es) immediately, without ever letting the system go unresponsive, when using vm.overcommit_memory=1?
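No tunable guarantees the kernel OOM killer wins the race against thrashing, but it can at least be told which processes to sacrifice first via /proc/PID/oom_score_adj (range -1000 to 1000; higher means preferred victim). A small sketch — the process name is illustrative, and userspace killers such as earlyoom or systemd-oomd, which act before the system stalls, are the more common fix:

```shell
# Mark a process as the preferred OOM victim. A process may always raise
# its own score unprivileged; lowering it requires CAP_SYS_RESOURCE.
prefer_oom_kill() {  # usage: prefer_oom_kill PID
    echo 1000 > "/proc/$1/oom_score_adj"
}
# prefer_oom_kill "$(pgrep -o some-hungry-app)"   # app name is illustrative
```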
eagle007 (3 rep)
Nov 27, 2024, 10:10 PM • Last activity: Jul 10, 2025, 10:16 AM
0 votes
1 answer
5782 views
JAVA OPTS Xms Xmx MetaspaceSize MaxMetaspaceSize relationship with server resources
I have just started working with JBoss application servers, and recently we had a problem when trying to deploy an application on a new test server (RHEL 7). When starting the JBoss service (JBoss EAP 7.1) with the application in the deployment area, the server began to freeze, that is, it began to respond extremely slowly and it was necessary to turn it off. We solved the problem simply by adding more CPU and RAM. In the configuration (standalone.conf) there are these parameters: JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m" Could you give me a brief explanation of the meaning of each one and its relationship with the memory and CPU of the server? Is there any rule or recommendation to take into account when configuring these parameters and server resources? Thanks in advance.
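For reference, a brief reading of those flags; the sizes are the ones from the question, and the rules of thumb are general HotSpot guidance, not JBoss-specific requirements:

```shell
# -Xms4096m                 initial Java heap size: 4 GiB committed at startup
# -Xmx4096m                 maximum Java heap size (Xms = Xmx avoids resizing pauses)
# -XX:MetaspaceSize=256m    metaspace high-water mark that first triggers a full GC
# -XX:MaxMetaspaceSize=512m hard cap on metaspace (class-metadata) memory
#
# Total JVM footprint is roughly heap + metaspace + thread stacks + JVM overhead,
# so a 4 GiB heap needs comfortably more than 4 GiB of RAM; a common rule of
# thumb is to keep Xmx at no more than ~50-75% of the server's physical memory.
JAVA_OPTS="-Xms4096m -Xmx4096m -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m"
```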
miguel ramires (9 rep)
Jul 11, 2022, 11:01 PM • Last activity: Jun 11, 2025, 04:05 PM
4 votes
0 answers
93 views
How to find out what’s using 10GiB of my RAM when ps is only showing ~1GB?
I’ve had mysterious memory usage on a Thinkpad E495 for the longest time. Starting with Ubuntu 20.04, through several Ubuntu versions with default kernels and xanmod kernels, and now under openSUSE Leap 15.6. After a few weeks of uptime (using suspend2ram during the night) I end up with excessive memory usage:

free -m
               total        used        free      shared  buff/cache   available
Mem:           13852        9654        2345         406        2578        4198
Swap:           2047        1144         903

Worst processes according to sudo ps aux | awk '{print $6/1024 " MB\t\t" $11}' | sort -n account for not even 1.5GB (I scrolled through the list, there are not 10k 1MiB processes further up the list):

8.75 MB         sudo
8.85156 MB      /usr/bin/akonadi_sendlater_agent
8.94531 MB      /usr/bin/akonadi_indexing_agent
9.08594 MB      /usr/sbin/NetworkManager
9.49609 MB      /usr/bin/kalendarac
9.98047 MB      /usr/bin/X
10.9453 MB      /usr/bin/akonadi_pop3_resource
11.2227 MB      /usr/lib/systemd/systemd-journald
11.4492 MB      /usr/lib/kdeconnectd
12.4375 MB      /usr/lib/xdg-desktop-portal
14.1133 MB      /usr/bin/Xwayland
14.3477 MB      /usr/bin/X
17.3867 MB      /usr/lib/xdg-desktop-portal-kde
22.8555 MB      /usr/sbin/mysqld
24.6055 MB      /usr/bin/kded5
24.8555 MB      weechat
27.0703 MB      /usr/bin/akonadiserver
92.5195 MB      /usr/bin/konsole
113.832 MB      /usr/bin/krunner
155.871 MB      /usr/bin/kwin_wayland
660.578 MB      /usr/bin/plasmashell

If I keep using the laptop when it’s at this stage the out-of-memory daemon eventually kills my Firefox or plasmashell (after making the laptop freeze for 10 minutes while it does who knows what). Any ideas on how to find the culprit? At this point I’m almost suspecting some kind of UEFI issue or some kernel module issue.
Edit: cat /proc/meminfo as requested:

MemTotal:       14184696 kB
MemFree:         2064340 kB
MemAvailable:    4136252 kB
Buffers:            1076 kB
Cached:          2677164 kB
SwapCached:         4416 kB
Active:          1996252 kB
Inactive:        2999396 kB
Active(anon):     603672 kB
Inactive(anon):  2156904 kB
Active(file):    1392580 kB
Inactive(file):   842492 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:       2097148 kB
SwapFree:         924940 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:              2140 kB
Writeback:             0 kB
AnonPages:       2275668 kB
Mapped:           577764 kB
Shmem:            443168 kB
KReclaimable:     163648 kB
Slab:            1060872 kB
SReclaimable:     163648 kB
SUnreclaim:       897224 kB
KernelStack:       22320 kB
PageTables:        60920 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     9189496 kB
Committed_AS:   11361760 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       67168 kB
VmallocChunk:          0 kB
Percpu:             7296 kB
HardwareCorrupted:     0 kB
AnonHugePages:    743424 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:    13922424 kB
DirectMap2M:      632832 kB
DirectMap1G:           0 kB

Edit2: For what it’s worth, also the top few lines of a raw sudo ps aux --sort -rss:

USER  PID    %CPU %MEM VSZ     RSS    TTY   STAT START TIME   COMMAND
mike  20562  0.1  4.9  6180236 704660 ?     Sl   12:14 0:33   /usr/bin/plasmashell
mike  16053  1.9  1.3  2411024 190684 ?     Sl   Apr16 418:04 /usr/bin/kwin_wayland --wayland-fd 7 --socket wayland-0 --xwayland-fd 8 --xwayland-fd 9 --xwayland-display :2 --xwayland-xauthority /run/user/1000/xauth_CCFtQW --xwayland
mike  17485  0.0  0.8  3702672 117056 ?     Ssl  Apr16 0:51   /usr/bin/krunner
mike  18262  0.0  0.7  1608452 112084 ?     Sl   Apr16 16:29  /usr/bin/konsole
mike  16153  0.0  0.2  2224152 38100  ?     Ssl  Apr16 4:07   /usr/bin/kded5
mike  18307  0.0  0.1  86288   28288  pts/1 S+   Apr16 4:13   weechat
mike  3300   0.0  0.1  190896  27528  ?     Sl   15:38 0:06   /usr/lib/kf5/kio_http_cache_cleaner
mike  18440  0.0  0.1  3120756 27224  ?     Sl   Apr16 0:43   /usr/bin/akonadiserver
root  25945  0.0  0.1  28580   25352  pts/4 S+   13:18 0:01   /bin/bash
mike  16237  0.0  0.1  1633312 22884  ?     Ssl  Apr16 0:13   /usr/lib/xdg-desktop-portal-kde
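One thing worth computing from the posted meminfo: summing the big tracked consumers and subtracting from MemTotal leaves several GiB that no meminfo field accounts for. Pages a driver allocates directly (alloc_pages) are invisible to meminfo, which would fit the kernel-module suspicion. This is only a rough decomposition, not an exact accounting:

```shell
# MemTotal minus the major tracked consumers (all kB, values from above).
# Shmem is already counted inside Cached, so it is not subtracted separately.
#               MemTotal   MemFree   Cached   Buffers AnonPages  Slab   PageTables KernelStack
unaccounted=$(( 14184696 - 2064340 - 2677164 - 1076 - 2275668 - 1060872 - 60920 -  22320 ))
echo "$unaccounted kB"   # 6022336 kB, ~5.7 GiB not visible in any meminfo field
```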
Michael (190 rep)
May 1, 2025, 10:43 AM • Last activity: May 1, 2025, 04:38 PM
1 vote
2 answers
1382 views
Why is mariadb.service not restarted by systemd after OOM kill
The mysql.service got killed by the OOM killer. While investigating the root cause I wanted to change the unit configuration to restart if killed. I was surprised to find
Restart=on-abort
already in the default unit configuration file. Reading https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Restart= I think systemd should restart the service when it gets killed by the OOM killer. Testing on a non-production server with kill -9 pid shows the expected behavior: the service is automatically restarted. A simple systemctl restart mysql.service did work, so there should be nothing preventing systemd from bringing the service back up. So why is my service not restarted? *Edit to answer requests in the comments* - The system is Debian 12 - mysql just symlinks to mariadb. But it is mariadb. - The unit file is the default delivered by the maintainers
# It's not recommended to modify this file in-place, because it will be
# overwritten during package upgrades.  If you want to customize, the
# best way is to create a file "/etc/systemd/system/mariadb.service",
# containing
#	.include /usr/lib/systemd/system/mariadb.service
#	...make your changes here...
# or create a file "/etc/systemd/system/mariadb.service.d/foo.conf",
# which doesn't need to include ".include" call and which will be parsed
# after the file mariadb.service itself is parsed.
#
# For more info about custom unit files, see systemd.unit(5) or
# https://mariadb.com/kb/en/mariadb/systemd/ 
#
# Copyright notice:
#
# This file is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

[Unit]
Description=MariaDB 10.11.6 database server
Documentation=man:mariadbd(8)
Documentation=https://mariadb.com/kb/en/library/systemd/ 
After=network.target

[Install]
WantedBy=multi-user.target


[Service]

##############################################################################
## Core requirements
##

Type=notify

# Setting this to true can break replication and the Type=notify settings
# See also bind-address mariadbd option.
PrivateNetwork=false

##############################################################################
## Package maintainers
##

User=mysql
Group=mysql

# CAP_IPC_LOCK To allow memlock to be used as non-root user
# CAP_DAC_OVERRIDE To allow auth_pam_tool (which is SUID root) to read /etc/shadow when it's chmod 0
#   does nothing for non-root, not needed if /etc/shadow is u+r
# CAP_AUDIT_WRITE auth_pam_tool needs it on Debian for whatever reason
CapabilityBoundingSet=CAP_IPC_LOCK CAP_DAC_OVERRIDE CAP_AUDIT_WRITE

# PrivateDevices=true implies NoNewPrivileges=true and
# SUID auth_pam_tool suddenly doesn't do setuid anymore
PrivateDevices=false

# Prevent writes to /usr, /boot, and /etc
ProtectSystem=full



# Doesn't yet work properly with SELinux enabled
# NoNewPrivileges=true

# Prevent accessing /home, /root and /run/user
ProtectHome=true

# Execute pre and post scripts as root, otherwise it does it as User=
PermissionsStartOnly=true

ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld

# Perform automatic wsrep recovery. When server is started without wsrep,
# galera_recovery simply returns an empty string. In any case, however,
# the script is not expected to return with a non-zero status.
# It is always safe to unset _WSREP_START_POSITION environment variable.
# Do not panic if galera_recovery script is not available. (MDEV-10538)
ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
ExecStartPre=/bin/sh -c "[ ! -e /usr/bin/galera_recovery ] && VAR= || \
 VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] \
 && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1"

# Needed to create system tables etc.
# ExecStartPre=/usr/bin/mysql_install_db -u mysql

# Start main service
# MYSQLD_OPTS here is for users to set in /etc/systemd/system/mariadb.service.d/MY_SPECIAL.conf
# Use the [Service] section and Environment="MYSQLD_OPTS=...".
# This isn't a replacement for my.cnf.
# _WSREP_NEW_CLUSTER is for the exclusive use of the script galera_new_cluster
ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION

# Unset _WSREP_START_POSITION environment variable.
ExecStartPost=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"

ExecStartPost=/etc/mysql/debian-start

KillSignal=SIGTERM

# Don't want to see an automated SIGKILL ever
SendSIGKILL=no

# Restart crashed server only, on-failure would also restart, for example, when
# my.cnf contains unknown option
Restart=on-abort
RestartSec=5s

UMask=007

##############################################################################
## USERs can override
##
##
## by creating a file in /etc/systemd/system/mariadb.service.d/MY_SPECIAL.conf
## and adding/setting the following under [Service] will override this file's
## settings.

# Useful options not previously available in [mysqld_safe]

# Kernels like killing mariadbd when out of memory because its big.
# Lets temper that preference a little.
# OOMScoreAdjust=-600

# Explicitly start with high IO priority
# BlockIOWeight=1000

# If you don't use the /tmp directory for SELECT ... OUTFILE and
# LOAD DATA INFILE you can enable PrivateTmp=true for a little more security.
PrivateTmp=false

# Set an explicit Start and Stop timeout of 900 seconds (15 minutes!)
# this is the same value as used in SysV init scripts in the past
# Galera might need a longer timeout, check the KB if you want to change this:
# https://mariadb.com/kb/en/library/systemd/#configuring-the-systemd-service-timeout 
TimeoutStartSec=900
TimeoutStopSec=900

##
## Options previously available to be set via [mysqld_safe]
## that now needs to be set by systemd config files as mysqld_safe
## isn't executed.
##

# Number of files limit. previously [mysqld_safe] open-files-limit
LimitNOFILE=32768
# For liburing and io_uring_setup()
LimitMEMLOCK=524288
# Maximium core size. previously [mysqld_safe] core-file-size
# LimitCore=

# Nice priority. previously [mysqld_safe] nice
# Nice=-5

# Timezone. previously [mysqld_safe] timezone
# Environment="TZ=UTC"

# Library substitutions. previously [mysqld_safe] malloc-lib with explicit paths
# (in LD_LIBRARY_PATH) and library name (in LD_PRELOAD).
# Environment="LD_LIBRARY_PATH=/path1 /path2" "LD_PRELOAD=

# Flush caches. previously [mysqld_safe] flush-caches=1
# ExecStartPre=sync
# ExecStartPre=sysctl -q -w vm.drop_caches=3

# numa-interleave=1 equalivant
# Change ExecStart=numactl --interleave=all /usr/sbin/mariadbd......

# crash-script equalivent
# FailureAction=
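Whatever the root cause turns out to be, the unit file's own comments point at the supported way to change this behavior: a drop-in. A sketch that restarts the service on any failure and enables the file's own suggested OOM deprioritization — the file name is illustrative, and whether Restart=on-abort by itself covers a kernel OOM kill depends on how the systemd version in use classifies that exit:

```ini
# /etc/systemd/system/mariadb.service.d/restart.conf  (illustrative name)
[Service]
Restart=always
RestartSec=5s
# the unit file's own commented-out suggestion, to make OOM kills rarer:
OOMScoreAdjust=-600
```

Apply with systemctl daemon-reload followed by systemctl restart mariadb.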
bluefish (191 rep)
Jun 3, 2024, 08:48 PM • Last activity: Apr 25, 2025, 06:00 AM
3 votes
0 answers
41 views
Implement a recovery virtual console for a hanged system
This might be a duplicate of "[reserve memory for a set of processes](https://unix.stackexchange.com/questions/401769/reserve-memory-for-a-set-of-processes)", but I think my question is a little broader. I have a system that likes to hang a lot. I tend to use a lot of browser tabs and a bunch of Electron apps; sometimes when too many of these are open my system comes to a complete stop. Usually my solution is to just Alt+SysRq+F to forcefully invoke oom_kill; this usually results in my browser getting killed. I have been meaning to install a user-space service killer for a while so that I get more proactive prevention of a system freeze, but honestly, it wouldn't change much (unless I was doing something time sensitive and couldn't afford the ~2.5s until oom_kill completes). I would much rather have the ability to choose what I want to be killed. To that end, I would like to have a "holy virtual terminal", one that I can simply Ctrl+Alt+F1 into (my graphical session lives in Ctrl+Alt+F2) and then use either top or btop to kill whatever I want to restore my system to a working state. I want a guarantee that it _will have enough memory/cpu priority to function_. Right now if the hang is bad enough, it either takes minutes for a login prompt to show up on virtual consoles, or the login itself times out after 60 seconds. Is this possible? How? Would "niceness" be relevant? Perhaps there is a kernel module for it? I wouldn't mind permanently losing 1GB or less of memory on my system if that was necessary (if it needed to be "allocated" to such a console). Previously I thought one of the solutions would be to create a default cgroup that would only allow up to System RAM - 1GB, then I could somehow have tty1 be on a different cgroup which wouldn't have that memory limit. I don't really know how that would be achieved, but it sounds possible.
However, there seems to be a kernel parameter that was designed specifically for this purpose: [vm.admin_reserve_kbytes](https://www.kernel.org/doc/Documentation/sysctl/vm.txt) > The amount of free memory in the system that should be reserved for users with the capability cap_sys_admin. > > admin_reserve_kbytes defaults to min(3% of free pages, 8MB) > > That should provide enough for the admin to log in and kill a process, if necessary, under the default overcommit 'guess' mode. My system is already in 'guess' mode, but if I can't even get a login prompt, I don't really see the point of reserving memory for root: I don't have enough memory to log in as root in the first place. It sounds like some combination of these parameters would allow me to get what I want, but it isn't clear to me right now what that would be.
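On a cgroup-v2 system, one way to approximate the "holy virtual terminal" without a kernel module is a systemd drop-in that gives the tty1 getty (and everything started from it, since the login shell and top inherit its cgroup) memory protection and a high CPU weight. The directive names are real systemd options; the values are illustrative:

```ini
# /etc/systemd/system/getty@tty1.service.d/reserve.conf  (illustrative name)
[Service]
# Protect up to 1 GiB of this unit's memory from reclaim...
MemoryMin=1G
# ...and keep it schedulable even when everything else is thrashing.
CPUWeight=1000
```

Note that MemoryMin only protects pages the unit has actually allocated; it does not pre-reserve RAM, so it helps the console stay responsive rather than guaranteeing a fresh login always succeeds.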
Mathias Sven (273 rep)
Apr 15, 2025, 05:27 PM • Last activity: Apr 16, 2025, 01:06 PM
1 votes
0 answers
61 views
Weird behavior: Does libasan consume system memory constantly in Linux?
I am working on an embedded Linux system (kernel-5.10), and the cross GCC only supports `-fsanitize=address` for address sanitizer. Then I built a testing program with `-fsanitize=address` and `-lasan`. When I started the program as a background, I checked the system memory usage with `free`. To my...
I am working on an embedded Linux system (kernel 5.10), and the cross GCC only supports -fsanitize=address for Address Sanitizer. So I built a test program with -fsanitize=address and -lasan. When I started the program in the background, I checked the system memory usage with free. To my surprise, the "available" figure reported by free was decreasing constantly! The program was killed by the OOM killer after about 1 hour, and there was NO big memory leak reported by ASAN! If I build the same program without ASAN, there is NO such constant memory consumption by the program. So, why does enabling ASAN cause this dramatic memory consumption? Is it a program bug or ASAN's bug? ##### Updated with MRE per Bodo's comment.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    void *buf = NULL;
    int size = 0;
    int x = 0;

    if (argc != 2) {
        printf("Usage: %s memsize\n", argv[0]);
        return 1;
    }

    size = atoi(argv[1]);
    if (size == 0) {
        size = 4096 * 1024;
    }

    while (1) {
        x++;
        buf = malloc(size);

        if (buf) {
            memset(buf, x, size/sizeof(x));
            free(buf);
        }

        usleep(10000);
    }

    return 0;
}
It is compiled with -fsanitize=address, and it runs in the background (/tmp/memcheck 4096) while I check the system memory usage in the foreground. Here is what I got.
# while [ 1 ]; do ps -o pid,comm,vsz,rss | grep memcheck;free -k; echo =================; sleep 2; done

 1823 memcheck         422m 6536
              total        used        free      shared  buff/cache   available
Mem:         115036       41640       67740         480        5656       68308
Swap:             0           0           0
=================
 1823 memcheck         423m 7668
              total        used        free      shared  buff/cache   available
Mem:         115036       42760       66620         480        5656       67188
Swap:             0           0           0
=================
 1823 memcheck         424m 8812
              total        used        free      shared  buff/cache   available
Mem:         115036       43900       65480         480        5656       66048
Swap:             0           0           0
=================
 1823 memcheck         425m 9952
              total        used        free      shared  buff/cache   available
Mem:         115036       45076       64304         480        5656       64872
Swap:             0           0           0
=================
 1823 memcheck         426m  10m
              total        used        free      shared  buff/cache   available
Mem:         115036       46196       63184         480        5656       63752
Swap:             0           0           0
=================
 1823 memcheck         427m  11m
              total        used        free      shared  buff/cache   available
Mem:         115036       47360       62020         480        5656       62588
Swap:             0           0           0
=================
 1823 memcheck         428m  13m
              total        used        free      shared  buff/cache   available
Mem:         115036       48488       60892         480        5656       61460
Swap:             0           0           0
=================
 1823 memcheck         429m  14m
              total        used        free      shared  buff/cache   available
Mem:         115036       49636       59744         480        5656       60312
Swap:             0           0           0
=================
 1823 memcheck         430m  15m
              total        used        free      shared  buff/cache   available
Mem:         115036       50676       58700         480        5660       59268
Swap:             0           0           0
=================
 1823 memcheck         431m  16m
              total        used        free      shared  buff/cache   available
Mem:         115036       51796       57580         480        5660       58148
Swap:             0           0           0
=================
 1823 memcheck         432m  17m
              total        used        free      shared  buff/cache   available
Mem:         115036       52920       56456         480        5660       57024
Swap:             0           0           0
=================
 1823 memcheck         433m  18m
              total        used        free      shared  buff/cache   available
Mem:         115036       54152       55224         480        5660       55792
Swap:             0           0           0
=================
 1823 memcheck         434m  19m
              total        used        free      shared  buff/cache   available
Mem:         115036       55292       54084         480        5660       54652
Swap:             0           0           0
=================
 1823 memcheck         435m  20m
              total        used        free      shared  buff/cache   available
Mem:         115036       56448       52928         480        5660       53496
Swap:             0           0           0
=================
I get similar results when running the MRE on the target board and in my PC's WSL (Ubuntu 20.04). The memory usage of the system and of memcheck increases constantly. Is there any way to control the memory used by ASAN? Thanks,
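For what it's worth, ASAN keeps freed blocks in a quarantine and records allocation stacks, which makes steady RSS growth on a malloc/free loop plausible; both are tunable at run time via ASAN_OPTIONS. A sketch of an invocation (the flag values are arbitrary choices, and /tmp/memcheck is the asker's binary; quarantine_size_mb and malloc_context_size are ASAN runtime flags):

```
# Shrink the freed-block quarantine to 16 MB and the recorded stack depth to 5.
ASAN_OPTIONS=quarantine_size_mb=16:malloc_context_size=5 /tmp/memcheck 4096
```

Whether this caps the growth on this particular kernel/toolchain is something only a test on the board can confirm.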
wangt13 (631 rep)
Jan 16, 2025, 12:33 PM • Last activity: Jan 20, 2025, 11:32 PM
0 votes
0 answers
14 views
How to solve Out of memory on AKS?
I am running CircleCI jobs and got an 137 error. I checked k logs pod/dzapi-66d6b9cb-l995w -n simm Error from server (BadRequest): container "dzapi-solution" in pod "dzapi-solution-66d6b9cb-l995w" is waiting to start: trying and failing to pull image It seemed like auth issue,but... dmesg shows [ 42...
I am running CircleCI jobs and got a 137 error. I checked:
k logs pod/dzapi-66d6b9cb-l995w -n simm
Error from server (BadRequest): container "dzapi-solution" in pod "dzapi-solution-66d6b9cb-l995w" is waiting to start: trying and failing to pull image
It seemed like an auth issue, but... dmesg shows:
[ 4206410.396011] Out of memory: Killed process 836854 (python3) total-vm:91360kB, anon-rss:70000kB, file-rss:2768kB, shmem-rss:0kB, UID:0 pgtables:224kB oom_score_adj:0
and sar -r 1 5:
Linux 5.15.0-1075-azure (stg) 01/20/25 _x86_64_ (1 CPU)
21:15:19 kbmemfree kbavail kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
21:15:20 17124 127484 185068 46.16 12236 105668 514732 128.39 57860 168344 392
21:15:21 17124 127484 185068 46.16 12236 105668 514732 128.39 57860 168344 392
21:15:22 17124 127484 185068 46.16 12244 105660 514732 128.39 57860 168344 392
21:15:23 17124 127492 185060 46.16 12244 105668 514732 128.39 57868 168364 320
21:15:24 17124 127492 185060 46.16 12244 105668 514732 128.39 57868 168364 320
Average: 17124 127487 185065 46.16 12241 105666 514732 128.39 57863 168352 363
And ps:
ps aux --sort -pmem
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1072 0.1 9.5 399884 38152 ? Sl 2024 73:10 python3 -u bin/WALinuxAgent-2.12.0.2-py3.9.egg -run-exthandlers
root 374 0.0 7.6 297928 30860 ? S
MikiBelavista (1755 rep)
Jan 20, 2025, 09:31 PM
0 votes
0 answers
47 views
How to setup memory cgroup with OOM killer enabled?
I did the following: mkdir /sys/fs/cgroup/memory/test echo 32212254720 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes cgexec -g memory:test … But when the cgroup's memory is almost fully used, then processes start to degrade but no OOM happens: ❯ cat memory.limit_in_bytes 32212254720 ❯ cat memor...
I did the following:
mkdir /sys/fs/cgroup/memory/test
echo 32212254720 > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
cgexec -g memory:test …
But when the cgroup's memory is almost fully used, the processes start to degrade but no OOM kill happens:
❯ cat memory.limit_in_bytes
32212254720
❯ cat memory.usage_in_bytes
32197885952
❯ cat memory.oom_control
oom_kill_disable 0
under_oom 0
oom_kill 0
In htop all the processes are shown in the uninterruptible state (the D flag). How do I properly set up the cgroup to let the OOM killer do its job and kill processes?
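As a sanity check before configuring limits, it helps to confirm which cgroup hierarchy the system actually uses, since the knobs differ (v1 exposes memory.limit_in_bytes as in the question; v2 exposes memory.max, which OOM-kills when reclaim fails). A small sketch:

```shell
# Detect which cgroup hierarchy is mounted at /sys/fs/cgroup.
# v2 exposes a top-level cgroup.controllers file; v1 exposes
# per-controller directories (e.g. memory/) instead.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
  echo "cgroup v2: use memory.max (and memory.swap.max) in the cgroup dir"
else
  echo "cgroup v1 (or hybrid): use memory.limit_in_bytes as in the question"
fi
```

On v1, the D-state thrashing seen here is typical when the limit can still be (slowly) satisfied via reclaim and swap, so also capping memory.memsw.limit_in_bytes is worth considering.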
abyss.7 (189 rep)
Dec 23, 2024, 11:25 AM • Last activity: Dec 23, 2024, 06:05 PM
0 votes
0 answers
36 views
Solaris 10 g++ virtual memory exhausted
I am trying to use the cm3-unix64le-d5.11.1-20210610 from the [Modula 3 github][1]. i am running the .cpp file and I get a virtual memory exhausted: Not enough space error. I have 16GB of RAM and I'm running Solaris 10 u11 on an amd64 cpu. My swap partition has 2097144 free, out of 2097144. I'm usin...
I am trying to use cm3-unix64le-d5.11.1-20210610 from the Modula-3 GitHub . I am compiling the .cpp file and I get a "virtual memory exhausted: Not enough space" error. I have 16 GB of RAM and I'm running Solaris 10 u11 on an amd64 CPU. My swap partition has 2097144 free, out of 2097144. I'm using gcc5g++ from OpenCSW. The C++ file is only about 300 MB. Here is the terminal output; the warnings don't seem to be problematic.
# g++ -g -pthread -c /Desktop/cm3-boot-unix64le-d5.11.1-20210610.cpp
../src/types/ArrayType.m3: In function 'INTEGER ArrayType__InitCoster(ArrayType__P, BOOLEAN)':
../src/types/ArrayType.m3:429:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/ArrayType.i3:134:20: note: in definition of macro 'INT64_'
../src/types/ArrayType.m3:433:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/ArrayType.i3:134:20: note: in definition of macro 'INT64_'
../src/types/ArrayType.m3:434:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/ArrayType.i3:134:20: note: in definition of macro 'INT64_'
../src/float/IEEE/LongFloat.m3: In function 'INTEGER LongFloat__ILogb(LongFloat__T)':
../src/float/IEEE/LongFloat.m3:53:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/float/Common/Float.ig:257:20: note: in definition of macro 'INT64_'
../src/float/IEEE/LongFloat.m3:55:38: warning: overflow in implicit constant conversion [-Woverflow]
../src/MasmObjFile.m3: In function 'INTEGER MasmObjFile__NextSymOffset(MasmObjFile__DState*, MasmObjFile__SKind)':
../src/MasmObjFile.m3:750:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/MasmObjFile.i3:92:20: note: in definition of macro 'INT64_'
../src/MasmObjFile.m3: In function 'INTEGER MasmObjFile__NextRelocOffset(MasmObjFile__DState*, MasmObjFile__SKind)':
../src/MasmObjFile.m3:762:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/MasmObjFile.i3:92:20: note: in definition of macro 'INT64_'
../src/types/OpenArrayType.m3: In function 'INTEGER OpenArrayType__InitCoster(OpenArrayType__P, BOOLEAN)':
../src/types/OpenArrayType.m3:282:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/OpenArrayType.i3:122:20: note: in definition of macro 'INT64_'
../src/float/IEEE/RealFloat.m3: In function 'INTEGER RealFloat__ILogb(RealFloat__T)':
../src/float/IEEE/RealFloat.m3:49:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/float/Common/Float.ig:257:20: note: in definition of macro 'INT64_'
../src/float/IEEE/RealFloat.m3:51:38: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/RecordType.m3: In function 'INTEGER RecordType__InitCoster(RecordType__P, BOOLEAN)':
../src/types/RecordType.m3:390:16: warning: overflow in implicit constant conversion [-Woverflow]
../src/types/RecordType.i3:99:20: note: in definition of macro 'INT64_'
../src/misc/Scanner.m3: In function 'INTEGER Scanner__HexDigitValue(m3_CHAR)':
../src/misc/Scanner.m3:509:38: warning: overflow in implicit constant conversion [-Woverflow]
virtual memory exhausted: Not enough space
alex miranda (1 rep)
Dec 16, 2024, 03:15 PM
1 votes
0 answers
112 views
How to make the OOM killer target processes with high nice values?
I would like to know how to configure the OOM killer to first kill processes that have a high nice value. My usecase for this is that I have some background processes where I don't mind them getting killed, but I'd like to preserve my main programs. I have read https://unix.stackexchange.com/questio...
I would like to know how to configure the OOM killer to first kill processes that have a high nice value. My use case: I have some background processes that I don't mind getting killed, but I'd like to preserve my main programs. I have read https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first , and as far as I understand the OOM killer considers three factors:
1. how much memory is hypothetically available to a process
2. what percentage of that limit the process is currently using
3. the process's oom_score_adj
So I guess the real question is: how can I set oom_score_adj automatically based on the nice value of the process?
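There is no built-in coupling between the two, but a small script run periodically (e.g. from cron) could derive oom_score_adj from the nice value. A sketch under the assumption of a linear mapping (nice -20..19 onto roughly -1000..950; the factor 50 is my arbitrary choice), written as a dry run that prints the commands instead of applying them:

```shell
# Map a nice value onto an oom_score_adj value (linear scale; an assumption,
# not anything the kernel defines).
nice_to_oom_adj() {
  echo $(( $1 * 50 ))
}

# Dry run over the current user's processes: print the writes rather than
# performing them. (Lowering oom_score_adj below 0 needs CAP_SYS_RESOURCE.)
ps -o pid=,nice= -u "$(id -un)" | while read -r pid nice; do
  echo "echo $(nice_to_oom_adj "$nice") > /proc/$pid/oom_score_adj"
done
```

Dropping the outer `echo` would apply the values for real; processes started after the sweep keep the default of 0 until the next run.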
FliegendeWurst (235 rep)
Nov 27, 2024, 02:36 PM
2 votes
2 answers
245 views
Linux: Out of memory and a lot of memory neither used by processes nor available
in our production systems we are using Linux (kernel 5.10.55-051055-generic) on Intel NUCs (4GB RAM, 2GB swap) together with our Services run in Docker container. Those services are mainly small services communicating with zmq (over tcp) but also a service running CNNs using OpenVino on the integrat...
In our production systems we are using Linux (kernel 5.10.55-051055-generic) on Intel NUCs (4 GB RAM, 2 GB swap), with our services running in Docker containers. Those services are mainly small services communicating over zmq (TCP), but there is also a service running CNNs with OpenVINO on the integrated Intel GPU.

Over time (the systems where we checked memory had been running about 30 days) we somehow "lose" memory: a lot of memory (over 1 GB) is neither shown as used by any process nor available. So the question is: what is using this memory, or why is it not available?

What did we check:

* docker stats memory usage (as we are running only Docker containers in an Ubuntu command-line environment); however, those values are not very useful for investigating this problem
* Docker container memory usage shown in /sys/fs/cgroup/memory/docker/<id>/memory.usage_in_bytes and ...memory.stats: there is a large gap between the memory used by the Docker containers and the available memory
* Log output of the OOM killer: summing up the memory usage per process in the output gives only about 2.5 GB, which means more than 1 GB is somehow lost
* Data from /proc/<pid>/smaps (calculated ourselves and using procrank): sums up to only 2 to 2.5 GB when only a few hundred MB are available (again missing more than 1 GB)
* pidstat -trl: again missing more than 1 GB
* echo m > /proc/sysrq-trigger: again missing more than 1 GB

Output of the free command and procrank summary (created within 2 seconds):
Pss       Uss
                          1880676K  1855268K  TOTAL

RAM: 3672156K total, 136388K free, 17756K buffers, 417332K cached, 249268K shmem, 229920K slab

              total        used        free      shared  buff/cache   available
Mem:        3672156     3002328      138476      249268      531352      175376
Swap:       1951740     1951740           0
Output of echo m > /proc/sysrq-trigger:
[2948794.936393] sysrq: Show Memory
[2948794.936404] Mem-Info:
[2948794.936412] active_anon:196971 inactive_anon:206372 isolated_anon:0
                  active_file:109642 inactive_file:83546 isolated_file:0
                  unevictable:36 dirty:369 writeback:0
                  slab_reclaimable:27505 slab_unreclaimable:37597
                  mapped:84417 shmem:222 pagetables:6893 bounce:0
                  free:61015 free_pcp:1514 free_cma:0
[2948794.936417] Node 0 active_anon:787884kB inactive_anon:825488kB active_file:438568kB inactive_file:334184kB unevictable:144kB isolated(anon):0kB isolated(file):0kB mapped:337668kB dirty:1476kB writeback:0kB shmem:888kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:22176kB all_unreclaimable? no
[2948794.936419] Node 0 DMA free:14340kB min:296kB low:368kB high:440kB reserved_highatomic:0KB active_anon:68kB inactive_anon:144kB active_file:68kB inactive_file:140kB unevictable:0kB writepending:0kB present:15992kB managed:15904kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[2948794.936425] lowmem_reserve[]: 0 772 3487 3487 3487
[2948794.936431] Node 0 DMA32 free:96236kB min:14912kB low:18640kB high:22368kB reserved_highatomic:0KB active_anon:134108kB inactive_anon:163616kB active_file:40544kB inactive_file:80604kB unevictable:0kB writepending:20kB present:902612kB managed:837040kB mlocked:0kB pagetables:4820kB bounce:0kB free_pcp:1896kB local_pcp:444kB free_cma:0kB
[2948794.936439] lowmem_reserve[]: 0 0 2714 2714 2714
[2948794.936448] Node 0 Normal free:133484kB min:52368kB low:65460kB high:78552kB reserved_highatomic:2048KB active_anon:653708kB inactive_anon:661728kB active_file:397956kB inactive_file:253440kB unevictable:144kB writepending:1456kB present:2891776kB managed:2786464kB mlocked:48kB pagetables:22752kB bounce:0kB free_pcp:4160kB local_pcp:856kB free_cma:0kB
[2948794.936455] lowmem_reserve[]: 0 0 0 0 0
[2948794.936460] Node 0 DMA: 7*4kB (UME) 3*8kB (UE) 15*16kB (UME) 7*32kB (ME) 4*64kB (UM) 4*128kB (UME) 3*256kB (UME) 2*512kB (ME) 1*1024kB (E) 1*2048kB (E) 2*4096kB (M) = 14340kB
[2948794.936482] Node 0 DMA32: 3295*4kB (UME) 4936*8kB (UME) 1099*16kB (UME) 154*32kB (UME) 53*64kB (UME) 16*128kB (ME) 9*256kB (ME) 6*512kB (UM) 6*1024kB (ME) 2*2048kB (M) 0*4096kB = 96236kB
[2948794.936505] Node 0 Normal: 3621*4kB (MEH) 1357*8kB (MEH) 2857*16kB (UMEH) 693*32kB (UMEH) 259*64kB (UMEH) 41*128kB (ME) 22*256kB (ME) 11*512kB (ME) 7*1024kB (M) 0*2048kB 0*4096kB = 133484kB
[2948794.936526] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[2948794.936528] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[2948794.936529] 184470 total pagecache pages
[2948794.936532] 10069 pages in swap cache
[2948794.936534] Swap cache stats: add 1891574, delete 1881493, find 2324524/2691454
[2948794.936535] Free swap  = 832060kB
[2948794.936536] Total swap = 1951740kB
[2948794.936537] 952595 pages RAM
[2948794.936539] 0 pages HighMem/MovableOnly
[2948794.936541] 42743 pages reserved
[2948794.936542] 0 pages hwpoisoned
Output of df -h | grep -e tmpfs -e Filesystem:
Filesystem Size Used Avail Use% Mounted on 
tmpfs 359M 4.2M 355M 2% /run 
tmpfs 1.8G 0 1.8G 0% /dev/shm 
tmpfs 5.0M 0 5.0M 0% /run/lock 
tmpfs 1.8G 0 1.8G 0% /sys/fs/cgroup 
tmpfs 359M 0 359M 0% /run/user/0 
tmpfs 359M 0 359M 0% /run/user/1000
Edit: htop screenshot (only parent processes, sorted by mem%)
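As a cross-check of the per-process smaps/procrank accounting, Pss can be summed from /proc/<pid>/smaps_rollup (available since kernel 4.14; run as root to cover all processes). Memory missing from this total yet also absent from free's "available" would then live in kernel allocations (slab, page tables, drivers such as i915) rather than in any process. A sketch:

```shell
# Sum the proportional set size (Pss) across all visible processes, in kB.
total=0
for f in /proc/[0-9]*/smaps_rollup; do
  pss=$(awk '/^Pss:/ { print $2 }' "$f" 2>/dev/null) || continue
  total=$(( total + ${pss:-0} ))
done
echo "total Pss: ${total} kB"
```

Comparing this total against MemTotal minus MemAvailable shows how much of the "lost" memory is attributable to userspace at all.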
ts_arivo (31 rep)
Nov 28, 2023, 02:05 PM • Last activity: Sep 24, 2024, 12:58 PM
0 votes
2 answers
263 views
Why is Linux not using RAM but only Swap?
How can such an output of `free -m` be explained? ``` total used free shared buff/cache available Mem: 32036 1012 225 3 8400 31024 Swap: 32767 24138 8629 ``` I understand having `free` memory to be low is no sign of alarm as Linux uses unused memory for buffers and file system caches (`buff/cache`)....
How can such an output of free -m be explained?
total        used        free      shared  buff/cache   available
Mem:           32036        1012         225           3        8400       31024
Swap:          32767       24138        8629
I understand that low free memory is no cause for alarm, as Linux uses otherwise unused memory for buffers and file-system caches (buff/cache). What's important is to have enough available memory. But why is the kernel not swapping pages back in? Nearly all memory is available.

I took this output from a continuous log-to-disk I set up as an "every minute" cronjob. At that point in time the system was so unresponsive I could not even log in locally anymore. After slowly typing username and password, there was a timeout (Login timed out after 60 seconds.), so I could not reach a shell and had to power-cycle the server to recover. The journal is full of "take too long", timeout, and "broken pipe" messages, as everything on the system is crawling and therefore malfunctioning.

I played around with vm.swappiness, reducing it from the default value of 60 to 10 (to push the kernel towards "only swap if it's really necessary"), but I get similar results. I was hesitant to try a swapoff && swapon to bring the available memory back into play. Does the oom-killer take over if not everything fits into RAM? Or does the system crash then?

---

A little more background information about the concrete case:\
I have a Proxmox setup and am evaluating how stably everything runs. I really stress the machine, having allocated more RAM to the VMs in total than I have. To my understanding, this should still work, at the small price of using swap space and slowing things down.\
I noticed that everything runs as stably as I expected. I play around suspending VMs to disk, then starting other VMs. Swap gets used if needed, and when VMs are suspended, swap is freed again. But lately I added backups to my evaluation, and this really crashes the machine. Overnight, when PVE Backup is started, more and more RAM becomes "available" while swap is being consumed. Backup speed falls from "1% per few seconds" to "1% per several hours" and eventually to no progress at all. The machine becomes unresponsive with that memory picture.
The VMs are still running, but their applications are malfunctioning too, as their systems get errors like interrupt took 2.2s, Watchdog timeout (limit 3min)!, CPU stuck for 23s!. In the morning I find myself with an unresponsive host.
theHacker (181 rep)
May 16, 2024, 08:04 PM • Last activity: Sep 20, 2024, 03:26 PM
0 votes
1 answers
82 views
How to use systemd-run to isolate the rest of the system from a rogue program triggering the oom killer
I'm wanting to use cgroups and systemd-run to insulate the rest of my system from rogue programs that wake the OOM killer. In particular, clangd is hogging all my memory and then some, and then triggering the OOM Killer. That's a separate problem and a separate question. (Although any answers welcom...
I want to use cgroups and systemd-run to insulate the rest of my system from rogue programs that wake the OOM killer. In particular, clangd is hogging all my memory and then some, and then triggering the OOM killer. That's a separate problem and a separate question. (Although any answers are welcome.) This question is about why my usage of systemd-run isn't working. If I wrap it like so...
systemd-run --user --scope -p MemoryHigh=3G clangd --clang-tidy --malloc-trim --log=info --background-index -j 8 --pch-storage=disk --background-index-priority=low
It doesn't stop it from going hog wild and the OOM killer killing my gnome session and making me log in again! I have tried MemoryMax=5G and it doesn't make a difference. Details of my setup are.... * Ubuntu Noble 24.04 LTS * clangd 18.1.3 * MemTotal: 16101600 kB * SwapTotal: 5242876 kB * Linux version 6.8.0-41-generic (buildd@lcy02-amd64-100) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.2.0-23ubuntu4) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #41-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 2 20:41:06 UTC 202
systemd-run --version
systemd 255 (255.4-1ubuntu8.4)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
References I'm using... * https://www.freedesktop.org/software/systemd/man/latest/systemd.resource-control.html#Memory%20Accounting%20and%20Control * https://www.freedesktop.org/software/systemd/man/latest/systemd-run.html#
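For what it's worth, MemoryHigh only throttles and reclaims; it never kills, and with swap present the process can keep growing into swap. A variant worth trying (a sketch, not a known fix) is a hard limit with swap capped as well:

```
systemd-run --user --scope -p MemoryMax=3G -p MemorySwapMax=0 \
    clangd --clang-tidy --malloc-trim --log=info --background-index
```

One caveat: without controller delegation to the user manager (or on a cgroup-v1 hierarchy), memory properties on a --user scope can be silently ineffective; `systemctl --user show <scope-unit> -p MemoryMax` reveals whether the limit was actually applied.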
John Carter (111 rep)
Sep 5, 2024, 05:57 AM • Last activity: Sep 6, 2024, 04:59 AM
7 votes
1 answers
898 views
grep command fails with out-of-memory error
I encountered an OOM issue (this happens every-time I execute it) on running grep -Fxvf file1 file2 file1 size: ~200MB\ file2 size: ~300MB\ number of records in each file: ~300K\ avg records length: ~1K (only ASCII characters)\ diff between two files is ~18K records Available free memory: ~16GB I tr...
I encountered an OOM kill (it happens every time I execute it) when running grep -Fxvf file1 file2
file1 size: ~200MB\
file2 size: ~300MB\
number of records in each file: ~300K\
avg record length: ~1K (only ASCII characters)\
diff between the two files: ~18K records
Available free memory: ~16GB
I tried several different grep versions, in a VM, in WSL, and on a physical server, but got the same result. Note that I also ran the same command with only a few lines from both files, to verify it doesn't run into an infinite loop due to some special character in the files, and it succeeded. Is this normal? I'm trying to output the records from file2 that do not exist in file1. I already solved my requirement with awk in the same environment and got the output in less than about 10 seconds, but I'm wondering why grep results in an OOM. I have almost always used this command whenever I needed this kind of query, and I have even compared two very big files (~2GB each, ~90M records, records of at most ~20 ASCII characters) on the same boxes without any issue. I used GNU grep 2.7 and 2.16 on SLES12, and GNU grep 3.7 on Ubuntu 22.04 in WSL.
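The question mentions solving this with awk but doesn't show the command; a common equivalent of grep -Fxvf file1 file2 (print the lines of file2 not present in file1) is a hash lookup, which stores each distinct line once instead of building a matcher over ~300K kilobyte-long patterns:

```shell
# Toy inputs under /tmp, then print lines of file2 that are absent from file1.
printf 'a\nb\nc\n' > /tmp/file1
printf 'b\nc\nd\n' > /tmp/file2
# First pass (NR==FNR) records file1's lines as hash keys;
# second pass prints file2 lines whose exact text was never seen.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' /tmp/file1 /tmp/file2
# → d
```

This matches whole lines exactly, like the -F -x combination, which is why it can replace the grep invocation here.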
αғsнιη (41859 rep)
Aug 23, 2024, 03:34 AM • Last activity: Aug 31, 2024, 05:11 AM
1 votes
1 answers
3349 views
system MemoryMax by percentage not working?
I am trying to configure my .service file to limit how much memory a given service can use up before being terminated, by percentage of system memory (10% as an upper limit in this case): [Unit] Description=MQTT Loop After=radioLoop.service [Service] Type=simple Environment=PYTHONIOENCODING=UTF-8 Ex...
I am trying to configure my .service file to limit how much memory a given service can use before being terminated, as a percentage of system memory (10% as an upper limit in this case):
[Unit]
Description=MQTT Loop
After=radioLoop.service

[Service]
Type=simple
Environment=PYTHONIOENCODING=UTF-8
ExecStart=/usr/bin/python3 -u /opt/pilot/mqttLoop.py
WorkingDirectory=/opt/pilot
StandardOutput=journal
Restart=on-failure
User=pilot
MemoryMax=10%

[Install]
WantedBy=multi-user.target
The line of interest is the MemoryMax line, which I've tried to configure based on my understanding of the systemd docs . My version of systemd is:
systemd 241 (241)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
But it does not work:
# ps -m -o lwp,rss,pmem,pcpu,unit -u pilot
LWP   RSS %MEM %CPU UNIT
- 76244 30.3 8.5 mqttLoop.service
1232 - - 7.0 mqttLoop.service
1249 - - 1.7 mqttLoop.service
1254 - - 0.2 mqttLoop.service
I'm getting well above 10% (30% there), and then it does not restart the process. I've tried exchanging MemoryMax for MemoryLimit (the older variant of the same setting), but it has no effect. What am I missing?

UPDATE
----
I have determined that the systemd settings for memory accounting are correctly turned on:
# grep -i "memory" system.conf
#DefaultMemoryAccounting=yes
But I note the following in my kernel configuration: (screenshot) Will it be enough to rebuild my kernel with the Memory Controller option selected?
Travis Griggs (1681 rep)
Aug 27, 2019, 06:09 PM • Last activity: Aug 27, 2024, 06:57 AM
0 votes
1 answers
156 views
Why oom_score of a cat command in terminal is less than my window manager?
For experimenting, I was seeing `oom_score` of some of processes in my computer. The `oom_score` of a cat command is 666 but `oom_score`of my window manager (which is i3) is 667. If I understood correctly, this means that if the system goes out of memory, the OS would prefer to terminate my window m...
As an experiment, I was looking at the oom_score of some of the processes on my computer. The oom_score of a cat command is 666, but the oom_score of my window manager (which is i3) is 667. If I understand correctly, this means that if the system runs out of memory, the OS would prefer to terminate my window manager rather than a cat command invoked in a terminal to reclaim some memory. Why is this the case? Based on which calculation is the oom score of cat lower than my window manager's? This is how I checked the OOM scores:
$ cat /proc/3446/oom_score
667
$ cat /proc/self/oom_score
666
Note that here 3446 is PID of i3 in my system.
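To see this ranking system-wide rather than one PID at a time, the scores can be collected and sorted; a small sketch (the score also depends on memory use and oom_score_adj, so the snapshot changes from moment to moment):

```shell
# Print the five processes the OOM killer currently ranks highest.
for d in /proc/[0-9]*; do
  score=$(cat "$d/oom_score" 2>/dev/null) || continue
  comm=$(cat "$d/comm" 2>/dev/null)
  printf '%6s  %6s  %s\n' "$score" "${d#/proc/}" "$comm"
done | sort -rn | head -n 5
```

Comparing a process's oom_score with its oom_score_adj (also under /proc/<pid>/) shows how much of the ranking comes from memory use versus an explicit adjustment.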
Amir reza Riahi (883 rep)
Aug 24, 2024, 02:16 PM • Last activity: Aug 24, 2024, 02:43 PM
Showing page 1 of 20 total questions