This is a follow-up question to https://unix.stackexchange.com/questions/733250/dentry-inode-caching-on-multi-cpu-machines-memory-allocator-configuration , but here I try to put the question differently.
My problem is that I have a dual-socket machine, and memory for the kernel caches (dentry/inode/buffer) is allocated from bank0 (cpu0's memory bank), which eventually gets consumed. Bank1, however, is never used for caches, so there is plenty of free memory in the system overall. In this state the memory allocator serves my process from bank1, regardless of which cpu it is running on (even if I set memory affinity). Because memory latency differs between local and remote banks, my process (which is somewhat memory-access bound with a low cache-hit ratio) runs much slower when scheduled on the cores of cpu0 than on the cores of cpu1. (I'd like to run two processes, one per cpu, each using all cores of its cpu. I don't want to waste half the cores.)
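For illustration, the per-node imbalance described above can be inspected directly from sysfs (a minimal sketch; with the numactl package installed, `numastat -m` and `numactl --hardware` give the same breakdown more readably):

```shell
# Show, for each NUMA node, how much memory is free and how much the
# kernel caches occupy there. FilePages covers the page cache and
# SReclaimable covers reclaimable slab (dentries/inodes).
for node in /sys/devices/system/node/node*; do
    echo "== ${node##*/} =="
    grep -E 'MemFree|FilePages|SReclaimable' "$node/meminfo"
done
```

On the machine described, node0 would show a low MemFree with high FilePages/SReclaimable, while node1 would show mostly free memory.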
What could I do to ensure that my process can get memory from the local bank, no matter which cpu it gets scheduled on?
I tried playing with the kernel VM parameters, but they don't really do anything. After all, half the memory is free! These kernel caches simply do not seem to take NUMA into account. I also looked into cgroups, but as far as I can tell, I can't really control the kernel that way. I haven't found anything that addresses my issue :-(.
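For concreteness, the one VM knob I know of that is specifically NUMA-aware is `zone_reclaim_mode` (whether this is among the parameters already tried above is an assumption on my part):

```shell
# 0 (the default on most systems): the allocator falls back to a remote
# node before reclaiming cache on the local one.
# 1: reclaim page cache on the local node first, preferring local memory.
cat /proc/sys/vm/zone_reclaim_mode

# To enable local reclaim (needs root):
#   echo 1 > /proc/sys/vm/zone_reclaim_mode
```

With the default of 0, the behaviour described above is expected: rather than evict cache on node0, the kernel simply hands out pages from node1.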
I can, of course, drop all caches before starting my processes, but that is a bit heavy-handed. A cleaner solution would be, for example, to limit the total cache size (say, to 8GB). True, cpu0 would still have a bit less free memory than cpu1 (I have 64GB in each bank), but I can live with that.
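The heavy-handed workaround mentioned above looks like this (needs root; this flushes caches system-wide, on both nodes):

```shell
# Write dirty pages back first, so dropping caches loses nothing.
sync
# 1 = page cache, 2 = dentries and inodes, 3 = both.
echo 3 > /proc/sys/vm/drop_caches
```

There is no knob I know of that caps the total cache size at a fixed value like 8GB; `vm.vfs_cache_pressure` only biases how aggressively dentries/inodes are reclaimed relative to the page cache.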
I'd be grateful for any suggestions... Thanks!
Asked by LaszloLadanyi
(153 rep)
Jan 29, 2023, 05:03 PM
Last activity: Feb 19, 2023, 10:48 PM