Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
0 votes, 0 answers, 45 views
Can "perf mem" command detect remote memory access on CXL NUMA nodes?
I wonder whether `perf mem` can detect remote memory accesses on CXL NUMA nodes. I have an AMD EPYC 9654 server, and the CXL memory is on NUMA node 2. I ran a task on node 0 that accessed the remote node 2 memory continuously. But unfortunately I could not test this on my machine, because `perf mem` doesn't work on AMD CPUs (https://community.amd.com/t5/server-processors/issues-with-perf-mem-record/m-p/95270).
Who can help me?
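For reference, on hardware where `perf mem` works, the kind of test I have in mind looks like this (a sketch; `./reader` is a placeholder for the memory-access task):
```
# generate guaranteed cross-node traffic: CPUs on node 0, memory forced onto node 2
numactl --cpunodebind=0 --membind=2 ./reader &

# sample loads/stores system-wide for a few seconds, then inspect the
# data-source columns, where remote accesses should show up
sudo perf mem record -a -- sleep 5
sudo perf mem report --sort=mem
```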
SeenThrough
(1 rep)
Feb 11, 2025, 02:01 AM
0 votes, 0 answers, 38 views
How can I enable support for the HMAT table?
I have access to a server and want to check its HMAT table.
However, the HMAT table is not present (the SRAT and SLIT are though).
I checked the Linux kernel config, and HMAT support is enabled (`CONFIG_ACPI_HMAT=y` and `CONFIG_ACPI=y`), so the issue is probably with the hardware or firmware.
Can I enable the HMAT, and if so, how?
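For reference, table presence can be checked like this (a sketch; `acpidump` comes from the acpica-tools package):
```
# exported ACPI tables; HMAT is absent here, SRAT/SLIT are present
ls /sys/firmware/acpi/tables/ | grep -iE 'hmat|srat|slit'

# dump a specific table by signature (prints nothing if the
# firmware does not provide it)
sudo acpidump -n HMAT
```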
Here's the server spec (let me know if more information is needed):
$ uname -a
Linux node0.acpi-tinkering-0.prismgt-pg0.clemson.cloudlab.us 5.15.0-122-generic #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ sudo dmidecode -t 2
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3 present.
Handle 0x0200, DMI type 2, 8 bytes
Base Board Information
Manufacturer: Dell Inc.
Product Name: 024PW1
Version: A00
Serial Number: .13D52G3.CNIVC001610605.
$ sudo dmidecode -t 0
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3 present.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: Dell Inc.
Version: 2.8.4
Release Date: 06/23/2022
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 32 MB
Characteristics:
ISA is supported
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 2.8
$ sudo dmidecode -t 4
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3 present.
Handle 0x0400, DMI type 4, 48 bytes
Processor Information
Socket Designation: CPU1
Type: Central Processor
Family: Zen
Manufacturer: AMD
ID: 11 0F A0 00 FF FB 8B 17
Signature: Family 25, Model 1, Stepping 1
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
HTT (Multi-threading)
Version: AMD EPYC 7543 32-Core Processor
Voltage: 1.8 V
External Clock: 16000 MHz
Max Speed: 3900 MHz
Current Speed: 2800 MHz
Status: Populated, Enabled
Upgrade: Socket SP3
L1 Cache Handle: 0x0700
L2 Cache Handle: 0x0701
L3 Cache Handle: 0x0702
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 32
Core Enabled: 32
Thread Count: 64
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Handle 0x0401, DMI type 4, 48 bytes
Processor Information
Socket Designation: CPU2
Type: Central Processor
Family: Zen
Manufacturer: AMD
ID: 11 0F A0 00 FF FB 8B 17
Signature: Family 25, Model 1, Stepping 1
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
HTT (Multi-threading)
Version: AMD EPYC 7543 32-Core Processor
Voltage: 1.8 V
External Clock: 16000 MHz
Max Speed: 3900 MHz
Current Speed: 2800 MHz
Status: Populated, Enabled
Upgrade: Socket SP3
L1 Cache Handle: 0x0703
L2 Cache Handle: 0x0704
L3 Cache Handle: 0x0705
Serial Number: Not Specified
Asset Tag: Not Specified
Part Number: Not Specified
Core Count: 32
Core Enabled: 32
Thread Count: 64
Characteristics:
64-bit capable
Multi-Core
Hardware Thread
Execute Protection
Enhanced Virtualization
Matteo
(73 rep)
Dec 6, 2024, 07:34 PM
0 votes, 0 answers, 17 views
What does the ACPI docs about NUMA nodes mean by "dynamic migration"?
With reference to the following section of the ACPI docs about NUMA nodes: https://drops.dagstuhl.de/storage/01oasics/oasics-vol116-parma-ditam2024/OASIcs.PARMA-DITAM.2024.3/OASIcs.PARMA-DITAM.2024.3.pdf
what does "dynamic migration of the devices" mean?
Does it refer to physically migrating hardware (e.g. physically moving a DIMM from one slot to another), or to a logical operation (i.e. nothing changes physically)?
Matteo
(73 rep)
Dec 5, 2024, 10:15 PM
0 votes, 0 answers, 105 views
Understanding how CPU threads are used by a multithreaded application
Recently, I did some research on how CPU cores are used by a multi-threaded application. I can see which core each thread is running on with the following command (ceph-osd is a multi-threaded application):
for i in $(pgrep ceph-osd); do ps -mo pid,tid,fname,user,psr -p $i; done
My CPU has 32 cores and 72 threads, so ceph-osd will have 72 TIDs (thread IDs):
PID TID COMMAND USER PSR
336157 - ceph-osd ceph -
- 336157 - ceph 6
- 336160 - ceph 51
- 336162 - ceph 57
- 336163 - ceph 23
- 336164 - ceph 22
- 336168 - ceph 7
- 336169 - ceph 17
- 336203 - ceph 1
...
...
But what I don't understand is:
- Does this ceph-osd process use all the CPU cores, or are its threads just allocated and perhaps not always running?
- If I use numactl to define affinity and bind the process to specific cores, will it make any difference (see the sketch below)?
- There is more than one ceph-osd process on my server, so will binding them manually help improve performance?
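This is the kind of binding I have in mind (a sketch; the node numbers, core list, PID, and `<osd-command>` are placeholders):
```
# pin one OSD (and all of its threads) to the cores and memory of NUMA node 0
numactl --cpunodebind=0 --membind=0 <osd-command> &

# or restrict an already-running process, including all its threads (-a),
# to an explicit core list
taskset -acp 0-15 336157
```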
Thanks in advance.
huynp
(3 rep)
Nov 20, 2023, 12:55 PM
0 votes, 1 answer, 164 views
In a computer with two CPUs, how can I get the physical address of local memory of each CPU?
My computer has the following specifications:
- 2 Intel x86_64 CPUs
- 8 GB total memory (4 GB per CPU)
- OS: Rocky Linux 9
I want to reserve 1 GB of memory per CPU using the `memmap` parameter in GRUB. I checked `dmesg` and `/proc/meminfo` and even used `numactl -H`, but I couldn't find the physical address of the memory local to each CPU.
How can I get the physical addresses of each CPU's local memory?
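One thing that might help (a sketch; it assumes the kernel logged its SRAT parsing at boot): the per-node physical address ranges usually show up in the boot log, and sysfs memory blocks are tagged with their node.
```
# physical address ranges the firmware assigned to each NUMA node
dmesg | grep -i 'SRAT: Node'

# alternatively, map sysfs memory blocks to nodes
cat /sys/devices/system/memory/block_size_bytes
ls -d /sys/devices/system/node/node*/memory* | head
```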
raon0ms
(11 rep)
May 16, 2023, 08:05 AM
• Last activity: Jun 6, 2023, 03:00 PM
1 vote, 1 answer, 745 views
NUMA aware caching on linux
This is a follow-up question to https://unix.stackexchange.com/questions/733250/dentry-inode-caching-on-multi-cpu-machines-memory-allocator-configuration, but here I try to put the question differently.
My problem is that I have a dual-socket machine, and memory for the kernel caches (dentry/inode/buffer) is allocated from bank 0 (cpu0's memory bank), which eventually gets consumed. However, bank 1 is never used for caches, so there is plenty of free memory in the overall system. In this state the memory allocator hands out memory from bank 1, regardless of where my process is running (even if I set memory affinity). Because of the different latency when accessing memory on different banks, my process (which is somewhat memory-access bound with a low cache-hit ratio) runs much slower when scheduled on the cores of cpu0 than on the cores of cpu1. (I'd like to schedule two processes, one per CPU, each using all the cores of its CPU; I don't want to waste half the cores.)
What can I do to ensure that my process gets memory from the local bank, no matter which CPU it is scheduled on?
I tried playing with the kernel VM parameters, but they don't really change anything; after all, half the memory is free! These kernel caches simply do not seem to take NUMA into account. I also looked into cgroups, but as far as I can tell, I can't really control the kernel that way. I did not find anything that addresses my issue :-(.
I can, of course, drop all caches before starting my processes, but that is a bit heavy-handed. A cleaner solution would be, for example, to limit the total cache size (say, to 8 GB). True, cpu0 would still have a bit less memory than cpu1 (I have 64 GB in each bank), but I can live with that.
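For diagnosis, the per-node imbalance and a slab-only (rather than full) cache drop can be checked like this (a sketch):
```
# per-NUMA-node memory breakdown, including file cache and slab
numastat -m | head -n 25

# reclaim only slab objects (dentries/inodes), leaving the page cache alone
echo 2 | sudo tee /proc/sys/vm/drop_caches
```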
I'd be grateful for any suggestions... Thanks!
LaszloLadanyi
(153 rep)
Jan 29, 2023, 05:03 PM
• Last activity: Feb 19, 2023, 10:48 PM
0 votes, 0 answers, 156 views
Are there any "gotchas" when using NUMA on Linux?
I have a new system with two socketed CPUs. I've heard that there can be bottlenecks when working with NUMA systems if an application on processor 0 tries to access memory attached to processor 1. How does the Linux kernel handle running software on a NUMA system?
- Would the kernel (automatically) prioritize putting all of an application's threads and memory on one CPU, if possible?
- What if the application is CPU-heavy and creates more threads than one CPU has cores?
- What about VMs?
- If you created a KVM VM with fewer resources than a single CPU has access to (both cores and memory), would it work optimally out of the box, or would you need to set the VM's affinity manually?
- What if you wanted a VM that used more than a single CPU's resources, but you emulated NUMA? Would that work as expected out of the box too?
- If I installed bog-standard Ubuntu or CentOS, would I have to do anything to make it NUMA-aware?
I suppose this question is very general, but I truly don't know much about how NUMA works, and I have found little documentation about it.
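One concrete knob worth checking (a sketch; it assumes a kernel built with CONFIG_NUMA_BALANCING): automatic NUMA balancing, which migrates tasks and pages toward each other.
```
# 1 = kernel automatically migrates pages/tasks toward local nodes
cat /proc/sys/kernel/numa_balancing

# show the node topology the kernel detected
numactl --hardware
```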
Kaiden Prince
(101 rep)
Jan 22, 2023, 03:05 PM
2 votes, 1 answer, 237 views
Processes ignore global CPUAffinity settings
I am setting a global CPUAffinity via `/etc/systemd/system.conf`; see the snippet below:
root@PC1-03:~# cat /etc/systemd/system.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See systemd-system.conf(5) for details.
[Manager]
#LogLevel=info
#LogTarget=journal-or-kmsg
#LogColor=yes
#LogLocation=no
#LogTime=no
#DumpCore=yes
#ShowStatus=yes
#CrashChangeVT=no
#CrashShell=no
#CrashReboot=no
#CtrlAltDelBurstAction=reboot-force
CPUAffinity=2 3 4 5 6 7 10 11 12 13 14 15 18 19 20 21 22 23 26 27 28 29 30 31 34 35 36 37 38 39 42 43 44 45 46 47 50 51 52 53 54 55 58 59 60 61 62 63 66 67 68 69 70 71 74 75 76 77 78 79 82 83 84 85 86 87 90 91 92 93 94 95 98 99 100 101 102 103 106 107 108 109 110 111 114 115 116 117 118 119 122 123 124 125 126 127 130 131 132 133 134 135 138 139 140 141 142 143 146 147 148 149 150 151 154 155 156 157 158 159 162 163 164 165 166 167 170 171 172 173 174 175 178 179 180 181 182 183 186 187 188 189 190 191 194 195 196 197 198 199 202 203 204 205 206 207 210 211 212 213 214 215 218 219 220 221 222 223 226 227 228 229 230 231 234 235 236 237 238 239 242 243 244 245 246 247 250 251 252 253 254 255
#NUMAPolicy=default
#NUMAMask=
#RuntimeWatchdogSec=0
#RebootWatchdogSec=10min
#ShutdownWatchdogSec=10min
#KExecWatchdogSec=0
#WatchdogDevice=
#CapabilityBoundingSet=
#NoNewPrivileges=no
#SystemCallArchitectures=
#TimerSlackNSec=
#StatusUnitFormat=description
#DefaultTimerAccuracySec=1min
#DefaultStandardOutput=journal
#DefaultStandardError=inherit
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#DefaultTimeoutAbortSec=
#DefaultRestartSec=100ms
#DefaultStartLimitIntervalSec=10s
#DefaultStartLimitBurst=5
#DefaultEnvironment=
#DefaultCPUAccounting=no
#DefaultIOAccounting=no
#DefaultIPAccounting=no
#DefaultBlockIOAccounting=no
#DefaultMemoryAccounting=yes
#DefaultTasksAccounting=yes
#DefaultTasksMax=15%
#DefaultLimitCPU=
#DefaultLimitFSIZE=
#DefaultLimitDATA=
#DefaultLimitSTACK=
#DefaultLimitCORE=
#DefaultLimitRSS=
#DefaultLimitNOFILE=1024:524288
#DefaultLimitAS=
#DefaultLimitNPROC=
#DefaultLimitMEMLOCK=
#DefaultLimitLOCKS=
#DefaultLimitSIGPENDING=
#DefaultLimitMSGQUEUE=
#DefaultLimitNICE=
#DefaultLimitRTPRIO=
#DefaultLimitRTTIME=
However, when running a dummy load, I observe a 14.6% load on CPU 0.
0[|| 14.6%] 16[ 0.0%] 32[ 0.0%] 48[ 0.0%] 64[ 0.0%] 80[ 0.0%] 96[ 0.0%] 112[ 0.0%] 128[ 0.0%] 144[ 0.0%] 160[ 0.0%] 176[ 0.0%] 192[ 0.0%] 208[ 0.0%] 224[ 0.0%] 240[ 0.0%]
1[ 0.0%] 17[ 0.0%] 33[ 0.0%] 49[ 0.0%] 65[ 0.0%] 81[ 0.0%] 97[ 0.0%] 113[ 0.0%] 129[ 0.0%] 145[ 0.0%] 161[ 0.0%] 177[ 0.0%] 193[ 0.0%] 209[ 0.0%] 225[ 0.0%] 241[ 0.0%]
2[|||||||100.0%] 18[|||||||100.0%] 34[|||||||100.0%] 50[|||||||100.0%] 66[|||||||100.0%] 82[|||||||100.0%] 98[|||||||100.0%] 114[|||||||100.0%] 130[|||||||100.0%] 146[|||||||100.0%] 162[|||||||100.0%] 178[|||||||100.0%] 194[|||||||100.0%] 210[|||||||100.0%] 226[|||||||100.0%] 242[|||||||100.0%]
3[|||||||100.0%] 19[|||||||100.0%] 35[|||||||100.0%] 51[|||||||100.0%] 67[|||||||100.0%] 83[|||||||100.0%] 99[|||||||100.0%] 115[|||||||100.0%] 131[|||||||100.0%] 147[|||||||100.0%] 163[|||||||100.0%] 179[|||||||100.0%] 195[|||||||100.0%] 211[|||||||100.0%] 227[|||||||100.0%] 243[|||||||100.0%]
4[|||||||100.0%] 20[|||||||100.0%] 36[|||||||100.0%] 52[|||||||100.0%] 68[|||||||100.0%] 84[|||||||100.0%] 100[|||||||100.0%] 116[|||||||100.0%] 132[|||||||100.0%] 148[|||||||100.0%] 164[|||||||100.0%] 180[|||||||100.0%] 196[|||||||100.0%] 212[|||||||100.0%] 228[|||||||100.0%] 244[|||||||100.0%]
5[|||||||100.0%] 21[|||||||100.0%] 37[|||||||100.0%] 53[|||||||100.0%] 69[|||||||100.0%] 85[|||||||100.0%] 101[|||||||100.0%] 117[|||||||100.0%] 133[|||||||100.0%] 149[|||||||100.0%] 165[|||||||100.0%] 181[|||||||100.0%] 197[|||||||100.0%] 213[|||||||100.0%] 229[|||||||100.0%] 245[|||||||100.0%]
6[|||||||100.0%] 22[|||||||100.0%] 38[|||||||100.0%] 54[|||||||100.0%] 70[|||||||100.0%] 86[|||||||100.0%] 102[|||||||100.0%] 118[|||||||100.0%] 134[|||||||100.0%] 150[|||||||100.0%] 166[|||||||100.0%] 182[|||||||100.0%] 198[|||||||100.0%] 214[|||||||100.0%] 230[|||||||100.0%] 246[|||||||100.0%]
7[|||||||100.0%] 23[|||||||100.0%] 39[|||||||100.0%] 55[|||||||100.0%] 71[|||||||100.0%] 87[|||||||100.0%] 103[|||||||100.0%] 119[|||||||100.0%] 135[|||||||100.0%] 151[|||||||100.0%] 167[|||||||100.0%] 183[|||||||100.0%] 199[|||||||100.0%] 215[|||||||100.0%] 231[|||||||100.0%] 247[|||||||100.0%]
8[ 0.0%] 24[ 0.0%] 40[ 0.0%] 56[ 0.0%] 72[ 0.0%] 88[ 0.0%] 104[ 0.0%] 120[ 0.0%] 136[ 0.0%] 152[ 0.0%] 168[ 0.0%] 184[ 0.0%] 200[ 0.0%] 216[ 0.0%] 232[ 0.0%] 248[ 0.0%]
9[ 0.0%] 25[ 0.0%] 41[ 0.0%] 57[ 0.0%] 73[ 0.0%] 89[ 0.0%] 105[ 0.0%] 121[ 0.0%] 137[ 0.0%] 153[ 0.0%] 169[ 0.0%] 185[ 0.0%] 201[ 0.0%] 217[ 0.0%] 233[ 0.0%] 249[ 0.0%]
10[|||||||100.0%] 26[|||||||100.0%] 42[|||||||100.0%] 58[|||||||100.0%] 74[|||||||100.0%] 90[|||||||100.0%] 106[|||||||100.0%] 122[|||||||100.0%] 138[|||||||100.0%] 154[|||||||100.0%] 170[|||||||100.0%] 186[|||||||100.0%] 202[|||||||100.0%] 218[|||||||100.0%] 234[|||||||100.0%] 250[|||||||100.0%]
11[|||||||100.0%] 27[|||||||100.0%] 43[|||||||100.0%] 59[|||||||100.0%] 75[|||||||100.0%] 91[|||||||100.0%] 107[|||||||100.0%] 123[|||||||100.0%] 139[|||||||100.0%] 155[|||||||100.0%] 171[|||||||100.0%] 187[|||||||100.0%] 203[|||||||100.0%] 219[|||||||100.0%] 235[|||||||100.0%] 251[|||||||100.0%]
12[|||||||100.0%] 28[|||||||100.0%] 44[|||||||100.0%] 60[|||||||100.0%] 76[|||||||100.0%] 92[|||||||100.0%] 108[|||||||100.0%] 124[|||||||100.0%] 140[|||||||100.0%] 156[|||||||100.0%] 172[|||||||100.0%] 188[|||||||100.0%] 204[|||||||100.0%] 220[|||||||100.0%] 236[|||||||100.0%] 252[|||||||100.0%]
13[|||||||100.0%] 29[|||||||100.0%] 45[|||||||100.0%] 61[|||||||100.0%] 77[|||||||100.0%] 93[|||||||100.0%] 109[|||||||100.0%] 125[|||||||100.0%] 141[|||||||100.0%] 157[|||||||100.0%] 173[|||||||100.0%] 189[|||||||100.0%] 205[|||||||100.0%] 221[|||||||100.0%] 237[|||||||100.0%] 253[|||||||100.0%]
14[|||||||100.0%] 30[|||||||100.0%] 46[|||||||100.0%] 62[|||||||100.0%] 78[|||||||100.0%] 94[|||||||100.0%] 110[|||||||100.0%] 126[|||||||100.0%] 142[|||||||100.0%] 158[|||||||100.0%] 174[|||||||100.0%] 190[|||||||100.0%] 206[|||||||100.0%] 222[|||||||100.0%] 238[|||||||100.0%] 254[|||||||100.0%]
15[|||||||100.0%] 31[|||||||100.0%] 47[|||||||100.0%] 63[|||||||100.0%] 79[|||||||100.0%] 95[|||||||100.0%] 111[|||||||100.0%] 127[|||||||100.0%] 143[|||||||100.0%] 159[|||||||100.0%] 175[|||||||100.0%] 191[|||||||100.0%] 207[|||||||100.0%] 223[|||||||100.0%] 239[|||||||100.0%] 255[|||||||100.0%]
I checked the processes running on that core, and some still exist. A small snippet is below.
root@PC1-03:~# ps -A -o psr,pid,args | grep '^ *0' | head -n 25
0 3 [rcu_gp]
0 4 [rcu_par_gp]
0 5 [netns]
0 7 [kworker/0:0H-events_highpri]
0 9 [kworker/0:1H-events_highpri]
0 11 [mm_percpu_wq]
0 12 [rcu_tasks_kthread]
0 13 [rcu_tasks_rude_kthread]
0 14 [rcu_tasks_trace_kthread]
0 15 [ksoftirqd/0]
0 17 [migration/0]
0 18 [kworker/0:1-events]
0 19 [cpuhp/0]
0 94 [kworker/15:0H]
0 110 [kworker/18:0H]
0 120 [kworker/20:0H]
0 125 [kworker/21:0H]
0 130 [kworker/22:0H]
0 140 [kworker/24:0H]
0 145 [kworker/25:0H]
0 165 [kworker/29:0H]
0 170 [kworker/30:0H]
0 186 [kworker/33:0H]
0 216 [kworker/39:0H]
0 221 [kworker/40:0H]
Is there additional configuration I need to set to make sure things do not run on the cores I want to keep free?
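For reference, this is how per-task affinity can be inspected (a sketch; the PIDs are placeholders). Note that the bracketed entries above are kernel threads, and per-CPU kernel threads (e.g. [ksoftirqd/0], [migration/0]) are pinned by the kernel itself, outside the reach of systemd's CPUAffinity=; isolcpus= or cpusets are the usual tools for keeping a core quiet:
```
# effective affinity of one of the kernel threads sitting on CPU 0
grep Cpus_allowed_list /proc/15/status

# affinity of a userspace process (PID 1 shown here)
taskset -cp 1
```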
Trevor K Smith
(71 rep)
Oct 4, 2022, 06:18 AM
• Last activity: Oct 15, 2022, 09:30 AM
2 votes, 1 answer, 272 views
Why does a 2-socket server show PCIe locations but the 4-socket server does not (how can I find the PCIe locations on the 4-socket server)?
I have two servers:
- 2 socket Supermicro X9DBL-3F
- 4 socket Supermicro X10QBI
When I run `hwloc-ls` for the 2-socket server I see the PCIe topology with the HostBridges on each NUMANode, but the 4-socket server shows Packages instead of NUMANodes, and all of the HostBridges are listed at the bottom. In addition, `lscpu` shows 2 NUMA nodes on the 2-socket server but only 1 NUMA node on the 4-socket server.
How can I discern which PCIe device is attached to which socket on the 4-socket server?
When I run `hwloc-ls` on the 2-socket server I get the following:
Machine (63GB total)
NUMANode L#0 (P#0 31GB)
Package L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#16)
...
HostBridge L#0
PCIBridge
PCI 17d3:1880
Block(Disk) L#0 "sda"
NUMANode L#1 (P#1 31GB)
Package L#1 + L3 L#1 (20MB)
L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8
PU L#16 (P#8)
PU L#17 (P#24)
...
HostBridge L#6
PCIBridge
PCI 8086:10fb
Net L#8 "eth0"
... and when I run it on the 4-socket server I get the following:
Machine (126GB)
Package L#0 + L3 L#0 (38MB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#60)
...
Package L#1 + L3 L#1 (38MB)
L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#15)
PU L#5 (P#75)
...
Package L#2 + L3 L#2 (38MB)
L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
PU L#7 (P#30)
PU L#8 (P#90)
...
Package L#3 + L3 L#3 (38MB)
L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
PU L#10 (P#45)
PU L#11 (P#105)
...
Misc(MemoryModule)
...
HostBridge L#5
PCIBridge
PCI 8086:10c9
Net L#6 "ens8f0"
2-socket `lscpu`:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2804.841
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 5000.25
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
4-socket `lscpu`:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 120
On-line CPU(s) list: 0-119
Thread(s) per core: 2
Core(s) per socket: 15
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
Stepping: 7
CPU MHz: 1199.953
CPU max MHz: 3400.0000
CPU min MHz: 1200.0000
BogoMIPS: 5600.25
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 38400K
NUMA node0 CPU(s): 0-119
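One sysfs-based angle that might help (a sketch; the PCI address `0000:05:00.0` is a placeholder): each PCI device exposes the node and CPUs it is local to, although since the firmware on the 4-socket box apparently reports a single node, `numa_node` may just read -1 or 0 there.
```
# NUMA node and local CPU list for a given PCI device
cat /sys/bus/pci/devices/0000:05:00.0/numa_node
cat /sys/bus/pci/devices/0000:05:00.0/local_cpulist

# same attributes reached via a network interface name
cat /sys/class/net/ens8f0/device/numa_node
```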
KJ7LNW
(525 rep)
Oct 2, 2020, 12:44 AM
• Last activity: Jul 7, 2022, 10:27 PM
8 votes, 3 answers, 5216 views
How to disable memory for a NUMA node on a Linux system
Is there a way to disable access to memory associated with a given NUMA node/socket on a NUMA machine?
We have a bit of controversy with the database vendor about our HP DL560 machines. The DB sales type's technical support person was adamant that we could not use our DL560s but had to buy new DL360s, since they have fewer sockets. I believe their concern is the speed of accessing inter-socket memory. They recommended that if I insisted on keeping the DL560s, I should leave two of the sockets empty. I think they are mistaken (AKA crazy), but I need tests to demonstrate that I am on solid ground.
My configuration:
The machines have four sockets, each with 22 hyperthreaded physical cores, for a total of 176 apparent cores, and 1.5 TB of memory.
The operating system is Red Hat Enterprise Linux Server release 7.4.
The lscpu display reads (in part):
$ lscpu | egrep 'NUMA|ore'
Thread(s) per core: 2
Core(s) per socket: 22
NUMA node(s): 4
NUMA node0 CPU(s): 0-21,88-109
NUMA node1 CPU(s): 22-43,110-131
NUMA node2 CPU(s): 44-65,132-153
NUMA node3 CPU(s): 66-87,154-175
If I had access to the physical hardware, I would consider pulling the processors from two of the sockets to prove my point but I don’t have access and I don’t have permission to go monkeying around with the hardware anyway.
The next best thing would be to virtually disable the sockets using the operating system. I read on this link that I can take a processor out of service with
echo 0 > /sys/devices/system/cpu/cpu3/online
and, indeed, the processors go out of service, but that says nothing about the memory.
I just turned off all the processors of socket #3 (using lscpu to find which ones belong to it):
for num in {66..87} {154..175}
do
echo 0 > /sys/devices/system/cpu/cpu${num}/online
cat /sys/devices/system/cpu/cpu${num}/online
done
and got:
$ grep N3 /proc/$$/numa_maps
7fe5daa79000 default file=/usr/lib64/libm-2.17.so mapped=16 mapmax=19 N3=16 kernelpagesize_kB=4
Which, if I am reading this correctly, shows my current process is using memory in socket #3. Except the shell was already running when I turned off the processors.
I started a new process that does its best to gobble up memory, and
cat /proc/18824/numa_maps | grep N3
returns no records initially, but after gobbling up memory for a long time, it starts using memory on node 3.
I tried running my program with `numactl`, binding to nodes 0,1,2, and it works as expected ... except I don't have control over the vendor's software, and there is no provision in Linux for setting another process's policy the way the `set_mempolicy` call is used by `numactl`.
Short of physically removing the processors, is there a way to force the issue?
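One avenue I have not fully tested (a sketch; run as root, and it assumes a kernel with memory hot-remove support; blocks holding unmovable kernel pages will refuse to go offline):
```
# take every memory block belonging to node 3 offline
for blk in /sys/devices/system/node/node3/memory*/state; do
    echo offline > "$blk" 2>/dev/null || echo "busy: $blk"
done

# confirm: node 3 should now report (close to) zero MemTotal
numastat -m | grep -i memtotal
```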
user1683793
(423 rep)
Jun 18, 2019, 04:29 PM
• Last activity: Apr 30, 2022, 05:45 AM
3 votes, 1 answer, 1798 views
Does Linux's NUMA architecture share main memory as well?
I am reading about the NUMA (Non-Uniform Memory Access) architecture. It looks like this is a hardware architecture in which, on a multiprocessor system, each core accesses its local memory faster than remote memory.
What I don't know is this: it looks like the main memory (RAM) is also divided between nodes. That confuses me, because I would think all the nodes (which sit inside the same CPU) have the same access speed to the main memory. So why does Linux divide the main memory between the nodes?
hqt
(607 rep)
Apr 30, 2020, 10:14 PM
• Last activity: Jun 9, 2020, 11:50 PM
0 votes, 1 answer, 1721 views
numactl: this system does not support NUMA policy
When using numactl, I was seeing:
numactl: this system does not support NUMA policy
Is it because some kernel config is not enabled? The BIOS has NUMA enabled (confirmed), and lscpu shows there are NUMA nodes.
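Checks that may narrow it down (a sketch): the message typically means the memory-policy syscalls are unavailable, e.g. on a kernel built without `CONFIG_NUMA`.
```
# was the running kernel built with NUMA support?
grep -E 'CONFIG_NUMA=' /boot/config-$(uname -r)

# does the kernel actually see more than one node?
ls /sys/devices/system/node/ | grep node
dmesg | grep -i numa | head
```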
Mark K
(955 rep)
Mar 28, 2020, 02:41 AM
• Last activity: Mar 29, 2020, 09:29 PM
0 votes, 1 answer, 468 views
Is a page fault across NUMA nodes "major" or "minor"?
I understand that on a single-socket Linux system, a command such as `sudo ps -eo min_flt,maj_flt,cmd` will generally count a page fault as "minor" if it blocks on a memory-to-memory copy, or on the zeroing of a deallocated page, or for some other reason doesn't touch persistent storage. But is this true on NUMA systems as well, even when the fault requires a data transfer from one NUMA node to another? Or does that cross the line into "major"?
Pr0methean
(101 rep)
Nov 22, 2019, 05:56 AM
• Last activity: Nov 22, 2019, 08:39 AM
25 votes, 5 answers, 29144 views
Enabling NUMA for Intel Core i7
In the Linux kernel, the documentation for `CONFIG_NUMA` says:
Enable NUMA (Non Uniform Memory Access) support.
The kernel will try to allocate memory used by a CPU on the
local memory controller of the CPU and add some more
NUMA awareness to the kernel.
For 64-bit this is recommended if the system is Intel Core i7
(or later), AMD Opteron, or EM64T NUMA.
I have an Intel Core i7 processor, but AFAICT it only has one NUMA node:
$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 16063 MB
node 0 free: 15031 MB
node distances:
node 0
0: 10
So what is the purpose of having `CONFIG_NUMA=y` when the i7 has only one NUMA node?
user1968963
(4163 rep)
Sep 25, 2013, 12:41 PM
• Last activity: Aug 11, 2019, 09:52 PM
1 vote, 0 answers, 32 views
Why do we use dynamic memory in ccNUMA systems when we talk about data distribution into locality domains by first touch policy?
In many books, when they talk about the first-touch policy in ccNUMA systems, they use dynamic memory allocation when distributing data across locality domains. What if, for example, we have an array on the stack? Does the first-touch policy work the same way?
Arvanitis
(11 rep)
Jun 30, 2019, 11:26 AM
• Last activity: Jun 30, 2019, 01:02 PM
2 votes, 0 answers, 679 views
Allocate pools of hugepages separately on each NUMA domain
On my dual-socket machine, I'm trying to allocate two pools of hugepages (one for each socket), so that application A, which is pinned to the first socket, uses the first pool, and application B on the second socket uses its own local pool.
However, when I put the number of huge pages into `/sys/devices/system/node/node{0,1}/hugepages/hugepages-1048576kB/nr_hugepages`, `hugeadm --explain` still shows me a single pool and one mount point for it, instead of two.
The goal is to have two processes, one on each socket, each working only on its local pool of hugepages.
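For what it's worth, this is the shape of setup I'm attempting (a sketch; page counts, the mount point, and binary names are placeholders). My understanding is that a single hugetlbfs mount can serve both per-node pools, with `--membind` deciding which pool each process faults from, but I have not confirmed this:
```
# reserve 4 x 1GiB pages on each node
echo 4 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
echo 4 | sudo tee /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

# one mount is enough; membind steers each app to its local pool
sudo mkdir -p /mnt/huge
sudo mount -t hugetlbfs -o pagesize=1G none /mnt/huge
numactl --cpunodebind=0 --membind=0 ./app_A &
numactl --cpunodebind=1 --membind=1 ./app_B &
```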
Vahid Noormofidi
(162 rep)
Mar 11, 2019, 10:28 PM
• Last activity: May 1, 2019, 09:59 PM
2 votes, 0 answers, 197 views
What prevents page migration?
OpenSUSE 42.3, Kernel 4.4.175-89-default
Running memory-bandwidth-intensive applications, I noticed the following behaviour:
The application uses ~55% of the physical memory of a NUMA system with 2 nodes. The application is parallelized using OpenMP, but without accounting for NUMA, so it relies on page migration to achieve somewhat decent execution speed.
Here is what that looks like:
At around 180 iterations, I cleared caches manually using
# echo 3 >| /proc/sys/vm/drop_caches
The result is an immediate performance improvement.
What prevents the system from doing proper page migration before I cleared the caches manually?
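Counters that might show what's happening (a sketch; `<pid>` is a placeholder, and `numa_pages_migrated` is only maintained when automatic NUMA balancing is active):
```
# is automatic NUMA balancing on at all?
cat /proc/sys/kernel/numa_balancing

# migration and NUMA-fault activity since boot
grep -E 'numa_pages_migrated|numa_hint_faults|pgmigrate' /proc/vmstat

# per-node placement of the application's pages
numastat -p <pid>
```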

MechEng
(233 rep)
Apr 12, 2019, 08:52 AM
• Last activity: Apr 12, 2019, 10:21 AM
2 votes, 1 answer, 92 views
Where does the default numa setting come from?
When we run:
numactl --hardware
we can see the current NUMA settings.
However, they don't seem to be set by Linux (at least, I didn't add a parameter to set them).
Are they set by the BIOS?
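If it helps, the node layout normally comes from the ACPI tables (SRAT/SLIT) published by the firmware; their traces can be seen like this (a sketch):
```
# the SRAT/SLIT tables from the firmware define the node layout
dmesg | grep -iE 'srat|slit|numa' | head
ls /sys/firmware/acpi/tables/ | grep -iE 'SRAT|SLIT'
```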
Mark
(747 rep)
Jan 23, 2019, 12:28 PM
• Last activity: Jan 28, 2019, 12:36 PM
2 votes, 1 answer, 6448 views
Sub-process returned an error code when apt-get install package
sudo apt-get install numactl
E: Problem executing scripts APT::Update::Post-Invoke-Success '/usr/bin/test -e /usr/share/dbus-1/system-services/org.freedesktop.PackageKit.service && /usr/bin/test -S /var/run/dbus/system_bus_socket && /usr/bin/gdbus call --system --dest org.freedesktop.PackageKit --object-path /org/freedesktop/PackageKit --timeout 4 --method org.freedesktop.PackageKit.StateHasChanged cache-update > /dev/null; /bin/echo > /dev/null'
E: Sub-process returned an error code
How do I fix it?

showkey
(499 rep)
May 2, 2018, 12:50 AM
• Last activity: Jan 25, 2019, 05:01 PM
2 votes, 1 answer, 329 views
Find out where the allocated memory for a process resides
I would like to investigate where the memory for a specific process is allocated.
To be more specific: I am running an OpenMP-parallel Fortran binary on a ccNUMA machine with two physical CPUs. My concern is that this program violates the first-touch rule when initializing its variables. That would lead to memory being allocated in an unbalanced fashion, i.e. most of it would sit in the address space of only one physical CPU instead of being balanced between both. In turn, that would lead to poor scaling for this memory-bandwidth-limited application.
Unfortunately, I don't have access to the source code, so looking at the memory allocation seems like a good way to find out. Other ideas are welcome.
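Two views that should reveal such an imbalance (a sketch; `<pid>` is a placeholder):
```
# per-node resident memory of the process; a heavy skew toward one
# node suggests first-touch went wrong
numastat -p <pid>

# raw per-mapping node counts (N0=..., N1=... are pages per node)
grep -E 'N[0-9]+=' /proc/<pid>/numa_maps | head
```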
Edit due to comments: OpenSUSE Leap 42.3, kernel version 4.4.103-36-default
MechEng
(233 rep)
Apr 30, 2018, 09:35 AM
• Last activity: May 2, 2018, 09:44 AM