Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1
votes
0
answers
41
views
Slow tiered memory demotion and possible CPU lock-up with cgroup v2 memory.high
We are currently testing tiered memory demotion on a machine equipped with a CXL device.
To facilitate this, we created a specific script (https://github.com/hyun-sa/comem) and are using the memory.high setting within a cgroup to force memory demotion.
These are the commands we used to enable demotion:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
echo 2 > /proc/sys/kernel/numa_balancing
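For context, the memory.high cap itself is just a couple of cgroup v2 writes; a minimal sketch of what we mean (the cgroup name "demotest" is made up here, and this is not necessarily how the comem script does it):
# create a cgroup v2 group and give it a soft 8G cap
mkdir /sys/fs/cgroup/demotest
echo 8G > /sys/fs/cgroup/demotest/memory.high
# move the current shell into the group, then run the benchmark under the cap
echo $$ > /sys/fs/cgroup/demotest/cgroup.procs
7zr b -md25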
The issue we're facing is that while demotion does occur, it proceeds extremely slowly—even slower than swapping to disk. Furthermore, during a 7-Zip benchmark, we observe a severe drop in CPU utilization, as if some process is causing a lock.
This is our running example (7zr b -md25 while memory is limited to 8G by memory.high):
7-Zip (r) 23.01 (x64) : Igor Pavlov : Public domain : 2023-06-20
64-bit locale=C.UTF-8 Threads:128 OPEN_MAX:1024
d25
Compiler: 13.2.0 GCC 13.2.0: SSE2
Linux : 6.15.6 : #1 SMP PREEMPT_DYNAMIC Tue Jul 15 06:39:48 UTC 2025 : x86_64
PageSize:4KB THP:madvise hwcap:2 hwcap2:2
AMD EPYC 9554 64-Core Processor (A10F11)
1T CPU Freq (MHz): 3710 3731 3732 3733 3733 3732 3732
64T CPU Freq (MHz): 6329% 3674 6006% 3495
RAM size: 386638 MB, # CPU hardware threads: 128
RAM usage: 28478 MB, # Benchmark threads: 128
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 477942 10925 4256 464943 | 5843081 12451 4001 498193
23: 337115 8816 3896 343480 | 5826376 12606 3999 504053
24: 1785 108 1772 1919 | 5654618 12631 3928 496161
25: 960 63 1739 1097 | 1767869 4606 3415 157287
---------------------------------- | ------------------------------
Avr: 204451 4978 2916 202860 | 4772986 10573 3836 413924
Tot: 7776 3376 308392
execution_time(ms): 2807639
Is there a potential misunderstanding of how cgroups function or a misconfiguration in my setup that could be causing this behavior?
Our machine specifications are as follows:
Mainboard : Supermicro H13SSL-NT
CPU : Epyc 9554 (nps 1)
Dram : 128G
CXL device : SMART Modular Technologies Device c241
OS : Ubuntu 24.04 LTS
Kernel : Linux 6.15.6
numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 0 size: 128640 MB
node 0 free: 117909 MB
node 1 cpus:
node 1 size: 257998 MB
node 1 free: 257840 MB
node distances:
node 0 1
0: 10 50
1: 255 10
Thank you for your help.
Hyunsa
(11 rep)
Jul 22, 2025, 05:43 AM
• Last activity: Jul 22, 2025, 06:04 AM
2
votes
1
answers
44
views
Are the page tables of a preempted process swapped out if there is a dearth of memory for a new process?
Suppose process A has been preempted to allow process B to run. If system memory is low and the kernel needs to reclaim memory for process B, is it possible for the page tables of process A to be swapped out to disk?
I understand that when a page belonging to a process is swapped out, the corresponding page table entry (PTE) must be updated to indicate that the page has been swapped. But if the page tables of process A were already swapped out before its pages are selected for swapping, how does the kernel update the PTEs to reflect this?
In such a scenario, will the kernel swap the page tables of process A back into memory just to update the PTEs? Or is there some alternative mechanism used?
I tried reading other sources on the internet, but didn't find much.
Padala Teja Sai Kumar Reddy
(23 rep)
Jul 13, 2025, 01:12 PM
• Last activity: Jul 13, 2025, 01:57 PM
10
votes
3
answers
1023
views
Why can ZONE_NORMAL only go up to 896MiB on 32-bit x86 processors?
According to Linux Kernel Development by Robert Love, p. 233:
> Because of hardware limitations, the kernel cannot treat all pages as identical. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this limitation, the kernel divides pages into different *zones*. The kernel uses the zones to group pages of similar properties. In particular, Linux has to deal with two shortcomings of hardware with respect to memory addressing:
> * Some hardware devices can perform DMA (direct memory access) to only certain memory addresses.
> * **Some architectures can physically addressing [sic] larger amounts of memory than they can virtually address.** Consequently, some memory is not permanently mapped into the kernel address space.
The bold face is added by me. What does this statement mean? Love goes on to say:
> What an architecture can and cannot directly map varies. On 32-bit x86 systems, ZONE_HIGHMEM is all memory above the physical 896MB mark. On other architectures, ZONE_HIGHMEM is empty because all memory is directly mapped.
Could this mean that the MMU on 32-bit x86 can't map more than 896MiB or so of physical addresses at one time? In other words, the MMU can handle a page table with up to 896MiB/4KiB = 229,376 entries?
[This answer](https://unix.stackexchange.com/a/98229/703626) says that Love is talking about Physical Address Extension (PAE). If my mental model of the whole situation is correct, it is saying that if the kernel were to ever need to take up more than 4GiB of physical memory on a system with PAE, then it would not be able to fit enough entries in the page table at once (because there's not enough virtual address space to address it at once), so it would need to bring pages in and out (hence "Dynamically mapped pages" in Love's table on p. 234). But then why is the mark so low at 896MiB? Why can't it be somewhere higher, like 4096-128MiB? Then, on any system where the physical address space *does* fit within the virtual address space (<4GiB), almost all of the memory would be ZONE_NORMAL, and could be mapped in the kernel's virtual address space without needing to dynamically map and unmap memory. This is the current situation on x86_64.
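As a quick sanity check on the arithmetic above (plain shell arithmetic, no claim about why the boundary sits where it does):
# PTEs needed to cover 896 MiB with 4 KiB pages
echo $(( 896 * 1024 / 4 ))            # 229376
# what a 4096-128 MiB direct map would need instead
echo $(( (4096 - 128) * 1024 / 4 ))   # 1015808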
Andymang
(103 rep)
Jul 3, 2025, 03:47 PM
• Last activity: Jul 4, 2025, 06:44 PM
4
votes
1
answers
1130
views
Process memory layout - difference between heap, data and mmap areas
I see on the web many conflicting or unclear descriptions of the memory layout of a Linux process. Usually the [common diagram](https://stackoverflow.com/q/64038876/8529284) looks like: [image omitted]
And a common [description](https://www.quora.com/Is-the-data-segment-is-part-of-the-heap-or-the-heap-is-part-of-it/answer/Sudarshan-43?ch=15&oid=30002660&share=af08bbcb&srid=2KkSm&target_type=answer) would say that:
> The data segment contains only global or static variable which have a
> predefined value and can be modified. Heap contains the dynamically
> allocated data that is stored in a memory section we refer that as
> heap section and this section typically starts where data segments
> ends.
And [also](https://stackoverflow.com/a/14954147/8529284):
> The heap is, generally speaking, one specific memory region created by the C runtime, and managed by malloc (which in turn uses the brk and sbrk system calls to grow and shrink).
>
> mmap is a way of creating new memory regions, independently of malloc (and so independently of the heap). munmap is simply its inverse, it releases these regions.
Many of those explanations seem outdated, and I find many discrepancies. For instance, many articles - as the answer above - claim that the heap is used by malloc, but malloc is actually a library call that uses either sbrk or mmap, as the malloc [man page](https://man7.org/linux/man-pages/man3/malloc.3.html) says:
> Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than **MMAP_THRESHOLD** bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).
So if malloc in many cases is implemented by mmap, what's the difference between the heap and the mmap area?
Another thing that seems like a contradiction is that many articles (as the malloc man page itself) claim that brk/sbrk adjust the size of the heap, but their [man page](https://man7.org/linux/man-pages/man2/brk.2.html) says they actually adjust the size of the **data segment**:
> brk() and sbrk() change the location of the **program break**, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment).
So I'm trying to get a clear, up-to-date overall explanation of the memory layout of processes nowadays with the different segments, that also addresses those questions:
1. What is the difference between the heap and the mmap areas? (From some tests I was attempting, by looking at the addresses I got from mmap and comparing them to the range of the heap in /proc/self/maps, it seems that some mmap-allocated pages are actually allocated inside the heap segment; see the sketch right after this list.)
2. Does the **break** signify the end of the **data segment**, or the end of the **heap**?
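For reference, a rough shell-level version of that comparison (here /proc/self refers to the inspecting command itself rather than my test program):
# the [heap] region of the grep process itself
grep '\[heap\]' /proc/self/maps
# anonymous mappings with no backing file, i.e. typical mmap'd allocations
awk 'NF == 5' /proc/self/maps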
Other related questions:
* [how brk pointer grow after calling malloc](https://unix.stackexchange.com/q/610939/273579)
* [When is the heap used for dynamic memory allocation?](https://unix.stackexchange.com/q/411408/273579)
aviro
(6925 rep)
Feb 13, 2024, 01:07 PM
• Last activity: Jun 9, 2025, 06:03 AM
0
votes
0
answers
27
views
Intel VM-exit EPT_VIOLATION error_code
I'm using trace-cmd on a Linux host running a qemu/kvm VM in order to check VM-exit events related to the EPT_VIOLATION reason.
root@eve-ng62:~# trace-cmd record -b 20000 -e kvm:kvm_page_fault -e kvm_exit -P 628130
Hit Ctrl^C to stop recording
root@eve-ng62:~# trace-cmd report | grep kvm_page_fault -B 1
CPU 3/KVM-628130 1707156.150815: kvm_exit: reason EPT_VIOLATION rip 0x7f152741da17 info 181 0
CPU 3/KVM-628130 1707156.150816: kvm_page_fault: vcpu 3 rip 0x7f152741da17 address 0x00000003c93f8e00 error_code 0x181
root@eve-ng62:~#
The output shows the faulting address 0x00000003c93f8e00 and the error_code 0x181.
I'm not sure whether that error_code is the one from the function https://github.com/torvalds/linux/blob/master/arch/x86/kvm/vmx/vmx.c#L5810 .
I also found this related thread here .
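For what it's worth, here is my bit-by-bit reading of 0x181 against the Intel SDM's table for EPT-violation exit qualifications (please double-check against the SDM, this is only my interpretation):
# 0x181 = binary 1 1000 0001 -> bits 0, 7 and 8 are set
for b in 0 1 2 3 4 5 6 7 8; do [ $(( (0x181 >> b) & 1 )) -eq 1 ] && echo "bit $b set"; done
# bit 0: the access was a data read
# bit 7: the guest linear-address field is valid
# bit 8: the fault happened while translating a guest linear address (not a page-walk access)
# bits 3-5 clear: the guest-physical address had no read/write/execute permission in the EPT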
CarloC
(385 rep)
May 13, 2025, 07:38 AM
• Last activity: May 15, 2025, 02:59 PM
0
votes
0
answers
110
views
Allocating contiguous physical memory using huge pages in kernel module
I need a kernel module that allocates 8MB of physically contiguous memory using 2MB huge pages, in response to a user-space mmap() request. While I’ve successfully used alloc_pages() with 4KB pages to allocate smaller contiguous chunks like 256KB or 512KB, I’m unsure if this approach can be used to allocate 8MB of physically contiguous memory backed by 2MB huge pages.
To consistently allocate 8MB using 2MB huge pages, is there a way to reserve these huge pages? And in scenarios where sufficient 2MB pages aren't available, is there a fallback mechanism to allocate the same 8MB region using 4KB pages instead?
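One possibly relevant building block: 2MB pages can be pre-reserved into the hugetlb pool from user space (sketch below; whether that pool is reachable from the module depends on which allocation API the module ends up using):
# reserve sixteen 2MB huge pages (32MB) in the hugetlb pool
echo 16 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# verify how many were actually reserved
grep -i hugepages /proc/meminfo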
ReturnAddress
(3 rep)
Apr 18, 2025, 04:23 PM
1
votes
1
answers
1824
views
Size of virtual memory in Linux
On what basis is the size of user and kernel virtual memory decided in Linux? (32-bit, if that's relevant.) Is it configurable?
If we have 512 MB of RAM, what will be the size of the user and kernel virtual address spaces?
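On 32-bit builds the user/kernel split is a compile-time choice, so one thing worth checking is the kernel config (assuming your distro ships it in /boot):
# on 32-bit x86 this typically shows CONFIG_VMSPLIT_3G=y (3G user / 1G kernel)
grep VMSPLIT /boot/config-$(uname -r)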
Krishnamoorthi M
(125 rep)
Feb 13, 2020, 05:53 AM
• Last activity: Mar 25, 2025, 11:10 AM
0
votes
2
answers
309
views
Virtual Address Space
I have started to learn about Virtual Address Space (VAS) and I have a few questions:
1. How much VAS is created for each process, depending on the architecture (32-bit and 64-bit)?
2. Is the VAS for each process created on the hard disk? If so, what happens if there is not enough space?
3. What are the contents stored in the VAS, like text, data, and BSS?
Vivek
(101 rep)
Dec 22, 2020, 11:44 AM
• Last activity: Mar 25, 2025, 08:43 AM
0
votes
0
answers
178
views
Resident memory reported significantly lesser than Proportional Resident memory - Process Exporter and Grafana
I have a process monitoring stack setup with process exporter and Grafana with the "process profiling with treemap" dashboard, and I see some suspicious behaviour regarding the memory it is reporting.
Based on my understanding (Having read [this](https://unix.stackexchange.com/questions/33381/getting-information-about-a-process-memory-usage-from-proc-pid-smaps) article and the links mentioned within that article):
RSS = Private memory + shared memory
PSS = Private memory + (shared memory / num of processes sharing said memory)
This leads me to believe that RSS >= PSS at any given time.
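A direct way to cross-check the two numbers for one process straight from /proc, bypassing the exporter (the PID below is a placeholder):
pid=1234                                                # put the real PID here
ps -o rss= -p "$pid"                                    # RSS in KiB
awk '/^Pss:/ {s += $2} END {print s}' /proc/$pid/smaps  # PSS in KiB, summed over all mappings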
Here is what I observe:
1. Process takes 39.2GB of virtual memory
[process alloted 39.2GB virtual memory](https://i.sstatic.net/zOFaMpz5.png)
2. Process takes up 38.1GB of "proportional resident memory", this makes sense
[Process takes up 38.1GB proportional resident memory](https://i.sstatic.net/BC3QAWzu.png)
3. This is where it gets suspicious: the process takes up only 18.8GB of resident memory.
[Process takes only 19.2GB resident memory](https://i.sstatic.net/ZlsbQymS.png)
Is my understanding of how RSS and PSS work correct? If yes, what could be the reasons this process is behaving like this (or being reported as such)? I suspected process exporter or Grafana might be incorrect, but no other process reports anything suspicious like this, so I'm assuming they're working as expected.
I looked at the process exporter github to confirm if my understanding of the fields reported by it is correct.
resident: Field rss(24) from /proc/[pid]/stat, whose doc says:
This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.
proportionalResident: Sum of "Pss" fields from /proc/[pid]/smaps, whose doc says:
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it.
No pages have been swapped out.
Here are the queries used by Grafana to graph these:
proportional resident memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="proportionalResident"} > 0
virtual memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="virtual"}
resident memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="resident"} / ignoring(memtype) namedprocess_namegroup_num_procs > 0
All other processes behave expectedly with RSS >= PSS. Why could this process be reporting this behaviour?
TIA!
Phantom
(1 rep)
Oct 15, 2024, 09:34 AM
0
votes
0
answers
33
views
Tools for examining usermode memory
What tools are available for examining detailed memory allocations in user mode processes? For example:
- What are the flags at the virtual page level?
- What are the physical addresses (and their flags) mapped to virtual pages?
- List the pages that are swapped out (when the system is under memory pressure)?
- Which pages are in a copy-on-write state vs. unallocated vs. allocated to a process?
- How do you test changes to the Linux Kernel memory manager?
In my experience, memory management (on live systems) is messy and nondeterministic. I’m curious if anyone knows about tools for testing/examining memory at a page level?
I feel like I’m missing something and there must be tools that help people with these sorts of questions.
I know the Kernel gives us a few windows into what it's doing:
- /proc/self/maps
- /proc/self/pagemap
- /proc/iomem
- /proc/kpagecount
- /proc/kpageflags
... but I haven't found a tool that makes it easy to visualize this information.
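For reference, the closest I've gotten from the command line is per-mapping summaries rather than true page-level views:
# per-mapping breakdown (RSS, swap, VmFlags, ...) pulled from /proc/<pid>/smaps
pmap -XX $$
# for real page-level flags there is the page-types helper in the kernel source
# (tools/mm/page-types, formerly tools/vm); it reads /proc/kpageflags and needs root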
Any feedback you can offer would be much appreciated and I sincerely appreciate your time.
Very Respectfully,
Mark Nelson
Mark Nelson
(1 rep)
Sep 6, 2024, 05:13 AM
1
votes
1
answers
269
views
Understanding memory limits in a systemd service: Are they per-process or combined?
I have a systemd service named vcoagent.service running on my Linux system, and I'm trying to understand how memory limits specified for the service (Memory: 300.3M (limit: 500.0M)) apply to the processes it manages.
Here is the relevant output from systemctl status vcoagent.service:
● vcoagent.service - Observability Agent
Loaded: loaded (/etc/systemd/system/vcoagent.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2024-04-10 11:05:17 +07; 3h 38min ago
Main PID: 134018 (vcoagent)
Tasks: 59 (limit: 18691)
Memory: 300.3M (limit: 500.0M)
CGroup: /system.slice/vcoagent.service
├─134018 /root/vcoagent --opampServer=false --nodeExporter=true
├─134031 /root/vcoagent --opampServer=false --nodeExporter=true
├─134038 /proc/134031/fd/3
└─134050 /proc/134031/fd/8 --config /tmp/otelcol/otelcol-config.yaml
My question is: Does the memory limit (500.0M) apply individually to each process (134018, 134031, etc.) managed by the service, or is it a combined limit for all processes together?
For instance, if one process exceeds its individual memory limit but the total memory usage across all processes remains below 500.0M, will the service be considered within its memory limits?
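For what it's worth, this is how I've been checking where the limit actually lives (paths assume the unified cgroup v2 hierarchy; cgroup v1 uses memory.limit_in_bytes instead):
# the limit and current usage as systemd sees them
systemctl show vcoagent.service -p MemoryMax -p MemoryCurrent
# the same values at the cgroup level, i.e. for the service's cgroup as a whole
cat /sys/fs/cgroup/system.slice/vcoagent.service/memory.max
cat /sys/fs/cgroup/system.slice/vcoagent.service/memory.current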
I would appreciate any clarification on how systemd memory limits are enforced within a service context. Thank you!
Ackerman Shadow
(11 rep)
Apr 11, 2024, 12:50 AM
• Last activity: Apr 11, 2024, 04:33 AM
5
votes
1
answers
5789
views
Is there a way to set a hard cap/limit on how much RAM Chrome can use?
I'm using Linux on my Steam Deck (SteamOS/Arch Linux).
Is there a method to set a hard cap/limit on the maximum total RAM Chrome can use with command line arguments? (to 8 GB out of the device's max 16 GB)
Chrome works just fine with tons of tabs open on my Windows laptop with 8 GB of RAM, and I can always kill individual processes with Chrome's built-in task manager.
But on the Steam Deck's Arch Linux, if I forget to use Chrome's built-in task manager to kill tabs/processes, having too many tabs of certain poorly optimized websites can cause the entire machine to hang. Then I have no choice but to hard reset by holding the power button.
I already use uBlock Origin to clean up websites, but that doesn't seem to be enough.
--TL;DR--
I want to use command-line arguments to set a hard cap on the maximum total RAM that Chrome can use to 8 GB out of the device's 16 GB of RAM.
If I use the KDE Menu Editor, I can directly add command-line arguments to Chrome.
The command-line to execute is:
run --branch=stable --arch=x86_64 --command=/app/bin/chrome --file-forwarding com.google.Chrome @@u %U @@
Is there an argument I can add to do this? Looking for something like: "--max_ram_usage 8192MB"
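As far as I can tell Chrome itself has no flag for a total-RAM cap, so the closest I can think of is wrapping the launch in a memory-limited cgroup, e.g. via systemd-run (a sketch that assumes the menu entry ultimately invokes flatpak; note the whole scope gets OOM-killed if it hits the cap):
# in the KDE menu entry, prefix the existing command with a memory-capped scope
systemd-run --user --scope -p MemoryMax=8G \
    flatpak run --branch=stable --arch=x86_64 --command=/app/bin/chrome \
    --file-forwarding com.google.Chrome @@u %U @@
Using -p MemoryHigh=8G instead would throttle and reclaim rather than kill, which may be the gentler option.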
JLHack7
(51 rep)
Jun 25, 2023, 09:11 PM
• Last activity: Jun 25, 2023, 09:51 PM
0
votes
1
answers
108
views
Memory total shown on Linux on 8GB memory PC is only 7038920 kB
Why does Linux /proc/meminfo show **MemTotal: 7038920 kB** (the kB in /proc/meminfo most likely means KiB) on a PC with 8 GB of RAM, although 8 GB in KiB is 7812500?
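A couple of quick checks that might narrow it down (plain arithmetic plus what the firmware actually handed to the kernel at boot):
echo $(( 8 * 1024 * 1024 ))     # 8 GiB expressed in KiB: 8388608
echo $(( 8000000000 / 1024 ))   # 8 GB (decimal) in KiB: 7812500
# memory the kernel was given, after firmware/reserved regions are subtracted
dmesg | grep -i 'Memory:'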
user575072
Jun 8, 2023, 06:50 PM
• Last activity: Jun 8, 2023, 11:38 PM
0
votes
1
answers
102
views
Do page tables used to store kernel stack pointer when context switch happen in kernel mode of the process?
I have two questions:
1. Suppose a user space application/process is running in kernel mode. I understand that if a context switch happens now, the kernel stack pointer of that process is stored in the task_struct. Is that correct? For that, is a PTE (page table entry) created in the page table to map the kernel stack pointer (which is a virtual address) to the physical address?
2. In the case of a kthread, does it have a page table to support context switches?
Franc
(309 rep)
Apr 30, 2023, 05:29 AM
• Last activity: Apr 30, 2023, 10:45 AM
4
votes
1
answers
2610
views
How does mmio get routed to io devices?
I am trying to understand how IO devices are mapped into the 'regular' memory address space on modern x86 machines running Linux.
Some details which I am trying to make sense of are:
1. cat /proc/iomem prints out a list of io memory mapped regions (printing the **physical** addresses) which are non-contiguous
2. These regions can be requested by kernel modules dynamically at runtime, and allocated via the function request_mem_region, which is defined in <linux/ioport.h>
3. x86 machines use mov for both memory access and IO (that is mapped into memory)
So now, supposing kernel module code is running, we will likely encounter an instruction like mov [value] [virtual address], where the virtual address could be referring to either an mmio region or 'normal' data values that exist in memory. The process to separate mmio traffic from 'normal' memory ought to have 2 key steps:
1. determine **if** the address is mmio. The page table has a flag for whether it is memory-mapped, so I assume the mmu determines this while doing page table translation.
2. Determine the 'IO address' of the newly produced physical address (that the mmu gave as output from the page table translation) and pass this to whatever chip interfaces with the IO (Northbridge, root complex, etc)
**Question 1**: is my understanding of step-1 above correct?
**Question 2**: How is step-2 carried out? (by what **entity** and how is the map stored?)
The ranges that need to be checked are listed in /proc/iomem, and the data which it draws from I guess is a map that looks like:
map[mmio_address] = io_address_object
Keeping in mind that all of this is happening **within** the context of a single mov instruction from the perspective of the cpu, I can't see how this translation could happen via anything other than hardware external to the cpu.
shafe
(200 rep)
Apr 2, 2023, 03:15 AM
• Last activity: Apr 2, 2023, 07:11 AM
1
votes
1
answers
368
views
On some UNIX implementations, it is not possible to call free() on a block of memory allocated via memalign()
I use Linux only but I want to understand what this means:
From *the Linux Programming Interface*:
> Blocks of memory allocated using memalign() or posix_memalign() should be deallocated with free().
>
> On some UNIX implementations, it is not possible to call free() on a block of memory allocated via memalign(), because the memalign() implementation uses malloc() to allocate a block of memory, and then returns a pointer to an address with a suitable alignment in that block. The glibc implementation of memalign() doesn't suffer this limitation.
From man memalign:
> POSIX requires that memory obtained from posix_memalign() can be freed using free(3). Some systems provide no way to reclaim memory allocated with memalign() or valloc() (because one can pass to free(3) only a pointer obtained from malloc(3), while, for example, memalign() would call malloc(3) and then align the obtained value). The glibc implementation allows memory obtained from any of these functions to be reclaimed with free(3).
I don't know much about memory alignment and don't understand "*for example, memalign() would call malloc(3) and then align the obtained value*".
Could someone tell me what's going on here and what could be wrong with free()?
Rick
(1247 rep)
Aug 27, 2022, 03:18 AM
• Last activity: Aug 27, 2022, 05:33 AM
2
votes
1
answers
1526
views
Can a Linux Swap Partition Be Too Big?
Can a Linux swap partition be too big?
I'm pretty certain the answer is, "no" but I haven't found any resources on-point, so thought I'd ask.
In contrast, the main Windows swap file, pagefile.sys, can be too large. A commonly cited cap is 3x installed RAM, else the system may have trouble functioning.
The distinction seems to lie in the fact that Linux virtual memory is highly configurable with kernel parameters, not to mention compile options, whereas Windows virtual memory is barely so. Windows virtual memory management consequently seems to rely on algorithms that are immutable, or on the swap file size and how it is configured.
Linux has its own virtual memory management algorithms, of course, but the question is whether and how they are affected by the size of the specified swap partition or file.
This comes up because I have a system with 16GB physical RAM configured with a series of 64GB partitions to facilitate a multi-boot capability. For convenience / laziness, I've simply designated one of these 64GB partitions as swap, *i.e.*, 4x physical RAM in contrast to Windows' 3x cap (the latter being relevant only as a frame of reference because this is a Linux-only system). I'm debugging some issues around memory management and VMware Workstation and have come to wonder what, if any, effect the swap partition's size has on compaction, swappiness, page faults, and performance generally.
Many thanks for any constructive input.
ebsf
(399 rep)
Aug 23, 2022, 07:31 PM
• Last activity: Aug 23, 2022, 08:31 PM
4
votes
1
answers
2453
views
How does the kernel address swapped memory pages on swap partition or swap file?
A swap partition doesn't contain a structured filesystem. The kernel doesn't need that because it stores memory pages on the partition marked as a swap area. Since there could be several memory pages in the swap area, how does the kernel locate each page when a process requests its page to be loaded into memory? Let's explain more: Looking at the header of the swap partition from Devuan OS:
#define SWAP_UUID_LENGTH 16
#define SWAP_LABEL_LENGTH 16
struct swap_header_v1_2 {
char bootbits; /* Space for disklabel etc. */
unsigned int version;
unsigned int last_page;
unsigned int nr_badpages;
unsigned char uuid[SWAP_UUID_LENGTH];
char volume_name[SWAP_LABEL_LENGTH];
unsigned int padding;
unsigned int badpages;
};
So when the mkswap command is executed for a partition, that's what gets placed on that partition: the swap header.
Now, let's have a scenario where "process A" has its memory page swapped, so there's one memory page in the swap area. Of course, there could be many memory pages in the swap area. "Process A" needs to access that memory page that was swapped. "Process A" tells the kernel, may I have my swapped memory page, please? The kernel says: sure, my dear friend. The kernel looks for "process A"'s memory page in the swap partition. Since the swap partition isn't a sophisticated structure (not a filesystem), how would the kernel know how to locate that specific memory page of "process A" in the swap partition?
Perhaps the kernel somewhere stores sector addresses for those swapped pages, so when a process asks for its memory page, the kernel knows where to look in the swap partition, reads the memory page from the partition and loads it into memory.
direprobs
(1064 rep)
Aug 20, 2017, 12:31 PM
• Last activity: Aug 17, 2022, 05:10 PM
21
votes
2
answers
176738
views
How to clear memory cache in Linux
(screenshot of top output: https://i.sstatic.net/auNr7.png)
Is there any command I can use to clear the memory cache in RHEL? I used this command:
sync; echo 3 > /proc/sys/vm/drop_caches
but it didn't work.
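In case it is only a privilege problem (writing to /proc/sys/vm/drop_caches needs root, and with sudo the redirection itself still runs as the unprivileged user), these equivalent forms are worth trying:
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
# or, equivalently:
sudo sysctl vm.drop_caches=3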

OmiPenguin
(4398 rep)
Dec 15, 2012, 02:17 PM
• Last activity: Jul 31, 2022, 07:04 AM
0
votes
0
answers
260
views
How can i access the page table of a process from kernel using a custom syscall?
I am using Ubuntu 16.04, kernel: 4.17.4
I want to access the page table of a process. The idea is that, inside a C program, I will call a custom syscall, and the syscall will be able to access the page table of the process. How do I design the system call? I'd appreciate any examples or related reading material.
My job is to modify some page table entries of the page table (changing some virtual-to-physical address mappings) for a project, which I want to do using the syscall.
Misbah
(1 rep)
Jun 17, 2022, 08:00 PM
Showing page 1 of 20 total questions