Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
1
votes
0
answers
41
views
Slow tiered memory demotion and possible CPU lock-up with cgroup v2 memory.high
We are currently testing tiered memory demotion on a machine equipped with a CXL device.
To facilitate this, we created a specific script (https://github.com/hyun-sa/comem) and are using the memory.high setting within a cgroup to force memory demotion.
These are the commands we used to enable demotion:
echo 1 > /sys/kernel/mm/numa/demotion_enabled
echo 2 > /proc/sys/kernel/numa_balancing
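For context, the memory.high cap itself is just a couple of cgroup v2 writes; a minimal sketch of what we mean (the cgroup name "demotest" is made up here, and this is not necessarily how the comem script does it):
# create a cgroup v2 group and give it a soft 8G cap
mkdir /sys/fs/cgroup/demotest
echo 8G > /sys/fs/cgroup/demotest/memory.high
# move the current shell into the group, then run the benchmark under the cap
echo $$ > /sys/fs/cgroup/demotest/cgroup.procs
7zr b -md25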
The issue we're facing is that while demotion does occur, it proceeds extremely slowly—even slower than swapping to disk. Furthermore, during a 7-Zip benchmark, we observe a severe drop in CPU utilization, as if some process is causing a lock.
This is our running example (7zr b -md25 while memory is limited to 8G by memory.high):
7-Zip (r) 23.01 (x64) : Igor Pavlov : Public domain : 2023-06-20
64-bit locale=C.UTF-8 Threads:128 OPEN_MAX:1024
d25
Compiler: 13.2.0 GCC 13.2.0: SSE2
Linux : 6.15.6 : #1 SMP PREEMPT_DYNAMIC Tue Jul 15 06:39:48 UTC 2025 : x86_64
PageSize:4KB THP:madvise hwcap:2 hwcap2:2
AMD EPYC 9554 64-Core Processor (A10F11)
1T CPU Freq (MHz): 3710 3731 3732 3733 3733 3732 3732
64T CPU Freq (MHz): 6329% 3674 6006% 3495
RAM size: 386638 MB, # CPU hardware threads: 128
RAM usage: 28478 MB, # Benchmark threads: 128
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 477942 10925 4256 464943 | 5843081 12451 4001 498193
23: 337115 8816 3896 343480 | 5826376 12606 3999 504053
24: 1785 108 1772 1919 | 5654618 12631 3928 496161
25: 960 63 1739 1097 | 1767869 4606 3415 157287
---------------------------------- | ------------------------------
Avr: 204451 4978 2916 202860 | 4772986 10573 3836 413924
Tot: 7776 3376 308392
execution_time(ms): 2807639
Is there a potential misunderstanding of how cgroups function or a misconfiguration in my setup that could be causing this behavior?
Our machine specifications are as follows:
Mainboard : Supermicro H13SSL-NT
CPU : Epyc 9554 (nps 1)
Dram : 128G
CXL device : SMART Modular Technologies Device c241
OS : Ubuntu 24.04 LTS
Kernel : Linux 6.15.6
numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 0 size: 128640 MB
node 0 free: 117909 MB
node 1 cpus:
node 1 size: 257998 MB
node 1 free: 257840 MB
node distances:
node 0 1
0: 10 50
1: 255 10
Thank you for your help.
Hyunsa
(11 rep)
Jul 22, 2025, 05:43 AM
• Last activity: Jul 22, 2025, 06:04 AM
2
votes
1
answers
44
views
Are the page tables of a preempted process swapped out if there is a dearth of memory for a new process?
Suppose process A has been preempted to allow process B to run. If system memory is low and the kernel needs to reclaim memory for process B, is it possible for the page tables of process A to be swapped out to disk?
I understand that when a page belonging to a process is swapped out, the corresponding page table entry (PTE) must be updated to indicate that the page has been swapped. But if the page tables of process A were already swapped out before its pages are selected for swapping, how does the kernel update the PTEs to reflect this?
In such a scenario, will the kernel swap the page tables of process A back into memory just to update the PTEs? Or is there some alternative mechanism used?
I tried reading other sources on the internet, but didn't find much.
Padala Teja Sai Kumar Reddy
(23 rep)
Jul 13, 2025, 01:12 PM
• Last activity: Jul 13, 2025, 01:57 PM
10
votes
3
answers
1023
views
Why can ZONE_NORMAL only go up to 896MiB on 32-bit x86 processors?
According to Linux Kernel Development by Robert Love, p. 233:
> Because of hardware limitations, the kernel cannot treat all pages as identical. Some pages, because of their physical address in memory, cannot be used for certain tasks. Because of this limitation, the kernel divides pages into different *zones*. The kernel uses the zones to group pages of similar properties. In particular, Linux has to deal with two shortcomings of hardware with respect to memory addressing:
> * Some hardware devices can perform DMA (direct memory access) to only certain memory addresses.
> * **Some architectures can physically addressing [sic] larger amounts of memory than they can virtually address.** Consequently, some memory is not permanently mapped into the kernel address space.
The bold face is added by me. What does this statement mean? Love goes on to say:
> What an architecture can and cannot directly map varies. On 32-bit x86 systems, ZONE_HIGHMEM is all memory above the physical 896MB mark. On other architectures, ZONE_HIGHMEM is empty because all memory is directly mapped.
Could this mean that the MMU on 32-bit x86 can't map more than 896MiB or so of physical addresses at one time? In other words, the MMU can handle a page table with up to 896MiB/4KiB = 229,376 entries?
[This answer](https://unix.stackexchange.com/a/98229/703626) says that Love is talking about Physical Address Extension (PAE). If my mental model of the whole situation is correct, it is saying that if the kernel were to ever need to take up more than 4GiB of physical memory on a system with PAE, then it would not be able to fit enough entries in the page table at once (because there's not enough virtual address space to address it at once), so it would need to bring pages in and out (hence "Dynamically mapped pages" in Love's table on p. 234). But then why is the mark so low at 896MiB? Why can't it be somewhere higher, like 4096-128MiB? Then, on any system where the physical address space *does* fit within the virtual address space (<4GiB), almost all of the memory would be ZONE_NORMAL, and could be mapped in the kernel's virtual address space without needing to dynamically map and unmap memory. This is the current situation on x86_64.
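As a quick sanity check on the arithmetic above (plain shell arithmetic, no claim about why the boundary sits where it does):
# PTEs needed to cover 896 MiB with 4 KiB pages
echo $(( 896 * 1024 / 4 ))            # 229376
# what a 4096-128 MiB direct map would need instead
echo $(( (4096 - 128) * 1024 / 4 ))   # 1015808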
Andymang
(103 rep)
Jul 3, 2025, 03:47 PM
• Last activity: Jul 4, 2025, 06:44 PM
4
votes
1
answers
1130
views
Process memory layout - difference between heap, data and mmap areas
I see on the web many conflicting or unclear descriptions of the memory layout of a Linux process. Usually the [common diagram](https://stackoverflow.com/q/64038876/8529284) looks like: [image omitted]
And a common [description](https://www.quora.com/Is-the-data-segment-is-part-of-the-heap-or-the-heap-is-part-of-it/answer/Sudarshan-43?ch=15&oid=30002660&share=af08bbcb&srid=2KkSm&target_type=answer) would say that:
> The data segment contains only global or static variable which have a
> predefined value and can be modified. Heap contains the dynamically
> allocated data that is stored in a memory section we refer that as
> heap section and this section typically starts where data segments
> ends.
And [also](https://stackoverflow.com/a/14954147/8529284):
> The heap is, generally speaking, one specific memory region created by the C runtime, and managed by malloc (which in turn uses the brk and sbrk system calls to grow and shrink).
>
> mmap is a way of creating new memory regions, independently of malloc (and so independently of the heap). munmap is simply its inverse, it releases these regions.
Many of those explanations seem outdated, and I find many discrepancies. For instance, many articles - as the answer above - claim that the heap is used by malloc, but malloc is actually a library call that uses either sbrk or mmap, as the malloc [man page](https://man7.org/linux/man-pages/man3/malloc.3.html) says:
> Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than **MMAP_THRESHOLD** bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).
So if malloc in many cases is implemented by mmap, what's the difference between the heap and the mmap area?
Another thing that seems like a contradiction is that many articles (as the malloc man page itself) claim that brk/sbrk adjust the size of the heap, but their [man page](https://man7.org/linux/man-pages/man2/brk.2.html) says they actually adjust the size of the **data segment**:
> brk() and sbrk() change the location of the **program break**, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment).
So I'm trying to get a clear, up-to-date overall explanation of the memory layout of processes nowadays with the different segments, that also addresses those questions:
1. What is the difference between the heap and the mmap areas? (From some tests I was attempting, by looking at the addresses I got from mmap and comparing them to the range of the heap in /proc/self/maps, it seems that some mmap-allocated pages are actually allocated inside the heap segment; see the sketch right after this list.)
2. Does the **break** signify the end of the **data segment**, or the end of the **heap**?
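For reference, a rough shell-level version of that comparison (here /proc/self refers to the inspecting command itself rather than my test program):
# the [heap] region of the grep process itself
grep '\[heap\]' /proc/self/maps
# anonymous mappings with no backing file, i.e. typical mmap'd allocations
awk 'NF == 5' /proc/self/maps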
Other related questions:
* [how brk pointer grow after calling malloc](https://unix.stackexchange.com/q/610939/273579)
* [When is the heap used for dynamic memory allocation?](https://unix.stackexchange.com/q/411408/273579)
aviro
(6925 rep)
Feb 13, 2024, 01:07 PM
• Last activity: Jun 9, 2025, 06:03 AM
0
votes
0
answers
27
views
Intel VM-exit EPT_VIOLATION error_code
I'm using trace-cmd on a Linux host running a qemu/kvm VM in order to check VM-exit events related to the EPT_VIOLATION reason.
root@eve-ng62:~# trace-cmd record -b 20000 -e kvm:kvm_page_fault -e kvm_exit -P 628130
Hit Ctrl^C to stop recording
root@eve-ng62:~# trace-cmd report | grep kvm_page_fault -B 1
CPU 3/KVM-628130 1707156.150815: kvm_exit: reason EPT_VIOLATION rip 0x7f152741da17 info 181 0
CPU 3/KVM-628130 1707156.150816: kvm_page_fault: vcpu 3 rip 0x7f152741da17 address 0x00000003c93f8e00 error_code 0x181
root@eve-ng62:~#
The output shows the faulting address 0x00000003c93f8e00 and the error_code 0x181.
I'm not sure whether that error_code is the one from the function https://github.com/torvalds/linux/blob/master/arch/x86/kvm/vmx/vmx.c#L5810 .
I also found this related thread here .
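For what it's worth, here is my bit-by-bit reading of 0x181 against the Intel SDM's table for EPT-violation exit qualifications (please double-check against the SDM, this is only my interpretation):
# 0x181 = binary 1 1000 0001 -> bits 0, 7 and 8 are set
for b in 0 1 2 3 4 5 6 7 8; do [ $(( (0x181 >> b) & 1 )) -eq 1 ] && echo "bit $b set"; done
# bit 0: the access was a data read
# bit 7: the guest linear-address field is valid
# bit 8: the fault happened while translating a guest linear address (not a page-walk access)
# bits 3-5 clear: the guest-physical address had no read/write/execute permission in the EPT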
CarloC
(385 rep)
May 13, 2025, 07:38 AM
• Last activity: May 15, 2025, 02:59 PM
0
votes
0
answers
110
views
Allocating contiguous physical memory using huge pages in kernel module
I need a kernel module that allocates 8MB of physically contiguous memory using 2MB huge pages, in response to a user-space mmap() request. While I’ve successfully used alloc_pages() with 4KB pages to allocate smaller contiguous chunks like 256KB or 512KB, I’m unsure if this approach can be used to allocate 8MB of physically contiguous memory backed by 2MB huge pages.
To consistently allocate 8MB using 2MB huge pages, is there a way to reserve these huge pages? And in scenarios where sufficient 2MB pages aren't available, is there a fallback mechanism to allocate the same 8MB region using 4KB pages instead?
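One possibly relevant building block: 2MB pages can be pre-reserved into the hugetlb pool from user space (sketch below; whether that pool is reachable from the module depends on which allocation API the module ends up using):
# reserve sixteen 2MB huge pages (32MB) in the hugetlb pool
echo 16 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# verify how many were actually reserved
grep -i hugepages /proc/meminfo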
ReturnAddress
(3 rep)
Apr 18, 2025, 04:23 PM
1
votes
1
answers
1824
views
Size of virtual memory in Linux
On what basis is the size of user and kernel virtual memory decided in Linux? (32-bit, if that's relevant.) Is it configurable?
If we have 512 MB of RAM, what will be the size of the user and kernel virtual address spaces?
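On 32-bit builds the user/kernel split is a compile-time choice, so one thing worth checking is the kernel config (assuming your distro ships it in /boot):
# on 32-bit x86 this typically shows CONFIG_VMSPLIT_3G=y (3G user / 1G kernel)
grep VMSPLIT /boot/config-$(uname -r)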
Krishnamoorthi M
(125 rep)
Feb 13, 2020, 05:53 AM
• Last activity: Mar 25, 2025, 11:10 AM
0
votes
2
answers
309
views
Virtual Address Space
I have started to learn about Virtual Address Space (VAS) and I have a few questions:
1. How much VAS is created for each process, depending on the architecture (32-bit and 64-bit)?
2. Is the VAS for each process created on the hard disk? If so, what happens if there is not enough space?
3. What are the contents stored in the VAS, like text, data, and BSS?
Vivek
(101 rep)
Dec 22, 2020, 11:44 AM
• Last activity: Mar 25, 2025, 08:43 AM
0
votes
0
answers
178
views
Resident memory reported significantly lesser than Proportional Resident memory - Process Exporter and Grafana
I have a process monitoring stack setup with process exporter and Grafana with the "process profiling with treemap" dashboard, and I see some suspicious behaviour regarding the memory it is reporting.
Based on my understanding (Having read [this](https://unix.stackexchange.com/questions/33381/getting-information-about-a-process-memory-usage-from-proc-pid-smaps) article and the links mentioned within that article):
RSS = Private memory + shared memory
PSS = Private memory + (shared memory / num of processes sharing said memory)
This leads me to believe that RSS >= PSS at any given time.
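A direct way to cross-check the two numbers for one process straight from /proc, bypassing the exporter (the PID below is a placeholder):
pid=1234                                                # put the real PID here
ps -o rss= -p "$pid"                                    # RSS in KiB
awk '/^Pss:/ {s += $2} END {print s}' /proc/$pid/smaps  # PSS in KiB, summed over all mappings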
Here is what I observe:
1. Process takes 39.2GB of virtual memory
[process alloted 39.2GB virtual memory](https://i.sstatic.net/zOFaMpz5.png)
2. Process takes up 38.1GB of "proportional resident memory", this makes sense
[Process takes up 38.1GB proportional resident memory](https://i.sstatic.net/BC3QAWzu.png)
3. This is where it gets suspicious: the process takes up only 18.8GB of resident memory.
[Process takes only 19.2GB resident memory](https://i.sstatic.net/ZlsbQymS.png)
Is my understanding of how RSS and PSS work correct? If yes, what could be the reasons this process is behaving like this (or being reported as such)? I suspected process exporter or Grafana might be incorrect, but no other process reports anything suspicious like this, so I'm assuming they're working as expected.
I looked at the process exporter github to confirm if my understanding of the fields reported by it is correct.
resident: Field rss(24) from /proc/[pid]/stat, whose doc says:
This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.
proportionalResident: Sum of "Pss" fields from /proc/[pid]/smaps, whose doc says:
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it.
No pages have been swapped out.
Here are the queries used by Grafana to graph these:
proportional resident memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="proportionalResident"} > 0
virtual memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="virtual"}
resident memory:
namedprocess_namegroup_memory_bytes{instance=~"$instance", memtype="resident"} / ignoring(memtype) namedprocess_namegroup_num_procs > 0
All other processes behave expectedly with RSS >= PSS. Why could this process be reporting this behaviour?
TIA!
Phantom
(1 rep)
Oct 15, 2024, 09:34 AM
0
votes
0
answers
33
views
Tools for examining usermode memory
What tools are available for examining detailed memory allocations in user mode processes? For example:
- What are the flags at the virtual page level?
- What are the physical addresses (and their flags) mapped to virtual pages?
- List the pages that are swapped out (when the system is under memory pressure)?
- Which pages are in a copy-on-write state vs. unallocated vs. allocated to a process?
- How do you test changes to the Linux Kernel memory manager?
In my experience, memory management (on live systems) is messy and nondeterministic. I’m curious if anyone knows about tools for testing/examining memory at a page level?
I feel like I’m missing something and there must be tools that help people with these sorts of questions.
I know the Kernel gives us a few windows into what it's doing:
- /proc/self/maps
- /proc/self/pagemap
- /proc/iomem
- /proc/kpagecount
- /proc/kpageflags
... but I haven't found a tool that makes it easy to visualize this information.
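For reference, the closest I've gotten from the command line is per-mapping summaries rather than true page-level views:
# per-mapping breakdown (RSS, swap, VmFlags, ...) pulled from /proc/<pid>/smaps
pmap -XX $$
# for real page-level flags there is the page-types helper in the kernel source
# (tools/mm/page-types, formerly tools/vm); it reads /proc/kpageflags and needs root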
Any feedback you can offer would be much appreciated and I sincerely appreciate your time.
Very Respectfully,
Mark Nelson
Mark Nelson
(1 rep)
Sep 6, 2024, 05:13 AM
1
votes
1
answers
269
views
Understanding memory limits in a systemd service: Are they per-process or combined?
I have a systemd service named vcoagent.service running on my Linux system, and I'm trying to understand how memory limits specified for the service (Memory: 300.3M (limit: 500.0M)) apply to the processes it manages.
Here is the relevant output from systemctl status vcoagent.service:
● vcoagent.service - Observability Agent
Loaded: loaded (/etc/systemd/system/vcoagent.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2024-04-10 11:05:17 +07; 3h 38min ago
Main PID: 134018 (vcoagent)
Tasks: 59 (limit: 18691)
Memory: 300.3M (limit: 500.0M)
CGroup: /system.slice/vcoagent.service
├─134018 /root/vcoagent --opampServer=false --nodeExporter=true
├─134031 /root/vcoagent --opampServer=false --nodeExporter=true
├─134038 /proc/134031/fd/3
└─134050 /proc/134031/fd/8 --config /tmp/otelcol/otelcol-config.yaml
My question is: Does the memory limit (500.0M) apply individually to each process (134018, 134031, etc.) managed by the service, or is it a combined limit for all processes together?
For instance, if one process exceeds its individual memory limit but the total memory usage across all processes remains below 500.0M, will the service be considered within its memory limits?
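For what it's worth, this is how I've been checking where the limit actually lives (paths assume the unified cgroup v2 hierarchy; cgroup v1 uses memory.limit_in_bytes instead):
# the limit and current usage as systemd sees them
systemctl show vcoagent.service -p MemoryMax -p MemoryCurrent
# the same values at the cgroup level, i.e. for the service's cgroup as a whole
cat /sys/fs/cgroup/system.slice/vcoagent.service/memory.max
cat /sys/fs/cgroup/system.slice/vcoagent.service/memory.current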
I would appreciate any clarification on how systemd memory limits are enforced within a service context. Thank you!
Ackerman Shadow
(11 rep)
Apr 11, 2024, 12:50 AM
• Last activity: Apr 11, 2024, 04:33 AM
5
votes
1
answers
5789
views
Is there a way to set a hard cap/limit on how much RAM Chrome can use?
I'm using Linux on my Steam Deck (SteamOS/Arch Linux).
Is there a method to set a hard cap/limit on the maximum total RAM Chrome can use with command line arguments? (to 8 GB out of the device's max 16 GB)
Chrome works just fine with tons of tabs open on my Windows laptop with 8 GB of RAM, and I can always kill individual processes with Chrome's built-in task manager.
But on the Steam Deck's Arch Linux, if I forget to use Chrome's built-in task manager to kill tabs/processes, having too many tabs of certain poorly optimized websites can cause the entire machine to hang. Then I have no choice but to hard reset by holding the power button.
I already use uBlock Origin to clean up websites, but that doesn't seem to be enough.
--TL;DR--
I want to use command-line arguments to set a hard cap on the maximum total RAM that Chrome can use to 8 GB out of the device's 16 GB of RAM.
If I use the KDE Menu Editor, I can directly add command-line arguments to Chrome.
The command-line to execute is:
run --branch=stable --arch=x86_64 --command=/app/bin/chrome --file-forwarding com.google.Chrome @@u %U @@
Is there an argument I can add to do this? Looking for something like: "--max_ram_usage 8192MB"
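As far as I can tell Chrome itself has no flag for a total-RAM cap, so the closest I can think of is wrapping the launch in a memory-limited cgroup, e.g. via systemd-run (a sketch that assumes the menu entry ultimately invokes flatpak; note the whole scope gets OOM-killed if it hits the cap):
# in the KDE menu entry, prefix the existing command with a memory-capped scope
systemd-run --user --scope -p MemoryMax=8G \
    flatpak run --branch=stable --arch=x86_64 --command=/app/bin/chrome \
    --file-forwarding com.google.Chrome @@u %U @@
Using -p MemoryHigh=8G instead would throttle and reclaim rather than kill, which may be the gentler option.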
JLHack7
(51 rep)
Jun 25, 2023, 09:11 PM
• Last activity: Jun 25, 2023, 09:51 PM
0
votes
1
answers
108
views
Memory total shown on Linux on 8GB memory PC is only 7038920 kB
Why does Linux /proc/meminfo show **MemTotal: 7038920 kB** (the kB in /proc/meminfo most likely means KiB) on a PC with 8 GB of RAM, although 8 GB in KiB is 7812500?
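A couple of quick checks that might narrow it down (plain arithmetic plus what the firmware actually handed to the kernel at boot):
echo $(( 8 * 1024 * 1024 ))     # 8 GiB expressed in KiB: 8388608
echo $(( 8000000000 / 1024 ))   # 8 GB (decimal) in KiB: 7812500
# memory the kernel was given, after firmware/reserved regions are subtracted
dmesg | grep -i 'Memory:'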
user575072
Jun 8, 2023, 06:50 PM
• Last activity: Jun 8, 2023, 11:38 PM
0
votes
1
answers
102
views
Do page tables used to store kernel stack pointer when context switch happen in kernel mode of the process?
I have two questions:
1. Suppose a user space application/process is running in kernel mode. I understand that if a context switch happens now, the kernel stack pointer of that process is stored in the task_struct. Is that correct? For that, is a PTE (page table entry) created in the page table to map the kernel stack pointer (which is a virtual address) to the physical address?
2. In the case of a kthread, does it have a page table to support context switches?
Franc
(309 rep)
Apr 30, 2023, 05:29 AM
• Last activity: Apr 30, 2023, 10:45 AM
4
votes
1
answers
2610
views
How does mmio get routed to io devices?
I am trying to understand how IO devices are mapped into the 'regular' memory address space on modern x86 machines running Linux.
Some details which I am trying to make sense of are:
1. cat /proc/iomem prints out a list of io memory mapped regions (printing the **physical** addresses) which are non-contiguous
2. These regions can be requested by kernel modules dynamically at runtime, and allocated via the function request_mem_region, which is defined in <linux/ioport.h>
3. x86 machines use mov for both memory access and IO (that is mapped into memory)
So now, supposing kernel module code is running, we will likely encounter an instruction like mov [value] [virtual address], where the virtual address could be referring to either an mmio region or 'normal' data values that exist in memory. The process to separate mmio traffic from 'normal' memory ought to have 2 key steps:
1. determine **if** the address is mmio. The page table has a flag for whether it is memory-mapped, so I assume the mmu determines this while doing page table translation.
2. Determine the 'IO address' of the newly produced physical address (that the mmu gave as output from the page table translation) and pass this to whatever chip interfaces with the IO (Northbridge, root complex, etc)
**Question 1**: is my understanding of step-1 above correct?
**Question 2**: How is step-2 carried out? (by what **entity** and how is the map stored?)
The ranges that need to be checked are listed in /proc/iomem, and the data which it draws from I guess is a map that looks like:
map[mmio_address] = io_address_object
Keeping in mind that all of this is happening **within** the context of a single mov instruction from the perspective of the cpu, I can't see how this translation could happen via anything other than hardware external to the cpu.
shafe
(200 rep)
Apr 2, 2023, 03:15 AM
• Last activity: Apr 2, 2023, 07:11 AM
1
votes
1
answers
368
views
On some UNIX implementations, it is not possible to call free() on a block of memory allocated via memalign()
I use Linux only but I want to understand what this means:
From *the Linux Programming Interface*:
> Blocks of memory allocated using memalign() or posix_memalign() should be deallocated with free().
>
> On some UNIX implementations, it is not possible to call free() on a block of memory allocated via memalign(), because the memalign() implementation uses malloc() to allocate a block of memory, and then returns a pointer to an address with a suitable alignment in that block. The glibc implementation of memalign() doesn't suffer this limitation.
From man memalign:
> POSIX requires that memory obtained from posix_memalign() can be freed using free(3). Some systems provide no way to reclaim memory allocated with memalign() or valloc() (because one can pass to free(3) only a pointer obtained from malloc(3), while, for example, memalign() would call malloc(3) and then align the obtained value). The glibc implementation allows memory obtained from any of these functions to be reclaimed with free(3).
I don't know much about memory alignment and don't understand "*for example, memalign() would call malloc(3) and then align the obtained value*".
Could someone tell me what's going on here and what could be wrong with free()?
Rick
(1247 rep)
Aug 27, 2022, 03:18 AM
• Last activity: Aug 27, 2022, 05:33 AM
2
votes
1
answers
1526
views
Can a Linux Swap Partition Be Too Big?
Can a Linux swap partition be too big?
I'm pretty certain the answer is, "no" but I haven't found any resources on-point, so thought I'd ask.
In contrast, the main Windows swap file, pagefile.sys, can be too large. A commonly cited cap is 3x installed RAM, else the system may have trouble functioning.
The distinction seems to lie in the fact that Linux virtual memory is highly configurable with kernel parameters, not to mention compile options, whereas Windows virtual memory is barely so. Windows virtual memory management consequently seems to rely on algorithms that are immutable, or on the swap file size and how it is configured.
Linux has its own virtual memory management algorithms, of course, but the question is whether and how they are affected by the size of the specified swap partition or file.
This comes up because I have a system with 16GB physical RAM configured with a series of 64GB partitions to facilitate a multi-boot capability. For convenience / laziness, I've simply designated one of these 64GB partitions as swap, *i.e.*, 4x physical RAM in contrast to Windows' 3x cap (the latter being relevant only as a frame of reference because this is a Linux-only system). I'm debugging some issues around memory management and VMware Workstation and have come to wonder what, if any, effect the swap partition's size has on compaction, swappiness, page faults, and performance generally.
Many thanks for any constructive input.
ebsf
(399 rep)
Aug 23, 2022, 07:31 PM
• Last activity: Aug 23, 2022, 08:31 PM
4
votes
1
answers
2453
views
How does the kernel address swapped memory pages on swap partition or swap file?
A swap partition doesn't contain a structured filesystem. The kernel doesn't need that because it stores memory pages on the partition marked as a swap area. Since there could be several memory pages in the swap area, how does the kernel locate each page when a process requests its page to be loaded into memory? Let's explain more: Looking at the header of the swap partition from Devuan OS:
#define SWAP_UUID_LENGTH 16
#define SWAP_LABEL_LENGTH 16
struct swap_header_v1_2 {
char bootbits; /* Space for disklabel etc. */
unsigned int version;
unsigned int last_page;
unsigned int nr_badpages;
unsigned char uuid[SWAP_UUID_LENGTH];
char volume_name[SWAP_LABEL_LENGTH];
unsigned int padding;
unsigned int badpages;
};
So when the mkswap command is executed for a partition, that's what gets placed on that partition: the swap header.
Now, let's have a scenario where "process A" has its memory page swapped, so there's one memory page in the swap area. Of course, there could be many memory pages in the swap area. "Process A" needs to access that memory page that was swapped. "Process A" tells the kernel, may I have my swapped memory page, please? The kernel says: sure, my dear friend. The kernel looks for "process A"'s memory page in the swap partition. Since the swap partition isn't a sophisticated structure (not a filesystem), how would the kernel know how to locate that specific memory page of "process A" in the swap partition?
Perhaps the kernel somewhere stores sector addresses for those swapped pages, so when a process asks for its memory page, the kernel knows where to look in the swap partition, reads the memory page from the partition and loads it into memory.
direprobs
(1064 rep)
Aug 20, 2017, 12:31 PM
• Last activity: Aug 17, 2022, 05:10 PM
21
votes
2
answers
176738
views
How to clear memory cache in Linux
(screenshot of top output: https://i.sstatic.net/auNr7.png)
Is there any command I can use to clear the memory cache in RHEL? I used this command:
sync; echo 3 > /proc/sys/vm/drop_caches
but it didn't work.
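In case it is only a privilege problem (writing to /proc/sys/vm/drop_caches needs root, and with sudo the redirection itself still runs as the unprivileged user), these equivalent forms are worth trying:
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
# or, equivalently:
sudo sysctl vm.drop_caches=3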

OmiPenguin
(4398 rep)
Dec 15, 2012, 02:17 PM
• Last activity: Jul 31, 2022, 07:04 AM
0
votes
0
answers
260
views
How can i access the page table of a process from kernel using a custom syscall?
I am using Ubuntu 16.04, kernel: 4.17.4
I want to access the page table of a process. The idea is that, inside a C program, I will call a custom syscall, and the syscall will be able to access the page table of the process. How do I design the system call? I'd appreciate any examples or related reading material.
My job is to modify some page table entries of the page table (changing some virtual-to-physical address mappings) for a project, which I want to do using the syscall.
Misbah
(1 rep)
Jun 17, 2022, 08:00 PM
Showing page 1 of 20 total questions