Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
4
votes
0
answers
2462
views
Recovering an overwritten file on ZFS
Is there any way to recover a deleted or overwritten file on ZFS?
I accidentally overwrote a JPG file with a scanned image. Unfortunately, I didn’t take a snapshot beforehand. However, since ZFS uses the Copy-on-Write (CoW) mechanism, I believe the overwritten data might still exist in some form.
Does anyone know if there is a way to restore the overwritten file on ZFS?
I tried using photorec. As a result, I recovered some JPG files, but my target file was not among them. Strangely, photorec couldn't properly recover even most of the JPG files that were not deleted. And then I remembered that, unfortunately, my pool had lz4 compression enabled.
kou
(41 rep)
Sep 25, 2020, 08:05 AM
• Last activity: Mar 26, 2025, 08:01 PM
0
votes
1
answers
715
views
can two running processes share the complete process image in physical memory, not just part of it?
can two running processes share the complete process image in physical memory, not just part of it?
Here I am talking about Linux operating systems (e.g. Ubuntu).
**My thinking:**
I think it is **False** in general because the only time it is possible is with copy-on-write during fork() and before any writes have been made.
**Question:** Can someone explain whether I am correct or not?
If I am wrong, please give me some examples.
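As a minimal sketch of the copy-on-write window described above (my own illustration, assuming Linux/glibc): right after fork() the parent and child reference the same physical pages, and the first write by either side copies only the page that was touched.
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* 64 MiB of writable memory; after fork() these pages are shared
     * copy-on-write between parent and child. */
    size_t len = 64 * 1024 * 1024;
    char *buf = malloc(len);
    memset(buf, 'A', len);          /* fault the pages in so they really exist */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: until this write, buf still maps the parent's physical
         * pages.  Writing one byte copies only the single page touched. */
        buf[0] = 'B';
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent still sees: %c\n", buf[0]);   /* prints 'A' */
    free(buf);
    return 0;
}
```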
Deepesh Meena
(101 rep)
Sep 8, 2018, 08:22 PM
• Last activity: Jan 22, 2025, 08:00 AM
10
votes
2
answers
6745
views
How does ZFS copy on write work for large files
Let's say I have a large file (8 GB) called example.log on ZFS.
I do cp example.log example.bak to make a copy. Then I add or modify a few bytes in the original file. What will happen?
Will ZFS copy the entire 8GB file or only the blocks that changed (and all the chain of inodes pointing from the file descriptor to that block)?
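As a sketch of the "modify a few bytes" step (my own illustration; the offset is arbitrary): on a CoW filesystem such as ZFS, a small overwrite like this causes only the affected record(s) and the indirect blocks pointing at them to be written to new locations, not the whole 8 GB file.
```
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Overwrite 4 bytes, 1 GiB into the file.  A CoW filesystem writes a
     * new copy of just the record containing this range (plus the indirect
     * blocks above it) and leaves the rest of the file's blocks untouched. */
    int fd = open("example.log", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (pwrite(fd, "test", 4, (off_t)1 << 30) < 0)
        perror("pwrite");
    close(fd);
    return 0;
}
```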
HubertNNN
(203 rep)
Jan 17, 2020, 01:21 PM
• Last activity: Aug 14, 2024, 05:21 PM
22
votes
1
answers
16815
views
Why does "cp -R --reflink=always" perform a standard copy on a Btrfs filesystem?
Btrfs supports Copy-on-Write. I tried to use that feature to clone a directory:
cp -R --reflink=always foo_directory foo_directory.mirror
I expected the command to finish almost instantly (like a btrfs subvolume snapshot), but the cp command seems to perform a slow, standard copy.
According to the man page, I would have expected --reflink=always to enforce Copy-on-Write:
> When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.
**Questions:**
- Do you know why --reflink=always doesn't work?
- What options (or other commands) should I use instead?
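For comparison, the lightweight copy that --reflink requests is exposed to programs as the FICLONE ioctl (Linux 4.5 and later); a minimal sketch of my own, with hypothetical file names, that clones a single file the same way cp --reflink=always would:
```
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>     /* FICLONE */
#include <unistd.h>

int main(void) {
    /* Clone src into dst: the new file shares src's data blocks until
     * either copy is modified (CoW), just like cp --reflink=always. */
    int src = open("foo_directory/file", O_RDONLY);
    int dst = open("foo_directory.mirror/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    if (ioctl(dst, FICLONE, src) < 0) {
        perror("FICLONE");            /* e.g. EXDEV, EOPNOTSUPP, EINVAL */
        return 1;
    }
    close(src);
    close(dst);
    return 0;
}
```
If this ioctl fails on a given source/destination pair (for example across filesystem boundaries), cp --reflink=always will fail or fall back for the same underlying reason.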
Philipp Claßen
(4967 rep)
Jul 30, 2015, 03:30 PM
• Last activity: Jun 23, 2023, 08:29 AM
2
votes
0
answers
593
views
Btrfs : compress and nodatacow priority + automation
I have a btrfs partition mounted on / with compression enabled:
mount -o subvol=@,defaults,noatime,nodiratime,compress=zstd,space_cache=v2 /dev/mapper/archlinux /mnt
I want to disable the CoW mechanism on some folders such as:
- The folder which contains my VM disks
- Any folder that might contain SQLite databases (mostly for browsers)
Here's what the [btrfs documentation](https://btrfs.readthedocs.io/en/latest/Administration.html#mount-options) states:
> If compression is enabled, nodatacow and nodatasum are disabled.
But it also states
> If nodatacow or nodatasum are enabled, compression is disabled.
I'm OK with the fact that they are mutually exclusive; however, I do wonder which one will take priority if I set chattr +C on a folder. Will my change remain permanent? Or will it be overridden during the next boot when my partition is remounted with the compress option? My gut tells me that chattr +C will take precedence, but I've been unable to find any documentation that would confirm that.
My second question is: is there a way to automatically disable CoW on all newly created SQLite databases? At first I wanted to write a systemd timer which would scan the whole file system to look for SQLite databases and set chattr +C on them. But then in the [chattr man page](https://man7.org/linux/man-pages/man1/chattr.1.html) I read the following:
> If it is set on a file which already has data blocks, it is undefined when the blocks assigned to the file will be fully stable.
So basically I should not use chattr +C on existing non-empty files. Is there a way to hook into the filesystem so that any file that ends with \.(sqlite|db) or has the SQLite magic bytes won't use CoW? Or do I have no other choice but to manually create the NoCoW folder before I install the targeted application? I don't necessarily know whether X or Y app is using an SQLite backend, and it would be very inconvenient to remove all the app data to set NoCoW every time I realise an app is using SQLite. What would you recommend to somehow automate this process?
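For reference, chattr +C corresponds to the FS_NOCOW_FL inode flag, so a helper that runs right after a NoCoW directory is created (while it is still empty) could set the flag directly via the FS_IOC_SETFLAGS ioctl. A rough sketch under that assumption; the path is hypothetical:
```
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <unistd.h>

int main(void) {
    /* Equivalent of `chattr +C` on an (ideally still empty) directory. */
    int fd = open("/path/to/my/db-dir", O_RDONLY | O_DIRECTORY);
    if (fd < 0) { perror("open"); return 1; }

    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) { perror("GETFLAGS"); return 1; }
    flags |= FS_NOCOW_FL;
    if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) { perror("SETFLAGS"); return 1; }

    close(fd);
    return 0;
}
```
New files created inside such a directory inherit the NoCOW attribute, which sidesteps the undefined behaviour of flipping +C on non-empty files.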
EDIT:
Do you think it should be the distribution maintainers' responsibility to set chattr +C during installation of packages that use SQLite?
ShellCode
(235 rep)
Mar 2, 2023, 12:43 PM
• Last activity: Mar 4, 2023, 12:50 PM
4
votes
2
answers
1266
views
Will writing identical data to blocks of a file under ZFS use space in snapshots?
I have a 16M file.
I take a snapshot of the ZFS filesystem which contains it.
If I overwrite the file with the same data, will ZFS need to store two copies of all of the blocks of the file?
fadedbee
(1113 rep)
Feb 28, 2018, 01:53 PM
• Last activity: Jan 17, 2023, 10:05 PM
5
votes
4
answers
4765
views
How to check disk usage for folders containing reflinked files on XFS?
XFS supports copy on write (CoW), so it is not entirely clear what du will say if some of the bytes are shared across files. I'd like to find a way to check how much disk space a folder uses, not counting shared bytes multiple times, i.e. the real usage on disk.
Neither xfs_estimate nor du seem to do what I need:
$ mkdir testfolder
$ cd testfolder
$ dd if=/dev/zero of=testfile bs=1M count=500 status=progress
500+0 records in
500+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0,158889 s, 3,3 GB/s
$ cp --reflink=always testfile testfile2
$ xfs_estimate .
. will take about 1004,4 megabytes
$ du -hs .
1000M .
What I expect is that some tool says that this folder uses only 500MB.
df shows that free disk space is reduced by 500 MB when using a plain cp, but not at all when doing a cp --reflink=always. So reflinking seems to work, but df is not helpful in practice, because the disk is huge and I want to check the real size of a quite small folder.
I assume this might be a valid question for BTRFS too. But in my case, I need a solution which works for XFS.
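One low-level building block for such a tool is the FIEMAP ioctl, which marks extents that are shared between files with FIEMAP_EXTENT_SHARED. Below is a rough sketch of my own (it only looks at the first 256 extents of one file, so it is an illustration, not a finished du replacement):
```
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Room for up to 256 extents; a real tool loops until FIEMAP_EXTENT_LAST. */
    char buf[sizeof(struct fiemap) + 256 * sizeof(struct fiemap_extent)];
    struct fiemap *fm = (struct fiemap *)buf;
    memset(buf, 0, sizeof(buf));
    fm->fm_length = ~0ULL;
    fm->fm_extent_count = 256;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

    unsigned long long shared = 0, unique = 0;
    for (unsigned i = 0; i < fm->fm_mapped_extents; i++) {
        struct fiemap_extent *e = &fm->fm_extents[i];
        if (e->fe_flags & FIEMAP_EXTENT_SHARED)
            shared += e->fe_length;
        else
            unique += e->fe_length;
    }
    printf("%s: %llu bytes in shared extents, %llu unique\n",
           argv[1], shared, unique);
    close(fd);
    return 0;
}
```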
lumbric
(449 rep)
Jan 4, 2021, 02:31 PM
• Last activity: Jun 26, 2022, 07:29 PM
3
votes
1
answers
1196
views
Copy on write for directories?
Some file-systems, notably XFS and Btrfs, support Copy on Write at block level for files.
This is done by reflinking, where the underlying blocks are shared between files until they are modified.
Since a directory is essentially [an associative array mapping file names to inodes](https://unix.stackexchange.com/questions/18605/how-are-directories-implemented-in-unix-filesystems), it should be straightforward to do something similar for directories.
Have any filesystems been developed which can support this on Linux (or any other Unix-like system)?
Presumably it would need kernel support just like reflinking does, i.e. a call like copy_file_range() that works with directories.
Is anyone actively working on this?
Is it simply that no one has wanted to do it yet, or is there any reason why this is a bad idea or unnecessary?
Are there any particular technical obstacles that need to be overcome?
See also https://serverfault.com/questions/129969/is-there-a-way-to-create-a-copy-on-write-copy-of-a-directory
which does not really answer this question.
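For context, a sketch of what the existing per-file interface looks like (my own illustration; file names are hypothetical): copy_file_range() and the FICLONE ioctl each take a pair of file descriptors, and there is currently no analogous call that takes two directory descriptors.
```
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    /* Per-file copy offload: on XFS or Btrfs the filesystem can satisfy
     * this as a reflink, so no data is physically duplicated. */
    int in = open("src.file", O_RDONLY);
    int out = open("dst.file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(in, &st);
    /* A real program would loop until st.st_size bytes are copied. */
    ssize_t n = copy_file_range(in, NULL, out, NULL, st.st_size, 0);
    if (n < 0) perror("copy_file_range");

    close(in);
    close(out);
    return 0;
}
```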
Bruce Adams
(682 rep)
May 14, 2022, 08:15 PM
• Last activity: May 16, 2022, 09:06 AM
1
votes
0
answers
48
views
Is the stack of a forked process shared with its parent?
Is the stack of a forked process shared with its parent?
If so, does this happen via shared copy-on-write pages?
HappyFace
(1694 rep)
Apr 24, 2022, 07:23 PM
29
votes
4
answers
20345
views
Does any file system implement Copy on Write mechanism for CP?
We have seen operating systems doing the Copy-on-Write optimisation when forking a process. The reason is that most of the time fork is followed by exec, so we don't want to incur the cost of page allocations and copying the data from the caller's address space unnecessarily.
So does this also happen when doing cp on Linux with ext4 or XFS (journaling) file systems? If it does not happen, then why not?
Mridul Verma
(393 rep)
Sep 20, 2017, 01:56 AM
• Last activity: Feb 26, 2021, 10:31 AM
6
votes
1
answers
4084
views
In what ways is the COW filesystem an improvement over the Journaling Filesystem?
I don't think an informative answer exists on U&L or elsewhere that explains why COW filesystems are a leg-up over any of the three modes of journaling. How does the former provide both superior safety and performance while the latter provides one at the cost of the other?
computronium
(878 rep)
Feb 12, 2021, 07:36 AM
• Last activity: Feb 12, 2021, 08:12 AM
-2
votes
1
answers
86
views
snapshot management tool implementation
I want to implement a command-line tool that allows me to take snapshots of the filesystem for linux/unix systems.
I know there are various ways to implement snapshots:
- Copy-On-Write
- Redirect-On-Write
- Log-file architecture
- Split mirror
I think these are the mechanisms by which various file systems allow snapshots. What I don't know is how to implement a tool that interacts with the file systems of a device to create and manage snapshots.
I found this tool http://snapper.io/documentation.html but couldn't understand how it works. It would also be helpful if someone could share references to relevant literature.
MVJ
(1 rep)
Oct 24, 2020, 05:43 AM
• Last activity: Oct 24, 2020, 11:23 AM
1
votes
1
answers
1100
views
How to disable BTRFS copy-on-write updates for a subvolume using btrfs-property instead of chattr
I want to disable BTRFS copy-on-write updates for a subvolume using the newer method btrfs property instead of the old method chattr.
I found the man page here:
Manpage/btrfs-property - btrfs Wiki
This quote leads me to believe it has the functionality I desire:
> btrfs property provides an unified and user-friendly method to tune different btrfs properties instead of using the traditional method like chattr(1) or lsattr(1)
However, I need an example that will replicate this command:
chattr +C /path/to/my/subvolume/.cache
where .cache is a BTRFS subvolume.
MountainX
(18888 rep)
Oct 1, 2020, 01:41 AM
• Last activity: Oct 10, 2020, 01:51 PM
1
votes
1
answers
476
views
A good way how to backup user data in CentOS?
I need to back up my data and I have not found a good way so far.
Just say I have 1 TB on a non-system disk with 50-100 GB of user data (binary files, source code, images, etc.), and another big disk where I could save backups. I could use rsync or just cp, but I think that is not what I want.
I want an incremental backup: restore a file/folder/whole drive from some point in time, load a backup from some point in time onto another disk (copy it or just open it read-only), and see changes between backups; being able to add an optional comment would be nice. Does anybody know a good CLI backup tool? Maybe some snapshot tools? Or git? But git for 50 GB of user data; isn't that nonsense? :D
Radek Uhlíř
(39 rep)
Jan 16, 2020, 01:03 PM
• Last activity: Jan 16, 2020, 01:39 PM
4
votes
2
answers
4674
views
BTRFS inside a KVM-VM on a qcow2 formatted image
I have an Ubuntu 14.04 application appliance that makes heavy use of BTRFS snapshots. The application is meant to be hypervisor-agnostic, and the snapshots need to be stored with the virtual machine image in case we need to troubleshoot an issue. Using the hypervisor's built-in snapshotting methods instead of BTRFS snapshotting would be more work than it's worth, since it would require API access to the hypervisor from the VM, which we don't want for security reasons. I can also remotely access the BTRFS snapshot filesystem subvolumes directly from the appliance's command line via ssh without having to power down the machine or access a hypervisor API.
In the past I've only had to deal with deploying this application to VMware-based hypervisors. I always used thin provisioning on my VMware disks and never noticed any performance issues. I use thin provisioning because I do a large amount of testing on this appliance and I tend to deploy many appliances at a time to run different tests in parallel.
I'm very careful about not over committing the storage by using scripts that ensure the disk growth doesn't run out of control. The I/O dip that happens when the thin provisioned disk needs to grow isn't all that noticeable either.
Now I need to support KVM and would very much like to keep Thin/sparse provisioning my disks, however I've read a few things that state mixing a CoW filesystem with another CoW filesystem is a bad idea due to over-redundant writes and disk fragmentation among other things. The common example given was running a VM with a qcow2 formatted disk stored on a BTRFS formatted volume. My situation would be the opposite. I want to have a BTRFS formatted filesystem in a VM running from a qcow2 image. I haven't found much on the particulars of the performance impact of BTRFS snapshots on top of a qcow2 image.
Question: Are there any other sparse file formats that grow with the disk size that KVM likes to use?
I've explored using sparse raw files, but I can't seem to get them to stay sparse through a cp, download, untar/gunzip, etc. It seems that if you want to use a sparse raw file, you can't move it around at all, which would make distribution a pain in the ass.
AlexV
(41 rep)
Feb 17, 2017, 06:09 PM
• Last activity: Mar 16, 2019, 11:02 PM
0
votes
1
answers
114
views
Report directories with contents that exist elsewhere even if scattered
I want to generate a report of directories that I know I can safely delete (even if requiring a quick manual verification), because I know that the full contents all the way down, exist elsewhere--**even if, and especially if, the duplicate files are scattered randomly elsewhere over the volume, possibly in wildly different directory layouts, among files that don’t exist in the directory in question.**
In other words, the directory structure and contents won’t be identical. But 100% of the contained files, individually, will be duplicated...somewhere, anywhere, on the same FS.
Given my workflow and use-case below, it should be clear this will almost always be a one-way relationship. 100% of the file content of dir1 may exist elsewhere, with different file names and directory structures, often more than one copy per file.
For example, copies of dir1/file1 may exist in dir2 and dir3. Copies of dir1/file2 may exist in dir2 and dir4. dir2, dir3, and/or dir4 may also contain their own unique files, as well as copies of files from other directories. But dir1 can most definitely be safely deleted.
In other words, there’s no inverse correlation: dir1 has 100% redundancy scattered about; but dir2, dir3, dir4...etc. won’t necessarily. (They might, and therefore might also be deletion candidates themselves, but the main candidate in question for now is dir1.)
**The rest of this question isn’t strictly necessary to read, in order to understand and answer the question. It just answers some potential tangential “why?” and “have you tried…?” questions.**
Here's the use-case generating the need, which actually seems to be fairly common (or at least not uncommon). ...At least with variations on the end result:
1. On location:
1. I take GBs of photos and videos.
2. Each day, I move the files from memory cards, into folders organized by camera name and date, onto a redundant array of portable USB HDDs.
3. When I have time, I organize *copies* of those files into a folder structure like “(photos|videos)/year/date”, with filenames pre-pended with “yyyymmdd-hhmmss”. (In other words the original structure gets completely scrambled. And not always in such a predictable way.) Those organized copies go on an SSD drive for faster workflow, but I leave the original, unmanaged copies on the slow redundant storage, for backup, with the copies being physically separated except during the copying step.
2. Back at home:
1. I move all the unmanaged files from the USB HDD array, to a "permanent" (larger, more robust, and continuously cloud-backed-up) array, as an original source of truth in case my workflow goes sideways.
2. Do post-processing on the organized copies on the SSD. (Leaving the original raw files untouched other than being renamed--and saving changes to new files.)
3. Once I'm finished and do whatever was intended with the results, I move the entire SSD file structure onto the same larger "permanent" array as the originals. (But remember the directory structure is completely different than the originals SD card-dump structure.)
Ideally in that workflow, I'd also delete the original card-dump folders that is now unnecessary. The problem is, as in life, my workflow is constantly interrupted. Either I don't have time to finish organizing on-location, or it gets put on hold for a while once home, or I don't organize exactly the same way every time, or I just get confused about what exists where and am afraid to delete anything. Often times before heading out, I’ll do a copy of portable media onto permanent array just in case, even if I suspect it may already exist 2 or 3 times already. (I’m not OCD. Just scarred by experience.) Sometimes (less so in later years) I reorganize my entire logical directory structure. Other times I update it midstream going forward, leaving previous ones alone. Over the many years I’ve also moved and completely lost track of where (and how) the “card-dump” files go. Sometimes my on-location workflow, as well-defined and tested as it is, results in uncertain states of various folders, so I make even more backup copies “just in case”. I also used to write programs that would create thousands of folder symlinks in order to view my massive folder structure differently. (Like a filesystem “pivot table”.) But then later rsync'ed the whole filesystem to replacement arrays, while forgetting to set the “preserve hardlinks and symlinks” flags, and wound up with copies that previously were clearly just links, and then over time lost track of which were actually the originals. (Try doing this with 20 years of photos/videos, and 30 years of other data with better results!)
In other words, I have millions of large files all over the place, most of it unnecessarily redundant, in a big beautiful mess. And I need to fix it. Not just to save the space (long since taken care of), but to reduce the confusion of what is safely (and more importantly canonically) where. The first step in this, for me, is to delete the thousands of folders with content I know with high confidence (not necessarily certainty) are 100% distributed elsewhere. Even if each deletion candidate requires quick manual verification.
It’s generating the initial list that is humanly impossible in one lifetime. Ideally, the list would be “all files in this directory exist elsewhere but in a different directory layout, with those directories also containing non-matching files”. But at minimum, “**all files in this directory also exist elsewhere**”.
I've researched and tested about a dozen solutions for deduplication, some of which come tantalizingly close to also solving this problem, but not close enough. My "permanent" array has had inline ZFS deduplication enabled full-time for years. Even though it cuts write throughput to about 25%, I can afford to wait--but I can't afford the many thousands of dollars in extra drive space that would be needed for a couple of decades of twice and sometimes thrice-duplicated photo and video data (not to mention being stored on a stripe of 3-way mirrors).
I've just provisioned a local automatic backup array (to complement cloud backup). I went with Btrfs RAID1 to avoid the potential problem of the same storage software having the same bugs at the same time. (Which has happened to me before with ZFS, fortunately only resulting in temporary inability to mount.) Also, this solution has the beautiful feature of being able to easily scale the array up or out, a single disk at a time. :-) That's good, because scaling is a very expensive and time-consuming proposition on my big primary ZFS array.
Anyway, the only reason that’s relevant to the question is that Btrfs has a plethora of good utilities for offline deduplication, some of which, as I said, come tantalizingly close to solving this problem, but not enough of the way. A quick summary of what I've tried:
- **rdfind**: Fast matching algorithm, great for deduplication via hardlinks. The problem is, that could result in disaster for any user (all users?). While partially OK for my distinctly separate requirement of saving space among large redundant media files regardless of name or location, I found that it was disastrous for other things that can't easily be untangled. For example, it also hardlinks other identical files together that have no business being the same file. For example various metadata files that OSes and applications automatically generate, most of which are the same across hundreds or thousands of directories, but absolutely have to be able to be different. E.g. "Thumbs.db", and referencing the same file can and almost certainly will result in losing data later—possibly trivially, possibly not.) It does have an option for deduplicating Btrfs reflinks (which can later differentiate with CoW), but that feature is marked "experimental".
- **duperemove**: Dedupes with Btrfs reflinks, so that's an acceptable (great, even) approach for saving disk space while allowing files to diverge later. (Except that currently Btrfs apparently un-deduplicates files when defragmenting [depending on the kernel?], even snapshots. What a terrible quirk, but I avoid it by never defragmenting and accepting the consequences.) The problem with duperemove is that, since it blindly checksums every file in the search, it’s incredibly slow and works the disks long and hard. It basically performs a poor-man’s array scrub. Takes several days on my array. (bedup, bees, and some others are similar in that regard, even if very different in other ways. rdfind and some others are smarter. They first compare file sizes. Then first few bytes. Then last few bytes. Only when all those match, does it resort to checksum. )
- **rmlint**: This currently seems the best fit for my other requirement of just saving disk space. It has two options for Btrfs reflinking (kernel-mode atomic cloning, and the slightly less robust 'cp --reflink' method). The scanning algorithm is the fastest I've tested; hashing can be bumped up to sha256 and higher (including bit-for-bit); and it has many useful options to satisfy many of my requirements. (Except, as best as I can tell, the one in this question.)
There are many other deduping utilities, including fdupes, fslint, etc. I’ve pretty much tested (or read about) them all, even if they don’t have Btrfs support (since that’s mostly irrelevant to this question). None of them, with the possible exception of rmlint, come close to doing what I need.
Jim
(469 rep)
Aug 22, 2018, 07:17 PM
• Last activity: Feb 9, 2019, 12:43 AM
0
votes
2
answers
5436
views
Is copy-on-write not implemented based on page fault?
Operating System Concepts says:
> [With] fork() we can use a technique known as copy-on-write, which works by allowing the parent and child processes initially to share the same pages. ...
> When it is determined that a page is going to be duplicated using
> copy-on-write, it is important to note the location from which the
> free page will be allocated. **Many operating systems provide a pool
> of free pages for such requests. These free pages are typically
> allocated when the stack or heap for a process must expand or when
> there are copy-on-write pages to be managed.** Operating systems
> typically allocate these pages using a technique known as
> zero-fill-on-demand. Zero-fill-on-demand pages have been zeroed-out
> before being allocated, thus erasing the previous contents.
Is copy-on-write not implemented based on page fault? (I guess no)
Do copy-on-write and page fault share the same pool of free pages? If not, why? (I guess no)
Is malloc() implemented based on page fault? (I guess yes, but not sure why it shares the same free page pool as copy-on-write, if that pool is not used by page fault)
Thanks.
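For what it's worth, on Linux the copy in copy-on-write is made from the page-fault handler, and you can watch it happen by counting minor faults around the child's first writes after fork(); a small sketch of my own:
```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

int main(void) {
    size_t len = 32 * 1024 * 1024;
    char *buf = malloc(len);
    memset(buf, 1, len);              /* fault the pages in before forking */

    if (fork() == 0) {
        long before = minor_faults();
        for (size_t i = 0; i < len; i += 4096)
            buf[i] = 2;               /* each first write COW-faults one page */
        long after = minor_faults();
        printf("child minor faults during writes: %ld\n", after - before);
        _exit(0);
    }
    wait(NULL);
    free(buf);
    return 0;
}
```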
Tim
(106420 rep)
Oct 15, 2018, 04:37 PM
• Last activity: Oct 16, 2018, 07:29 AM
1
votes
1
answers
1806
views
fork() and COW behavior after exec()
We understand the COW behavior after a fork (as for example described here) as follows: fork creates a copy of the parent's page table for the child and marks the physical pages read-only, so if either of the two processes tries to write, it will trigger a page fault and copy the page.
What happens after the child process execs? We would assume the parent process can again write to its pages without triggering a page fault, but it has proven difficult to find exact information on how this is implemented.
Any pointers (including to code) are welcome!
js84
(113 rep)
Sep 16, 2018, 06:20 AM
• Last activity: Sep 18, 2018, 05:20 AM
1
votes
1
answers
787
views
Doesn't the existence of LVM snapshots slow down writes to a file system which doesn't support them natively?
As far as I understand snapshots in LVM (please do correct me if I'm wrong): since they are not persistent and work even with a file system which doesn't itself support snapshots, I suppose it must mean that as soon as a snapshot is active, LVM takes a copy of every block which is written to, before it is changed; this copy is saved to a RAM cache and eventually ends up in another disk space, and each read from the snapshot is diverted to this «cache» if the block exists there.
So I understand this means it should slow down every write while a snapshot exists. Does this mean that LVM snapshots should only be kept for as short a duration as possible, just for the time needed to back up the data, and be removed as soon as possible? And is this a concern only if the file system doesn't support snapshots natively?
Camion
(314 rep)
Mar 24, 2018, 08:40 AM
• Last activity: Mar 24, 2018, 07:40 PM
13
votes
2
answers
12980
views
btrfs — Is it dangerous to defragment subvolume which has readonly snapshots?
If you open the defragment section of btrfs-filesystem(8), you will see the following ominous inscription left by the developers:
> **Warning:** Defragmenting with Linux kernel versions < 3.9 or ≥ 3.14-rc2 as well as with Linux stable kernel versions ≥ 3.10.31, ≥ 3.12.12 or ≥ 3.13.4 will break up the ref-links of COW data (for example files copied with cp --reflink, snapshots or de-duplicated data). This may cause considerable increase of space usage depending on the broken up ref-links.
That sounds terrible. A selling point of btrfs is its ability to create snapshots without copying everything. I mostly create read-only snapshots.
Do the files of read-only snapshots also count as “COW-data”, or will the parent subvolume's deduplication survive without bloating disk space usage?
firegurafiku
(473 rep)
Oct 24, 2017, 09:21 PM
• Last activity: Dec 28, 2017, 09:03 PM