mirror a directory tree by hard links for file contents and symlinks for directory structure
what is the best way to mirror an entire directory, say `original/`, to a new directory, say `mirror/`, which has the structure `mirror/data/` and `mirror/tree/`, such that
- every file in the directory `original/` or in any of its subdirectories is hardlinked to a *file* in `mirror/data`
- whose filename is a unique identifier of its content, say a hash of its content, and
- which is symlinked to from a point in `mirror/tree` whose relative path corresponds to the relative path of the original file in `original`,

such that it can be easily restored?
is there perhaps an existing tool that implements this, ideally one that lets the user flexibly choose the command that creates a unique identifier for a file from its content?
---
for instance, say there is only one file `original/something`, which is a text file containing the word “data”. then i want to run a script or command on `original`, such that the result is:
    $ tree original mirror
    original
    └── something
    mirror
    ├── data
    │   └── 6667b2d1aab6a00caa5aee5af8…
    └── tree
        └── original
            └── something -> ../../data/6667b2d1aab6a00caa5aee5af8…

    5 directories, 3 files
here, the file `6667…` is a hard link to `original/something` and its filename is the sha256sum hash of that file. note that i have abbreviated the filename for legibility.
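to make the layout above concrete, here is a minimal Python sketch of it, not a vetted tool. the names `build_mirror` and `sha256_of` are hypothetical, and it assumes that `mirror/` lives on the same filesystem as `original/` (hard links cannot cross filesystems), that `mirror/` is not inside `original/`, and that symlinks and special files in the original can be skipped.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_mirror(original: Path, mirror_root: Path) -> None:
    """Hard-link regular files into data/ and symlink them from tree/."""
    data_dir = mirror_root / "data"
    tree_dir = mirror_root / "tree" / original.name
    data_dir.mkdir(parents=True, exist_ok=True)
    for dirpath, _dirnames, filenames in os.walk(original):
        rel_dir = Path(dirpath).relative_to(original)
        # Recreate the directory (including empty ones) under tree/.
        (tree_dir / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            source = Path(dirpath) / name
            if source.is_symlink() or not source.is_file():
                continue  # assumption: symlinks and special files are skipped
            blob = data_dir / sha256_of(source)
            if not blob.exists():
                os.link(source, blob)  # same inode as the original file
            link = tree_dir / rel_dir / name
            if not link.is_symlink():
                # Relative target, e.g. ../../data/<hash>, so mirror/ can be moved.
                os.symlink(os.path.relpath(blob, start=link.parent), link)
```

one pitfall this sketch makes visible: when two files in `original/` have identical content, only the first one encountered is hard-linked to the entry in `mirror/data`; the second one merely gets a symlink to that same entry. also, because the data entry shares an inode with the original, editing the original in place changes the content behind the hash-named file.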
i want to be able to perfectly restore the original from its mirror.
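restoring could then look roughly like the following Python sketch. the function name `restore_tree` is hypothetical, and it assumes every entry under `mirror/tree` is either a directory or a symlink into `mirror/data`, and that the destination is on the same filesystem as the mirror. it hard-links the restored files to the data entries rather than copying them, which is a design choice, not a requirement.

```python
import os
from pathlib import Path


def restore_tree(tree_dir: Path, destination: Path) -> None:
    """Recreate the directory structure of tree_dir under destination."""
    for dirpath, _dirnames, filenames in os.walk(tree_dir):
        rel_dir = Path(dirpath).relative_to(tree_dir)
        (destination / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue  # assumption: only symlinks carry file content
            # strict=True raises FileNotFoundError if the data entry is missing.
            blob = link.resolve(strict=True)
            os.link(blob, destination / rel_dir / name)
```

since permissions and timestamps belong to the inode, hard-linking brings them back with the file. files that had identical content end up as hard links to one another after such a restore, even if they were independent files originally, so "perfect" restoration depends on whether that difference matters.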
i know i can write a script to do that, but before i do that and maybe make a mistake and lose some data, i want to know if there is any tool out there that already implements this safely (i didn’t find any so far) or if there are any pitfalls.
*background*: i want to keep an archive of a directory that tracks renames, but i don't need versioning. i know that git-annex can do that with a lot of overhead using git repositories, but i only need its way to mirror the contents of a directory using symlinks for the directory structure to files whose file names are hashes of their content. then i could use git-diff to track renames. i don't fully understand what git-annex is doing so i don't want to trust it with archiving my data. so i'm looking for a lighter alternative that is less intrusive.
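the rename-tracking idea can be illustrated without git at all: since each symlink target names the content, a path that disappears and a path that appears with the same target is a likely rename. the sketch below is a plain comparison of two `tree/` directories, not what git-diff does internally; `snapshot` and `find_renames` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def snapshot(tree_dir: Path) -> dict[str, str]:
    """Map each symlink's relative path to the data entry name it points at."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(tree_dir):
        for name in filenames:
            link = Path(dirpath) / name
            if link.is_symlink():
                rel = link.relative_to(tree_dir).as_posix()
                result[rel] = os.path.basename(os.readlink(link))
    return result


def find_renames(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Pair paths that vanished with new paths that carry the same content name."""
    added: dict[str, list[str]] = {}
    for path, blob in sorted(new.items()):
        if path not in old:
            added.setdefault(blob, []).append(path)
    pairs = []
    for path, blob in sorted(old.items()):
        if path in new:
            continue  # path still exists, so it was not renamed
        candidates = added.get(blob)
        if candidates:
            pairs.append((path, candidates.pop(0)))
    return pairs
```

a file that was renamed and edited at the same time gets a new hash and would show up here as a deletion plus an addition, not as a rename.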
Asked by windfish
(113 rep)
Feb 2, 2024, 12:39 PM
Last activity: Feb 2, 2024, 09:51 PM