
Mirror a directory tree using hard links for file contents and symlinks for directory structure

1 vote
1 answer
160 views
What is the best way to mirror an entire directory, say original/, to a new directory, say mirror/, which has the structure mirror/data/ and mirror/tree/, such that

- every file in original/ or in any of its subdirectories is hard-linked to a *file* in mirror/data/
- whose filename is a unique identifier of its content, say a hash of its content, and
- which is the target of a symlink in mirror/tree/ whose relative path corresponds to the relative path of the original file in original/, so that the original can easily be restored?

Is this feature perhaps already implemented by some existing tool, ideally one that lets me flexibly choose the command that creates the unique identifier for a file from its content?

---

For instance, say there is only one file, original/something, which is a text file containing the word “data”. Then I want to run a script or command on original/ such that the result is:

    $ tree original mirror
    original
    └── something
    mirror
    ├── data
    │   └── 6667b2d1aab6a00caa5aee5af8…
    └── tree
        └── original
            └── something -> ../../data/6667b2d1aab6a00caa5aee5af8…

    5 directories, 3 files

Here, the file 6667b… is a hard link to original/something, and its filename is the sha256sum hash of that file. Note that I have abbreviated the filename for legibility.

I want to be able to perfectly restore the original from its mirror. I know I can write a script to do that, but before I do that and maybe make a mistake and lose some data, I want to know whether any tool out there already implements this safely (I haven't found one so far) and whether there are any pitfalls.

*Background*: I want to keep an archive of a directory that tracks renames, but I don't need versioning. I know that git-annex can do that, with a lot of overhead, using git repositories, but I only need its way of mirroring the contents of a directory: symlinks for the directory structure, pointing to files whose filenames are hashes of their content. Then I could use git-diff to track renames.

I don't fully understand what git-annex is doing, so I don't want to trust it with archiving my data. I'm therefore looking for a lighter, less intrusive alternative.
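
For reference, the layout described in the question could be produced roughly as follows. This is only a minimal sketch, not an existing tool: the function names `content_id`, `mirror_tree` and `restore_tree` are made up for illustration, SHA-256 is assumed as the identifier (any other function can be passed as `identify`), and symlinks or special files inside original/ are simply skipped.

```python
import hashlib
import os
from pathlib import Path


def content_id(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_tree(original: Path, mirror: Path, identify=content_id) -> None:
    """Hard-link file contents into mirror/data and symlink them from mirror/tree.

    Assumes mirror/ lies outside original/ and on the same filesystem,
    because hard links cannot cross filesystem boundaries.
    """
    data = mirror / "data"
    tree_root = mirror / "tree" / original.name
    data.mkdir(parents=True, exist_ok=True)
    for dirpath, _dirnames, filenames in os.walk(original):
        # Recreate the directory itself, so empty directories are kept too.
        target_dir = tree_root / Path(dirpath).relative_to(original)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(dirpath) / name
            if src.is_symlink() or not src.is_file():
                continue  # symlinks and special files are skipped in this sketch
            blob = data / identify(src)
            if not blob.exists():
                os.link(src, blob)  # hard link: no data is copied
            link = target_dir / name
            if not os.path.lexists(link):
                # Relative target, e.g. ../../data/<hash>, so mirror/ can be moved.
                os.symlink(os.path.relpath(blob, start=target_dir), link)


def restore_tree(mirror: Path, name: str, dest: Path) -> None:
    """Rebuild mirror/tree/<name> under dest by hard-linking the data files."""
    tree_root = mirror / "tree" / name
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        out_dir = dest / Path(dirpath).relative_to(tree_root)
        out_dir.mkdir(parents=True, exist_ok=True)
        for fname in filenames:
            blob = (Path(dirpath) / fname).resolve(strict=True)
            os.link(blob, out_dir / fname)
```

Calling `mirror_tree(Path("original"), Path("mirror"))` on the one-file example should give the tree shown in the question, with the full hash as the filename. Some pitfalls are visible even in this sketch. Files with identical content share a single entry in mirror/data/, so a restore via hard links turns them into the same inode; copying instead of linking in `restore_tree` would avoid that. A hard link also shares its inode with the original, so editing the original in place changes the archived content while its filename still carries the old hash.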
Asked by windfish (113 rep)
Feb 2, 2024, 12:39 PM
Last activity: Feb 2, 2024, 09:51 PM