MacOSX shell directory utilities very slow with large directories (millions of files) - any alternatives?
Due to a problem with contact synchronization (I am not sure what caused it; probably a program crash during a power cut, which left the contacts database file inconsistent), the synchronization process created nearly 7M files in Images/:
hostname:Images username$ pwd
/Users/username/Library/Application Support/AddressBook/Sources/4D81D34B-C932-4578-8A31-4E2E244B3875/Images
hostname:Images username$ ls
^C
hostname:Images username$ ls | wc -l
6797073
(the result came only after hours)
hostname:Images username$ cd ..
hostname:4D81D34B-C932-4578-8A31-4E2E244B3875 username$ ls -l
total 600224
-rw-r--r--@ 1 username staff 409600 Aug 2 17:43 AddressBook-v22.abcddb
-rw-r--r--@ 1 username staff 32768 Aug 3 00:13 AddressBook-v22.abcddb-shm
-rw-r--r--@ 1 username staff 2727472 Aug 2 23:26 AddressBook-v22.abcddb-wal
drwx------ 65535 username staff 231100550 Aug 2 23:26 Images
-rw-r--r--@ 1 username staff 45056 Dec 7 2017 MailRecents-v4.abcdmr
-rw-r--r--@ 1 username staff 32768 Dec 7 2017 MailRecents-v4.abcdmr-shm
-rw-r--r--@ 1 username staff 4152 Dec 7 2017 MailRecents-v4.abcdmr-wal
drwx------ 5 username staff 170 Feb 26 18:51 Metadata
-rwxr-xr-x 1 username staff 0 Dec 7 2017 OfflineDeletedItems.plist.lockfile
-rwxr-xr-x 1 username staff 0 Dec 7 2017 Sync.lockfile
-rwxr-xr-x 1 username staff 0 Dec 7 2017 SyncOperations.plist.lockfile
When I tried to use shell tools (ls, find), I did not get any result in a time reasonable for interactive work (it took hours), even with file sorting disabled via ls -f (which [seems to help on other UNIX-like OSs](https://unix.stackexchange.com/questions/120077/the-ls-command-is-not-working-for-a-directory-with-a-huge-number-of-files)), etc.
The ls process grew to around 1GB in size and worked for HOURS before outputting any result.
My question is: am I missing some tricky option that makes these tools work reasonably for large directories (outputting the results on the way, e.g. to process them further, filter them etc.), or are these tools just not written to scale? Or maybe there are better file/directory utilities for MacOSX? (I haven't tried any GUI app on that directory, thinking better not to...)
I have written a fairly trivial C program reading the directory entries and outputting the info on the way:
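#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

int main( const int argc,
          const char * const * argv )
{
    /* The directory to list is expected as the first argument. */
    if ( argc < 2 )
    {
        fprintf( stderr, "usage: %s <directory>\n", argv[ 0 ] );
        return EXIT_FAILURE;
    }
    const char * const dirpath = argv[ 1 ];
    DIR * const dirp = opendir( dirpath );
    if ( dirp == NULL )
    {
        perror( dirpath );
        return EXIT_FAILURE;
    }
    int count = 0;
    struct stat statbuf;
    /* Print each entry as soon as readdir() returns it; nothing is collected or sorted. */
    for ( struct dirent * entry = readdir( dirp );
          entry != NULL;
          entry = readdir( dirp ), count++ )
    {
        char filepath[ PATH_MAX + 1 ];
        /* snprintf() bounds the whole path, unlike chained strncat() calls. */
        snprintf( filepath, sizeof filepath, "%s/%s", dirpath, entry->d_name );
        if ( stat( filepath, &statbuf ) != 0 )
        {
            /* Size unknown (entry vanished or is unreadable); still list the name. */
            printf( "%s ?\n", entry->d_name );
            continue;
        }
        /* st_size is an off_t, so cast it to match the format specifier. */
        printf( "%s %lld\n", entry->d_name, (long long) statbuf.st_size );
    }
    closedir( dirp );
    printf( "%d\n", count );
    return 0;
}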
This actually works (it outputs the result after reading each entry) and has a memory footprint of around 300K. So the problem is not in the OS (filesystem, driver, standard library or whatever) but in the tools, which simply do not scale well. (I know they support more options etc., but for a basic directory listing, without sorting or anything fancy, ls should work better, i.e. not allocate 1GB of memory, while find should perform the action for each entry as it finds (and matches) it, not read them all first, as it apparently does...)
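The "act on each entry as it is found" behaviour expected from find above can be sketched by adding a name filter to the same streaming loop. This is only a minimal sketch, not a replacement for find: list_matching is a hypothetical helper name, and the code assumes the POSIX.1-2008 calls dirfd() and fstatat() are available, which avoids building a full path for every entry.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: needed on some systems for dirfd() and fstatat() */

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <fnmatch.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of the original program): stream the entries
 * of dirpath and, for each name matching the shell-style pattern, print
 * "name size" to out as soon as the entry is read.
 * Returns the number of entries printed, or -1 if the directory cannot be opened.
 */
long list_matching( const char * dirpath, const char * pattern, FILE * out )
{
    assert( dirpath != NULL && pattern != NULL && out != NULL );

    DIR * const dirp = opendir( dirpath );
    if ( dirp == NULL )
        return -1;

    const int dfd = dirfd( dirp );   /* descriptor used for relative stat calls */
    long matched = 0;

    for ( struct dirent * entry = readdir( dirp );
          entry != NULL;
          entry = readdir( dirp ) )
    {
        /* Filter on the name first, so non-matching entries cost no stat call.
           FNM_PERIOD keeps "*" from matching "." and "..". */
        if ( fnmatch( pattern, entry->d_name, FNM_PERIOD ) != 0 )
            continue;

        struct stat statbuf;
        /* fstatat() resolves the name relative to dfd; no path buffer is needed. */
        if ( fstatat( dfd, entry->d_name, &statbuf, AT_SYMLINK_NOFOLLOW ) != 0 )
            continue;   /* entry vanished or is unreadable: skip it */

        fprintf( out, "%s %lld\n", entry->d_name, (long long) statbuf.st_size );
        matched++;
    }

    closedir( dirp );
    return matched;
}
```

A call such as list_matching( ".", "*.jpg", stdout ) would print matching names and sizes as they are read, so the output can be piped to another tool while the directory is still being scanned. Memory use stays constant because no entry is kept after it has been printed.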
Has anyone experienced this and found a good way to deal with such huge directories (which utilities to use) on MacOSX? (Or is writing a custom system utility necessary in such a case?)
(It is an exceptional situation of course, and it occurred for the first time on my system, but the OS supports such large directories, and the basic shell tools should deal with them in a _reasonable_ way...)
EDIT:
Small fix in the program (the filepath was lacking '/').
Asked by t-w
(101 rep)
Aug 3, 2021, 12:23 PM
Last activity: Jun 23, 2025, 11:55 AM