MacOSX shell directory utilities very slow with large directories (millions of files) - any alternatives?

5 votes
1 answer
124 views
Due to a problem with contact synchronization (I am not sure of the source of the problem; probably a program crash during a power cut, which caused an inconsistency in the contacts database file), the synchronization process created nearly 7M files in Images/:
hostname:Images username$ pwd
/Users/username/Library/Application Support/AddressBook/Sources/4D81D34B-C932-4578-8A31-4E2E244B3875/Images

hostname:Images username$ ls
^C

hostname:Images username$ ls | wc -l
 6797073
(the result came only after hours)

hostname:Images username$ cd ..
hostname:4D81D34B-C932-4578-8A31-4E2E244B3875 username$ ls -l
total 600224
-rw-r--r--@     1 username  staff     409600 Aug  2 17:43 AddressBook-v22.abcddb
-rw-r--r--@     1 username  staff      32768 Aug  3 00:13 AddressBook-v22.abcddb-shm
-rw-r--r--@     1 username  staff    2727472 Aug  2 23:26 AddressBook-v22.abcddb-wal
drwx------  65535 username  staff  231100550 Aug  2 23:26 Images
-rw-r--r--@     1 username  staff      45056 Dec  7  2017 MailRecents-v4.abcdmr
-rw-r--r--@     1 username  staff      32768 Dec  7  2017 MailRecents-v4.abcdmr-shm
-rw-r--r--@     1 username  staff       4152 Dec  7  2017 MailRecents-v4.abcdmr-wal
drwx------      5 username  staff        170 Feb 26 18:51 Metadata
-rwxr-xr-x      1 username  staff          0 Dec  7  2017 OfflineDeletedItems.plist.lockfile
-rwxr-xr-x      1 username  staff          0 Dec  7  2017 Sync.lockfile
-rwxr-xr-x      1 username  staff          0 Dec  7  2017 SyncOperations.plist.lockfile
When I tried to use the shell tools (ls, find), I did not get any result in a time reasonable for interactive work (it took hours), regardless of disabling file sorting with ls -f (which [seems to help on other UNIX-like OSs](https://unix.stackexchange.com/questions/120077/the-ls-command-is-not-working-for-a-directory-with-a-huge-number-of-files)) etc. The ls process grew to around 1 GB in size and ran for HOURS before outputting any result.

My question is: am I missing some tricky option to make this work reasonably with large directories (outputting the results on the way, e.g. to process further, filter, etc.), or are these tools just not written to scale? Or maybe there are better file/directory utilities for MacOSX? (I haven't tried any GUI app on that directory, thinking it better not to...)

I have written a fairly trivial C program that reads the directory entries and outputs the info on the way:
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

int main( const int argc,
	      const char * const * argv )
{
    if ( argc < 2 )
	    return -1;

    const char * const dirpath = argv[ 1 ];
    DIR * const dirp = opendir( dirpath );
    if ( dirp == NULL )
	    return -1;

    int count = 0;
    struct stat statbuf;

    /* Print each entry as soon as it is read, instead of
       collecting the whole directory in memory first. */
    for ( struct dirent * entry = readdir( dirp );
	      entry != NULL;
	      entry = readdir( dirp ), count++ )
    {
       char filepath[ PATH_MAX + 1 ];
       snprintf( filepath, sizeof filepath, "%s/%s",
                 dirpath, entry->d_name );
       if ( stat( filepath, &statbuf ) == 0 )
           printf( "%s %lld\n", entry->d_name,
                   (long long) statbuf.st_size );
    }
    closedir( dirp );
    printf( "%d\n", count );

    return 0;
}
which actually does work (it outputs the result after reading each entry) and has a memory footprint of around 300 KB. So it is not a problem of the OS (filesystem, driver, standard library or whatever), but of the tools, which basically do not scale well. (I know they support more options etc., but for a basic directory listing, without sorting or anything fancy, ls should behave better, i.e. not allocate 1 GB of memory, and find should perform the action for each entry it finds (and matches), not read them all first, as it apparently does...)

Has anyone experienced this and found a good way to deal with such huge directories (which utilities to use) on MacOSX? (Or is writing a custom system utility necessary in such a case?)

(It is an exceptional situation, of course, and it occurred for the first time on my system - but the OS supports such large directories, and the basic shell tools should deal with them in a _reasonable_ way...)

EDIT: Small fix in the program (the filepath was lacking '/').
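If writing a small custom tool is acceptable anyway, macOS also has getattrlistbulk(2) (available since OS X 10.10), which returns many directory entries per system call together with requested attributes, so the per-file stat() round trip disappears entirely. Below is a rough sketch along those lines, based on the attribute layout described in the man page; the buffer size is arbitrary and the error handling is minimal, so treat it as a starting point rather than a finished utility:

/* bulkls.c - list name and size of each entry in a directory,
 * fetching entries in batches via getattrlistbulk(2).
 * Build with: cc -O2 bulkls.c -o bulkls
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/attr.h>
#include <sys/types.h>
#include <unistd.h>

int main( const int argc, const char * const * argv )
{
    if ( argc < 2 )
        return 1;

    const int dirfd = open( argv[ 1 ], O_RDONLY );
    if ( dirfd < 0 )
        return 1;

    struct attrlist req;
    memset( &req, 0, sizeof req );
    req.bitmapcount = ATTR_BIT_MAP_COUNT;
    /* ATTR_CMN_RETURNED_ATTRS is mandatory with getattrlistbulk() */
    req.commonattr  = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME;
    req.fileattr    = ATTR_FILE_DATALENGTH;

    char buf[ 128 * 1024 ];   /* one syscall fills a whole batch */

    for ( ;; )
    {
        const int n = getattrlistbulk( dirfd, &req, buf, sizeof buf, 0 );
        if ( n <= 0 )         /* 0: end of directory, -1: error */
            break;

        char * entry = buf;
        for ( int i = 0; i < n; i++ )
        {
            /* each record starts with its own total length */
            uint32_t reclen;
            memcpy( &reclen, entry, sizeof reclen );
            char * field = entry + sizeof( uint32_t );

            /* which of the requested attributes are present */
            attribute_set_t returned;
            memcpy( &returned, field, sizeof returned );
            field += sizeof( attribute_set_t );

            if ( returned.commonattr & ATTR_CMN_NAME )
            {
                /* the name is stored at an offset relative to
                   the attrreference_t itself */
                attrreference_t nameref;
                memcpy( &nameref, field, sizeof nameref );
                const char * name = field + nameref.attr_dataoffset;
                field += sizeof( attrreference_t );

                long long size = 0;
                if ( returned.fileattr & ATTR_FILE_DATALENGTH )
                {
                    off_t datalen;  /* absent for subdirectories */
                    memcpy( &datalen, field, sizeof datalen );
                    size = (long long) datalen;
                }
                printf( "%s %lld\n", name, size );
            }
            entry += reclen;
        }
    }
    close( dirfd );
    return 0;
}

Each getattrlistbulk() call returns as many records as fit in the buffer, so a directory with millions of entries is processed in a stream of batches (with names and sizes delivered together) instead of one readdir()/stat() pair per file.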
Asked by t-w (101 rep)
Aug 3, 2021, 12:23 PM
Last activity: Jun 23, 2025, 11:55 AM