
Why is wc with multiple paths much faster than one-by-one execution?

0 votes
0 answers
75 views
Running wc with multiple files is an order of magnitude faster than running it file by file. For example:
> time git ls-files -z | xargs -0 wc -l > ../bach.x

real    0m2.765s
user    0m0.031s
sys     0m2.531s


> time git ls-files | xargs -I {} wc -l "{}" > ../onebyone.x

real    0m57.832s
user    0m0.156s
sys     0m3.031s
*(The repo contains ~10_000 files, so xargs runs wc a few times, not just once, but that's not material in this context.)*

In my naivety I assumed that wc has to open and process each file either way, so the speedup must come from multi-threading alone. However, I've read that there may be some extra file system magic going on here. Is there file system magic going on, is it all multi-threading, or is it something else?

----

### Startup penalty

Following up on @muru's comment, I can see that (a) a single execution takes ~8 ms and (b) running wc in a loop scales linearly:
> time wc -l ../x.x > /dev/null

real    0m0.008s
user    0m0.000s
sys     0m0.016s


> time for run in {1..10}; do wc -l ../x.x; done > /dev/null

real    0m0.076s
user    0m0.000s
sys     0m0.000s

> time for run in {1..100}; do wc -l ../x.x; done > /dev/null

real    0m0.689s
user    0m0.000s
sys     0m0.063s
Since the multi-file run is much faster per file (*10_000 files in ~3_000 ms => ~0.3 ms per file*), there seems to be a (huge?) startup penalty for wc that is not related to actually counting the newlines.
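One rough way to separate the per-process startup cost from the counting work is to run wc against `/dev/null`, which is always empty, so almost all of the elapsed time is fork+exec overhead. A minimal sketch, assuming bash (for the `time` reserved word) and GNU coreutils:

```shell
# 100 wc processes, one per invocation: pays fork+exec 100 times.
time for i in $(seq 1 100); do wc -l /dev/null; done > /dev/null

# One wc process handed 100 arguments: pays fork+exec once.
# `yes /dev/null | head -n 100` just emits the same path 100 times.
time yes /dev/null | head -n 100 | xargs wc -l > /dev/null
```

If startup cost dominates, the first variant's elapsed time should grow roughly linearly with the iteration count, matching the 10- vs 100-iteration loop measurements above, while the second stays near a single invocation's ~8 ms.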
Asked by tmaj (101 rep)
Mar 8, 2024, 06:27 AM
Last activity: Mar 8, 2024, 07:05 AM