paste is a brilliant tool, but it is dead slow: I get around 50 MB/s on my server when running:
paste -d, file1 file2 ... file10000 | pv >/dev/null
paste is using 100% CPU according to top, so it is not limited by, say, a slow disk.
Looking at the source code, this is probably because it uses getc:
while (chr != EOF)
  {
    sometodo = true;
    if (chr == line_delim)
      break;
    xputchar (chr);
    chr = getc (fileptr[i]);
    err = errno;
  }
Is there another tool that does the same, but faster? Maybe by reading 4k-64k blocks at a time? Maybe by using vector instructions to find the newlines in parallel instead of looking at a single byte at a time? Maybe using awk or similar?
The input files are UTF-8 and so big they do not fit in RAM, so reading *everything* into memory is not an option.
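To make the block-reading idea concrete, here is a minimal sketch, assuming C and a hard-coded ',' delimiter: it refills a 64k buffer per input with fread and locates newlines with memchr (which glibc implements with vector instructions), instead of one getc call per byte. Everything here (the fastpaste name, struct infile, refill, copy_line) is made up for illustration; it streams, so nothing is held in RAM beyond the per-file buffers, but it skips paste's options and most error handling:

/* fastpaste.c -- sketch of a paste-like filter using block reads + memchr.
   Build: cc -O2 fastpaste.c -o fastpaste
   Usage: ./fastpaste file1 file2 ... | pv >/dev/null */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFSZ (64 * 1024)       /* one 64k block per input file */

struct infile
{
  FILE  *fp;
  size_t pos, len;              /* consumed / filled bytes in buf */
  int    eof;
  char   buf[BUFSZ];
};

/* Refill the block buffer once it is fully consumed. */
static void refill (struct infile *f)
{
  if (f->eof || f->pos < f->len)
    return;
  f->len = fread (f->buf, 1, BUFSZ, f->fp);
  f->pos = 0;
  if (f->len == 0)
    f->eof = 1;
}

static int exhausted (struct infile *f)
{
  refill (f);
  return f->eof && f->pos == f->len;
}

/* Copy one line (minus its newline) to stdout in bulk writes. */
static void copy_line (struct infile *f)
{
  while (!exhausted (f))
    {
      char *start = f->buf + f->pos;
      char *nl = memchr (start, '\n', f->len - f->pos); /* vectorized scan */
      size_t n = nl ? (size_t) (nl - start) : f->len - f->pos;
      fwrite (start, 1, n, stdout);                     /* bulk copy */
      f->pos += n + (nl != NULL);                       /* skip the newline */
      if (nl)
        return;
    }
}

int main (int argc, char **argv)
{
  int nfiles = argc - 1;
  if (nfiles <= 0)
    return 0;
  struct infile *files = calloc (nfiles, sizeof *files);
  if (!files)
    return 1;
  for (int i = 0; i < nfiles; i++)
    if (!(files[i].fp = fopen (argv[i + 1], "r")))
      { perror (argv[i + 1]); return 1; }

  for (;;)
    {
      int alive = 0;                    /* does any input still have data? */
      for (int i = 0; i < nfiles; i++)
        if (!exhausted (&files[i]))
          alive = 1;
      if (!alive)
        break;
      for (int i = 0; i < nfiles; i++)
        {
          if (i)
            putchar (',');
          copy_line (&files[i]);        /* finished files yield empty fields */
        }
      putchar ('\n');
    }
  return 0;
}

Note that with 10000 inputs this keeps ~640 MB of buffers and needs an open-file limit above 10000, so BUFSZ and the per-file state are knobs to tune, not gospel.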
Edit: thanasisp suggests running jobs in parallel. That improves throughput slightly, but it is still an order of magnitude slower than pure pv:
# Baseline
$ pv file* | head -c 10G >/dev/null
10.0GiB 0:00:11 [ 897MiB/s] [> ] 3%
# Paste all files at once
$ paste -d, file* | pv | head -c 1G >/dev/null
1.00GiB 0:00:21 [48.5MiB/s] [ ]
# Paste 11% at a time in parallel, and finally paste these
$ paste -d, <(paste -d, ...) <(paste -d, ...) ... | pv | head -c 1G >/dev/null
1.00GiB 0:00:14 [69.2MiB/s] [ ]
top still shows that it is the outer paste that is the bottleneck.
I tested whether increasing the buffer size made a difference:
$ stdbuf -i8191 -o8191 paste -d, <(paste -d, ...) <(paste -d, ...) ... | pv | head -c 1G >/dev/null
1.00GiB 0:00:12 [80.8MiB/s] [ ]
This increased throughput by 10%. Increasing the buffer further gave no improvement. This is likely hardware dependent (i.e., it may be due to the size of the level 1 CPU cache).
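For context, stdbuf works by preloading a small library that calls setvbuf(3) on the standard streams before main runs, so a purpose-built tool could simply pick its own buffer sizes. A sketch of the same idea done directly (the 1 MiB size is an arbitrary choice of mine):

#include <stdio.h>

static char ibuf[1 << 20], obuf[1 << 20];   /* 1 MiB each, arbitrary sizes */

int main (void)
{
  /* must run before any other I/O touches the streams */
  setvbuf (stdin,  ibuf, _IOFBF, sizeof ibuf);
  setvbuf (stdout, obuf, _IOFBF, sizeof obuf);
  /* ... the actual filter loop would go here ... */
  return 0;
}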
Tests are run in a RAM disk to avoid limitations related to the disk subsystem.
Asked by Ole Tange (37348 rep)
Nov 23, 2020, 11:46 PM
Last activity: Jan 14, 2025, 02:37 PM