
Fast version of paste

4 votes
3 answers
964 views
`paste` is a brilliant tool, but it is dead slow: I get around 50 MB/s on my server when running:

```
paste -d, file1 file2 ... file10000 | pv >/dev/null
```

`paste` is using 100% CPU according to `top`, so it is not limited by, say, a slow disk.

Looking at the source code, it is probably slow because it uses `getc`:

```c
while (chr != EOF)
  {
    sometodo = true;
    if (chr == line_delim)
      break;
    xputchar (chr);
    chr = getc (fileptr[i]);
    err = errno;
  }
```

Is there another tool that does the same, but is faster? Maybe by reading 4k-64k blocks at a time? Maybe by using vector instructions to find the newlines in parallel instead of looking at a single byte at a time? Maybe by using `awk` or similar?

The input files are UTF-8 and so big that they do not fit in RAM, so reading *everything* into memory is not an option.

Edit: thanasisp suggests running jobs in parallel. That improves throughput slightly, but it is still an order of magnitude slower than pure `pv`:

```
# Baseline
$ pv file* | head -c 10G >/dev/null
10.0GiB 0:00:11 [ 897MiB/s] [>                    ]  3%

# Paste all files at once
$ paste -d, file* | pv | head -c 1G >/dev/null
1.00GiB 0:00:21 [48.5MiB/s] [                     ]

# Paste 11% at a time in parallel, and finally paste these
$ paste -d, ... | pv | head -c 1G >/dev/null
1.00GiB 0:00:14 [69.2MiB/s] [                     ]
```

`top` still shows that it is the outer `paste` that is the bottleneck.

I tested whether increasing the buffer makes a difference:

```
$ stdbuf -i8191 -o8191 paste -d, ... | pv | head -c 1G >/dev/null
1.00GiB 0:00:12 [80.8MiB/s] [                     ]
```

This increased throughput by 10%. Increasing the buffer further gave no improvement; this is likely hardware dependent (i.e. it may be due to the size of the level 1 CPU cache). The tests are run on a RAM disk to avoid limitations related to the disk subsystem.
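To make the block-/line-at-a-time idea above concrete, here is the kind of thing I have in mind. It is only an untested, hypothetical sketch (the name `paste2.c` and everything in it is mine, not part of any existing tool): it handles exactly two input files, joins them with a comma, and skips `paste`'s corner cases such as more than two inputs, files of different lengths, or a missing trailing newline. The point is that `getline` pulls whole lines out of a large stdio buffer (and, as far as I can tell, glibc scans that buffer with `memchr`, which is vectorized) instead of fetching one byte per call like the `getc` loop quoted above.

```c
/* paste2.c -- hypothetical minimal sketch of "paste -d, file1 file2"
 * reading whole lines with getline() instead of byte-by-byte getc().
 * Illustration only; not a drop-in replacement for paste. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  if (argc != 3)
    {
      fprintf(stderr, "usage: %s file1 file2\n", argv[0]);
      return 1;
    }

  FILE *a = fopen(argv[1], "r");
  FILE *b = fopen(argv[2], "r");
  if (!a || !b)
    {
      perror("fopen");
      return 1;
    }

  /* Large stdio buffers so the kernel sees big read()/write() calls. */
  setvbuf(a, NULL, _IOFBF, 1 << 16);
  setvbuf(b, NULL, _IOFBF, 1 << 16);
  setvbuf(stdout, NULL, _IOFBF, 1 << 16);

  char *la = NULL, *lb = NULL;
  size_t ca = 0, cb = 0;
  ssize_t na, nb;

  /* One line from each file per iteration, joined with ','. */
  while ((na = getline(&la, &ca, a)) != -1
         && (nb = getline(&lb, &cb, b)) != -1)
    {
      if (na > 0 && la[na - 1] == '\n')   /* drop file1's newline */
        na--;
      fwrite(la, 1, (size_t) na, stdout);
      putchar(',');
      fwrite(lb, 1, (size_t) nb, stdout); /* keeps file2's newline */
    }

  free(la);
  free(lb);
  fclose(a);
  fclose(b);
  return 0;
}
```

Something like `cc -O2 -o paste2 paste2.c && ./paste2 file1 file2 | pv >/dev/null` would show whether line-at-a-time reading alone is enough, or whether it really takes raw block reads plus a vectorized newline search to get near `pv` speed.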
Asked by Ole Tange (37348 rep)
Nov 23, 2020, 11:46 PM
Last activity: Jan 14, 2025, 02:37 PM