I am looking for comm's functionality for n files, i.e. more than two.
`man comm` reads:
COMM(1)
NAME
comm - compare two sorted files line by line
SYNOPSIS
comm [OPTION]... FILE1 FILE2
DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line.
With no options, produce three-column output.
Column one contains lines unique to FILE1,
column two contains lines unique to FILE2,
and column three contains lines common to both files.
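For reference, the two-file behaviour described above can be reproduced with a small sketch. The file names `file1` and `file2` and their contents are made-up sample data, not files from this question.

```shell
# Two small, already sorted sample files (hypothetical contents) in a temp dir.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/file1"
printf 'b\nc\nd\n' > "$dir/file2"

# Column 1: only in file1, column 2: only in file2, column 3: in both.
# Columns are separated by tabs, so "b" and "c" appear after two tabs
# and "d" after one tab.
comm "$dir/file1" "$dir/file2"
```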
A first non-optimized and differently formatted approach in bash to illustrate the idea:
user@host MINGW64 dir
$ ls
abc ac ad bca bcd
user@host MINGW64 dir
$ tail -n +1 *
[The output of tail -n +1 * (the contents of abc, ac, ad, bca and bcd) and the first lines of otherdir/ncomm.sh are missing here; the surviving end of the script follows.]
    >&2 echo -en "${entry}\t"
    for file in "$@"; do
        foundentry=$(grep "$entry" "$file")
        echo -en "${foundentry}\t"
    done
    echo -en "\n"
done
user@host MINGW64 dir
$ time otherdir/ncomm.sh *
all     abc     ac      ad      bca     bcd
a       a       a       a       a
b       b                       b       b
c       c       c               c       c
d                       d               d
real 0m12.921s
user 0m0.579s
sys 0m4.586s
user@host MINGW64 dir
$
This prints the column headers (to stderr), then a first column "all" containing every entry found in any of the files, sorted, followed by one column per file from the parameter list, with that file's entries in the matching rows. Because grep is invoked once for every cell outside the first row and the first column, this is really slow.
As with comm, this output format is only suitable for short lines/entries such as IDs.
A more concise version could print an x or a similar marker for each entry found, in columns 2 and up.
This should work on Git for Windows' MSYS2 and on RHEL.
**How can this be achieved in a more performant manner?**
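One possible direction, as a rough sketch rather than a tuned solution: read all files in a single awk pass instead of running grep once per cell, and print the concise x format mentioned above. The function name `ncomm` is hypothetical. The sketch assumes that awk and sort are available, that entries contain no tabs, and that file names contain no `=`. Unlike the grep call in the script above, it compares whole lines instead of matching substrings.

```shell
# ncomm (hypothetical name): one row per distinct line, one column per file.
# A cell holds "x" if the file contains that line and is empty otherwise.
ncomm() {
    # Header row goes to stderr, as in the original script.
    { printf 'all'; printf '\t%s' "$@"; printf '\n'; } >&2

    awk '
        # Remember every (line, file) pair and every distinct line.
        { seen[$0, FILENAME] = 1; entries[$0] = 1 }
        END {
            for (e in entries) {
                row = e
                # ARGV[1] .. ARGV[ARGC-1] are the file operands, in order.
                for (i = 1; i < ARGC; i++) {
                    mark = ((e, ARGV[i]) in seen) ? "x" : ""
                    row = row "\t" mark
                }
                print row
            }
        }
    ' "$@" | sort    # awk array order is unspecified, so sort the rows
}
```

Called as `ncomm *` in the directory above, this starts one awk and one sort process in total, regardless of the number of files and entries. The input files do not need to be sorted, but all distinct lines are held in memory.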
Asked by Julia
(31 rep)
Jul 29, 2021, 04:17 AM
Last activity: Jan 31, 2022, 02:48 PM