I have two tab-separated files, each with two columns. I want to create a third file containing the entries whose column 1 value appears in both files. To do so, I first load file 1 into an array and then check each line of file 2 against that array. However, the array index does not seem to be recognized. The details are below.
The first 3 lines of the files look like this:
File 1:
90001 raw acceleration data
2634 Heavy DIY
1011 Light DIY
File 2:
2634 218263
25680 44313
25681 44313
To show that there are overlaps in column 1 of the two files:
user@cluster:~> grep 90001 file2
90001 103662
user@cluster:~> grep 2634 file2
2634 218263
To create file 3, I first tried the following, which produced an empty file:
awk 'BEGIN {FS = "\t"; OFS= "\t"}
NR==FNR {a[$1]=$2; next}
{ if($1 in a) print $1, a[$1]}' file1 file2 > file3
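For comparison, here is a minimal self-contained sketch of the same approach on made-up sample data. The file names demo_file1.tsv, demo_file2.tsv and demo_file3.tsv are hypothetical, and the rows are copied from the excerpts in this post. The files are written with printf so that the separators are guaranteed to be real tab characters, which is an assumption the awk script depends on.

```shell
# Hypothetical sample files; printf guarantees real tab separators.
printf '90001\traw acceleration data\n2634\tHeavy DIY\n1011\tLight DIY\n' > demo_file1.tsv
printf '2634\t218263\n25680\t44313\n90001\t103662\n' > demo_file2.tsv

# Same logic as the script above: load file 1 into an array keyed on
# column 1, then print the file 2 rows whose column 1 is a key of it.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { a[$1] = $2; next }
     $1 in a   { print $1, a[$1] }' demo_file1.tsv demo_file2.tsv > demo_file3.tsv

cat demo_file3.tsv
```

On this sample data the script writes the 2634 and 90001 rows to demo_file3.tsv, which suggests that the logic itself works when the input really is tab separated.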
The following code confirmed that the issue is that the array index is not recognized: adding an else branch prints all of file2 into file3.
awk 'BEGIN {FS = "\t"; OFS= "\t"}
NR==FNR {a[$1]=$2; next}
{if($1 in a)
print $1, a[$1]
else
print $1, $2}' file1 file2 > file3
I am quite puzzled. What might have caused this, and how can I fix it?
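One check that might narrow this down is to look at the raw bytes of the input. This is only a sketch under an assumption not confirmed in this post: that the files may contain spaces instead of tabs, or Windows-style carriage returns at the end of each line. The file demo_suspect.txt is a hypothetical stand-in for such a file. With FS set to a tab, a line that has no tab is a single field, so $1 would be the whole line and would never match a key taken from the other file.

```shell
# Hypothetical file that looks tab separated but actually uses a space
# between the columns and a CRLF line ending.
printf '2634 218263\r\n' > demo_suspect.txt

# od -c prints each byte: a real tab shows as \t, a carriage return as \r.
head -n 1 demo_suspect.txt | od -c

# With a tab as field separator, count the fields awk sees on each line.
awk -F'\t' '{ print NF }' demo_suspect.txt
```

Running the same two commands on the real file1 and file2 would show whether each line has two tab-separated fields and whether any stray characters are present.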
Thanks in advance.
Asked by Xuan
(45 rep)
Mar 29, 2023, 08:54 AM
Last activity: Mar 29, 2023, 11:51 AM