Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
2
votes
7
answers
293
views
merge multiple files using full join by the first column
I have a directory with 200 samples. Each file has two columns. I want to full-join these files and produce a single merged file.
This is an example of two of the files:
$ head 346_T1_A_deduped.bismark.cov_format
1_10525 95.2
1_10526 98.9690721649485
1_10542 80.8
1_10543 87.6288659793814
$ head 346_T1_B_deduped.bismark.cov_format
1_10525 95.7142857142857
1_10526 98.1132075471698
1_10542 78.5714285714286
1_10549 75.4716981132076
I want to join all files based on the first column, and if a file does not have the corresponding first-column value, put NA in the merged file.
Here is the example output:
$ head merged.csv
Prob,346_T1_A_deduped.bismark.cov_format,346_T1_B_deduped.bismark.cov_format
1_10525,95.2,95.7142857142857
1_10526,98.9690721649485,98.1132075471698
1_10542,80.8,78.5714285714286
1_10543,87.6288659793814,NA
1_10549,NA,75.4716981132076
So I want comma-delimited output, where the first column is the join column and the remaining column headers are the corresponding file names. As you can see, NA is placed whenever a probe is not present in a file.
I wrote this:
# Create an empty result file
output_file="full_joined_result.txt"
> "$output_file"
# Loop over all files in the directory
for file in *_format; do
    # If output file is empty, initialize with the first file's data
    if [ ! -s "$output_file" ]; then
        cp "$file" "$output_file"
    else
        # Full join with the next file based on the first column
        awk 'FNR==NR {a[$1]=$0; next} {print a[$1], $0}' "$output_file" "$file" > temp && mv temp "$output_file"
    fi
done
But it does not give the output I want.
I would appreciate it if anyone could help fix this.
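For what it's worth, here is a minimal sketch of one possible approach with GNU awk (an assumption on my part, not a fix of the loop above): read all *_format files in one pass, remember each probe's value per file, and print NA where a probe is missing. ARGIND and PROCINFO["sorted_in"] are gawk-specific.
awk '
    { vals[$1, ARGIND] = $2; keys[$1]; files[ARGIND] = FILENAME }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # iterate probes in sorted order (gawk)
        printf "Prob"
        for (f = 1; f <= ARGIND; f++) printf ",%s", files[f]
        print ""
        for (k in keys) {
            printf "%s", k
            for (f = 1; f <= ARGIND; f++) {
                v = ((k, f) in vals) ? vals[k, f] : "NA"
                printf ",%s", v
            }
            print ""
        }
    }
' *_format > merged.csv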
Anna1364
(1064 rep)
Sep 24, 2024, 04:54 AM
• Last activity: Oct 3, 2024, 12:13 PM
0
votes
1
answers
124
views
How does the auto format of the join command work?
Here is an example:
$ cat file1
hello
there
$ cat file2
some,very,long,line,hello,csv
some,very,long,line,nope,csv
$ join -t, -1 1 -2 5 file1 file2
hello,some,very,long,line,csv
This works as documented: the joined field is pulled out to the left, and the remaining fields are printed.
But my desired output is to retain the CSV (file2) column ordering.
I can get it to print the CSV intact with this command, which lists the CSV field numbers explicitly:
join -t, -1 1 -2 5 -o $(head -n1 file2 | awk -F, '{print NF}' | xargs seq | sed 's/^/2./' | paste -d, -s) file1 file2
And it prints:
some,very,long,line,hello,csv
I _thought_ it's for this that join has an auto format. This is the precious little the man page of join has to say about -o FORMAT:
> If FORMAT is the keyword 'auto', then the first line of each file determines the number of fields output for each line.
However, I don't know what that means or how I'm supposed to use the auto format.
I tried this, but didn't get anything different than I did previously:
$ join -t, -1 1 -2 5 -o auto file1 file2
hello,some,very,long,line,csv
A naive interpretation would be that I should list out the required fields as the first line of each file, but that's wrong, too.
So, my question is: what is auto for, and how am I supposed to use it? Can I use it to retain the column order of the input CSV?
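For context, a sketch of the kind of invocation where auto seems intended to be used, as far as I can tell from the GNU documentation (an assumption, not a confirmed answer): auto fixes the number of output fields contributed by each file based on that file's first line, which mainly matters together with -a and -e so that unpairable lines come out padded to a consistent width. It does not reorder columns.
# Hypothetical example: also print unpairable file2 lines (-a 2); -e NA fills
# fields the other file cannot supply, and -o auto keeps the field count constant.
join -t, -1 1 -2 5 -a 2 -e NA -o auto file1 file2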
Jeenu
(113 rep)
Mar 20, 2024, 01:04 PM
• Last activity: Sep 29, 2024, 02:30 PM
2
votes
1
answers
1127
views
Why do I have "join: extra operand '/dev/fd/62'" error?
I have a script equijoin2:
#! /bin/bash
# default args
delim=","       # CSV by default
outer=""
outerfile=""
# Parse flagged arguments:
while getopts "o:td:" flag
do
    case $flag in
        d) delim=$OPTARG;;
        t) delim="\t";;
        o) outer="-a $OPTARG";;
        ?) exit;;
    esac
done
# Delete the flagged arguments:
shift $(($OPTIND -1))
# two input files
f1="$1"
f2="$2"
# cols from the input files
col1="$3"
col2="$4"
join "$outer" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")
and two files
$ cat file1
c c1
b b1
$ cat file2
a a2
c c2
b b2
Why does the last command fail? Thanks.
$ equijoin2 -o 2 -d " " file1 file2 1 1
a a2
b b1 b2
c c1 c2
$ equijoin2 -o 1 -d " " file1 file2 1 1
b b1 b2
c c1 c2
$ equijoin2 -d " " file1 file2 1 1
join: extra operand '/dev/fd/62'
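A sketch of the likely culprit and one fix (an assumption, not a verified answer): when no -o option is given, the quoted "$outer" expands to an empty-string argument, which join takes as an extra file operand next to the two process substitutions. Building the optional flag in a bash array avoids that, because an empty array expands to zero words:
outer=()
# inside getopts:  o) outer=(-a "$OPTARG");;
join "${outer[@]}" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")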
Tim
(106420 rep)
Jul 24, 2018, 05:40 AM
• Last activity: Aug 31, 2024, 05:07 AM
1
votes
2
answers
1709
views
Join command gives incorrect output?
I'm trying to join two simple files on Solaris 5.8, as shown below:
~/temp/s: cat 1
work1 a 8058 51
work2 b 15336 51
~/temp/s: cat 2
8058 77-11:29:32 /apps/sas
15336 100-12:23:49 /local/hotfix
~/temp/s: join -1 3 -2 1 1 2
8058 work1 a 51 77-11:29:32 /apps/sas (The other line is missing from the output)
The output only contains one record where it should be two. I'm really not sure where it went wrong.
Is there any way we may get all the records in the output?
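A sketch of the usual workaround (assumption: join can silently drop lines when its inputs are not sorted on the join fields; the process substitutions assume a shell such as bash or ksh93):
# Sort each file on its own join field first, then join.
join -1 3 -2 1 <(sort -k3,3 1) <(sort -k1,1 2)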
Avinash
(413 rep)
Jul 31, 2013, 12:59 PM
• Last activity: Aug 31, 2024, 04:57 AM
2
votes
2
answers
3953
views
Join problem: throwing error, join extra operand
I want to join 3 files on a column that has sorted, unique numeric values (each file has only one column), where the file names start with the same prefix, for example "usi".
Now, when I run this:
join -j 1 ../Test_Data/usi* > ../Test_Data/join_output.txt
I get the following error:
> join: extra operand `usi_rtree_lw_100000.txt'
> Try `join --help' for more information.
Any ideas?
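A sketch of the usual workaround (join(1) accepts exactly two file operands, so three files have to be joined pairwise; the file names below are placeholders, not the real ones):
join ../Test_Data/usi_file1.txt ../Test_Data/usi_file2.txt |
    join - ../Test_Data/usi_file3.txt > ../Test_Data/join_output.txt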
N. F.
(2269 rep)
Oct 22, 2012, 03:45 AM
• Last activity: Aug 31, 2024, 04:55 AM
4
votes
3
answers
5478
views
How to sort and join at the same time?
I have 2 files, one having 2 columns, another having 1 column.
The second file is sorted using sort -u.
Now the task is I need to join this column with the first column of the first file, which is not sorted.
So what will be the syntax? Will join -j 1 file2.txt sort -s -n -k 1 file1.txt work?
The output I want is actually the 2nd column of file 2 after joining and the unique entries in it.
File 2
------
1
2
3
File 1
------
2 500
1 5000
1 300
3 3000
3 300
4 450
Output
------
5000
300
500
3000
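A sketch of one way to read the requirement (my interpretation, assuming bash process substitution is available): sort file1 on its first field on the fly, join against the already-sorted file2, then keep file1's value column, de-duplicated while preserving the order it comes out of the join.
join -j 1 file2.txt <(sort -k1,1 file1.txt) | awk '!seen[$2]++ {print $2}'
# Note: the values come out in join (i.e. sorted-key) order, which may differ
# from the example output above.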
N. F.
(2269 rep)
Sep 28, 2012, 08:19 AM
• Last activity: Aug 31, 2024, 04:45 AM
3
votes
1
answers
647
views
The "join" utility reports: file is not sorted, but in fact it is sorted
I have two files, t1 and t2.
root@localhost:~#
root@localhost:~# cat t1
udp UNCONN 0 0 0.0.0.0:68 0.0.0.0:* users:(("dhclient",pid=479,fd=7)) 479
tcp LISTEN 0 128 127.0.0.1:6060 0.0.0.0:* users:(("gggg-ruit",pid=24968,fd=5)) 24968
root@localhost:~#
root@localhost:~# cat t2
root 88 0.0 0.0 0 0 ? I< Jan06 0:00 [scsi_tmf_0]
root 96 0.0 0.0 0 0 ? I< Jan06 0:00 [ipv6_addrconf]
root 24965 0.0 0.2 11592 3004 ? S Jan12 0:00 bash /root/restart_gggg.sh
root 24968 0.7 5.2 112488 53472 ? Sl Jan12 30:52 /usr/local/bin/gggg-ruit -singleInstance :44444
I want to join them on the 8th column of t1 and the 2nd column of t2. I already have them in sorted order. Let's prove it.
root@localhost:~# awk '{print $8}' t1
479
24968
root@localhost:~# awk '{print $2}' t2
88
96
24965
24968
Now when I join them, I get the following error.
root@localhost:~# join -1 8 -2 2 -o 2.1,2.2,1.1,1.2,1.5,1.6,2.11 t1 t2
join: t2:3: is not sorted: root 24965 0.0 0.2 11592 3004 ? S Jan12 0:00 bash /root/restart_gggg.sh
root@localhost:~#
Why does it tell me t2 is not sorted at row 3? As you can see, it is already sorted on the join column.
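A sketch of the usual explanation and workaround (an assumption on my part: join expects the join fields in the collation order of the current locale, i.e. lexicographically, and "24965" sorts before "88" in that order even though it is numerically larger):
join -1 8 -2 2 -o 2.1,2.2,1.1,1.2,1.5,1.6,2.11 \
    <(sort -k8,8 t1) <(sort -k2,2 t2)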
Just a learner
(2022 rep)
Jan 15, 2020, 01:54 PM
• Last activity: Aug 31, 2024, 04:35 AM
0
votes
2
answers
201
views
Sort is not sorting?
I seem to be having the same issue as described in https://unix.stackexchange.com/questions/562256/the-join-utility-reports-file-is-not-sorted-but-in-fact-it-is-sorted , however I have piped BOTH files through sort before attempting to join. I have also tried sort -d and sort -g.
This is running on Amazon Linux 2, using sort from coreutils-8.22-24
The following illustrates the issue:
root@host:/home/user# cat /tmp/db_schema_size | sort
directory 0.000106811523
directory_1 1.059814453265
directory_123 0.564987182688
directory_123123 0.564987182688
directory_1234 0.564987182688
directory_12345 0.564987182688
directory_1234567 0.564987182688
directory_82473 0.934677124123
directory_82475 0.751586914161
directory_82477 0.881881713968
directory_82479 0.751571655373
directory_82481 0.750396728614
directory_82483 0.589370727610
root@host:/home/user# cat /tmp/db_dir_sizes | sort
directory 132
directory_1 1115936
directory_123123 613244
directory_12345 613248
directory_1234567 613248
directory_1234 613244
directory_123 613244
directory_82473 1015140
directory_82475 818764
directory_82477 958628
directory_82479 818756
directory_82481 817500
directory_82483 638820
Both files are the same structure - no lead/trailing whitespace, a single tab char between the values.
Both files are processed by sort but produce output in a different order.
I do see that on Ubuntu 22.04 LTS, the output is consistent (and of the first form above).
What am I missing here?
**update**
For clarification...
On AWS Linux 2 with LANG=en_US.UTF-8, I get output as above - i.e. output differs
On AWS Linux 2 with LANG=C.UTF-8 output is the same
On Ubuntu with both LANG=C.UTF-8 and LANG=en_US.UTF-8, output is the same
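A sketch of one way to take locale collation out of the picture entirely (assumption: the en_US.UTF-8 collation rules treat the underscore differently from the C locale, so sort and join have to agree on one collation; the .sorted file names are arbitrary):
LC_ALL=C sort /tmp/db_schema_size > /tmp/db_schema_size.sorted
LC_ALL=C sort /tmp/db_dir_sizes   > /tmp/db_dir_sizes.sorted
LC_ALL=C join /tmp/db_schema_size.sorted /tmp/db_dir_sizes.sorted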
symcbean
(6301 rep)
Jun 19, 2024, 01:37 PM
• Last activity: Aug 31, 2024, 04:32 AM
4
votes
3
answers
21776
views
Merging 2 files with based on field match
I want to create a file that contains columns from two input files. File1 is like:
aa 32
bb 15
cc 78
File2 is:
fa 19
bc 23
cc 50
de 28
aa 45
bb 31
The task is: read through File1, and if the 1st field of a row exists among the 1st fields of File2, print that line of File2 (both columns) and append the 2nd-column entry of the File1 row that contains that 1st field.
The output should be like:
aa 45 32
bb 31 15
cc 50 78
awk is preferred for the script.
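A minimal awk sketch along those lines (one common pattern, not necessarily the only way): load File1 into an array keyed by its 1st field, then append the stored value to every matching File2 line. The output follows File2's line order, so it may need sorting afterwards.
awk 'NR==FNR {a[$1]=$2; next} $1 in a {print $0, a[$1]}' File1 File2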
hldn
(41 rep)
Oct 8, 2014, 04:16 AM
• Last activity: Mar 22, 2024, 10:06 AM
1
votes
0
answers
35
views
Problems with join on sorted key column but linux says "not in expected order"
File 1. Column1=old_id; Column2=snp_info (50,000 integers, so too large to show; it contains 0, 1, 2, 5, e.g. 020112202010511)
80024979
80024987
80025141
80107980
80922131
81666414
81667586
87021127
87028460
2010112924
2010115513
2010186050
File 2. Column1=old_id; Column2=new_id.
79931168 58155
79944336 58190
79969242 72833
80107980 58150
80922131 58109
2010112924 96821
2010115513 80604
2010186050 47254
2010198857 90190
2010229173 96927
2010229330 67548
I am trying to join on column 1, which was sorted using sort -k 1n file1 (and the same for file2). When joining (on Linux) I get the following error:
join: file 2 is not in expected order
join: file 1 is not in expected order.
If I look at the join, this is where it gives the error:
...200000010000000000000000 58150
join: file 2 is not in expected order -> but continues with the next old_id (file1) and their column2 value ->
80922131 1100000020000...
...001000010 58109 -> and then it stops here, that is again with column2 of file1 and column 2 of file 2 (the new_id column I want to put in).
I looked at the values: the errors start, and the join stops, at the point where in file2 column 1 the old_id goes from 8-digit values to 10-digit values. Note that it is indeed sorted correctly (numerically).
How do I solve this? What is the cause? Pictures included. Thank you in advance.
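A sketch of the usual fix (assumption: join wants the keys in lexicographic collation order, and a numeric sort puts the 10-digit ids after the 8-digit ones, which breaks that order at exactly the point described; the .lex names are arbitrary):
LC_ALL=C sort -k1,1 file1 > file1.lex
LC_ALL=C sort -k1,1 file2 > file2.lex
LC_ALL=C join file1.lex file2.lex > joined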
File 1, column1, (column2 not shown, but = 0101020222015... -> 50 000 integers per id)




Michiel Van Niekerk
(41 rep)
Jan 7, 2024, 04:29 PM
• Last activity: Jan 7, 2024, 04:39 PM
0
votes
0
answers
34
views
I want to join variables (columns) from two different files with a matching column using join - problems with output
Good day. I am new to linux and really hope someone can help me.
I have two files:
File 1 : Column1=id_nr; Column2=snp_info
File 2 : Column1=id_nr; Column2=recoded_id_nr
I want to join the two files so that I end up with a file with Column1=recoded_id_nr and Column2=snp_info for the animals (id_nr) present in both File 1 and File 2. The id_nr is the matching variable and both files are sorted on it.
On Linux I used various forms of the join command, but I get the same result as with the basic form below:
join file1 file2
Result:
It gives the correct animals that are present in both files, but the problem is with the first column. The first integer is the correct recoded_id_nr, but it is fused (concatenated), without any space, with the last 8 digits of the id_nr of the 3 animals that are present in both files. Thus, it cuts off the first 2 digits of the id_nr and replaces them with the recoded_id_nr. The second column (snp_info) is fine.
Any help would be immensely appreciated.
Michiel
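One common cause worth ruling out (an assumption, not a confirmed diagnosis): carriage returns from Windows line endings in one of the files make the terminal print the rest of the line over the beginning of it, which looks exactly like fused or overwritten columns.
file file1 file2                  # reports "with CRLF line terminators" if affected
tr -d '\r' < file2 > file2.unix   # strip carriage returns, then re-run the join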



Michiel Van Niekerk
(41 rep)
Jan 6, 2024, 04:46 PM
-1
votes
1
answers
64
views
Returning lines from a JSON log file which contain IP and timestamps read from a CSV file?
So I'm new to shell scripting and could use some help. In my CSV file, I want to see if the IP and timestamp values exist in some way in the line entries of the JSON log file, and if so, return that specific JSON log entry to another file. I tried to make it universal so it's applicable to all IP addresses. Here's what the sample CSV file would look like:
"clientip,""destip","dest_hostname","timestamp"
"127.0.0.1","0.0.0.0","randomhost","2023-09-09T04:18:22.542Z"
A sample line entry from the JSON log file:
{"log": "09-Sept-2023 rate-limit: info: client @xyz 127.0.0.1, "stream":"stderr", "time": 2023-09-09T04:18:22.542Z"}
It's the lines from the JSON log file that we want to return to the output.txt file when there's a match. The JSON file doesn't have the same fields and organization as the CSV does (with clientip, destip, dest_hostname, timestamp), but I was hoping that I could still at least return the JSON log lines that match on the clientip (like we see here with 127.0.0.1 in "info: client @xyz 127.0.0.1") and maybe the timestamp.
I've tried
join file.csv xyz-json.log > output.txt
I've also tried the awk command, to no avail. I would really appreciate your help with this!
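A rough sketch of one approach (assumptions: the CSV layout is exactly as shown, with quoted fields, and a log line "matches" when it contains both the clientip and the timestamp as plain substrings):
# Read each CSV data row, strip the quotes, and keep log lines containing both values.
while IFS=, read -r clientip destip host ts; do
    clientip=${clientip//\"/}; ts=${ts//\"/}
    grep -F "$clientip" xyz-json.log | grep -F "$ts"
done < <(tail -n +2 file.csv) > output.txt   # tail -n +2 skips the CSV header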
Linuxnoob
(1 rep)
Nov 30, 2023, 02:11 AM
• Last activity: Nov 30, 2023, 11:13 AM
9
votes
4
answers
15183
views
How to join a line with a pattern with the next line with sed?
I can't find this case on the board, so I'm asking the question.
This is the input file:
module
x(a,b,c)
module
y(d,e,f,
g,h,i)
module
z(j,k,l)
And the output file should be:
module x(a,b,c)
module y(d,e,f,
g,h,i)
module z(j,k,l)
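A minimal sed sketch (assumptions: GNU sed, and input.txt as a placeholder file name; BSD sed may want the commands on separate lines): on each line matching the pattern, N appends the next line to the pattern space and the substitution replaces the embedded newline with a space.
sed '/module/{N;s/\n/ /;}' input.txt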
funfun
(119 rep)
Dec 17, 2018, 06:24 AM
• Last activity: Aug 19, 2023, 10:03 PM
0
votes
4
answers
1935
views
Comparing two .csv files and write into 3rd file
On my Mac I have two files. file1.csv (about 4000 rows):
1234,yxds
4352,fsfk
8765,fdgj
7874,hffe
9038,fjro
8297,fhjs
and file2.csv (about 50 rows):
1234,"number","date","OS"
4352,"number","date","OS"
8765,"number","date","OS"
8297,"number","date","OS"
file 3 should look like this:
1234,yxds
4352,fsfk
8765,fdgj
8297,fhjs
So I only want the entries that exist in file2.csv, but with column 2 of file1.csv. Essentially I want to write a script that compares the two files and writes the result to a different file, file3.csv. So far I have tried join and awk, but with no success. This is what I tried so far:
join -t, -12 -o 1.1,2.1,2.2 file1.csv file2.csv > file 3.csv
and
awk 'NR==FNR{a[$1]=$2;next}{if ($1 in a && a[$1] != $2)print;}' FS="|" file1.csv file2.csv > file3.csv
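A sketch of one way to get that intersection (assumption: the key is column 1 in both files and neither file has a header): load file2's keys first, then print the file1 lines whose key is present.
awk -F, 'NR==FNR {keys[$1]; next} $1 in keys' file2.csv file1.csv > file3.csv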
Sotey
(1 rep)
Jun 11, 2019, 09:59 AM
• Last activity: Jul 21, 2023, 07:02 AM
2
votes
3
answers
1681
views
How to do anti-join or inverse join in bash
I want to perform what some data analysis software call an anti-join: remove from one list those lines matching lines in another list. Here is some toy data and the expected output:
$ echo -e "a\nb\nc\nd" > list1
$ echo -e "c\nd\ne\nf" > list2
$ antijoincommand list1 list2
a
b
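Two common sketches (assuming whole-line matching, as in the toy data above):
grep -vxFf list2 list1                 # keep list1 lines that do not appear verbatim in list2
comm -23 <(sort list1) <(sort list2)   # same idea, but needs sorted input and emits sorted output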
Josh
(313 rep)
May 24, 2020, 01:28 PM
• Last activity: Jul 12, 2023, 11:06 PM
-1
votes
1
answers
79
views
Why no one thought of the concept of joins before the 70s?
So, let me see if I have my history right:
- In 1976, the first system sold as an RDBMS was Multics Relational Data Store. I don't know if it had JOIN, but let's assume it did.
- The UNIX command join was released in 1979. It imitates SQL's JOIN but works on plain text files.
So my question is: if join is relatively useful on plain text, why wasn't something like it developed before? Surely operating systems had been dealing with plain text and "tabular data" to some extent since the late 50s. And I'm sure at some point they would have found it useful to have a ready-made abstraction or tool to combine data from two different files in a meaningful way.
So, why didn't the concept emerge before? Are my rose-tinted glasses underestimating the cognitive overhead of joins, with all their Venn diagramness, that is only justifiable when the performance of a RDB is needed? Were people content with merging data in an ad-hoc way? Or was join indeed a stroke of genius?
Sebastian Carlos
(262 rep)
Jun 26, 2023, 04:09 PM
• Last activity: Jun 26, 2023, 04:59 PM
1
votes
2
answers
3316
views
Join two csv files by matching columns, join command
I have two .csv files that I need to match based on column 1.
The two file structures look like this.
FILE1
gopAga1_00004004-RA,1122.825534, -2.497919969, 0.411529843
gopAga1_00010932-RA,440.485381, 1.769511316, 0.312853434
gopAga1_00007012-RA, 13.37565185, -1.973108929, 0.380227982
etc...
FILE2
gopAga1_00004004-RA, ENSACAP00000013845
gopAga1_00009937-RA, ENSACAP00000000905
gopAga1_00010932-RA, ENSACAP00000003279
gopAga1_00000875-RA, ENSACAP00000000296
gopAga1_00010837-RA, ENSACAP00000011919
gopAga1_00007012-RA, ENSACAP00000012682
gopAga1_00017831-RA, ENSACAP00000016147
gopAga1_00005588-RA, ENSACAP00000011117
etc..
This is my current command that I am running using join, formatted from what I have also read in other threads here:
join -1 1 -2 1 -t , -a 1 -e "NA" -o "2.2,1.1,1.2,1.3" output.txt
However, every time I run this command it only writes the first row to the output.
Does anyone know why my code runs like this and does not actually merge the two files based on the GOP ID?
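For comparison, a sketch of a complete invocation (assumption: the command as posted has lost its two input file operands; FILE1 and FILE2 are placeholders for the two CSVs, sorted on the join key on the fly, and the field list mirrors the one in the question):
join -t, -a 1 -e NA -o 2.2,1.1,1.2,1.3 \
    <(sort -t, -k1,1 FILE1) <(sort -t, -k1,1 FILE2) > output.txt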
cdxun
(11 rep)
Jun 26, 2018, 07:54 PM
• Last activity: May 15, 2023, 03:09 AM
0
votes
1
answers
281
views
Merging two Unix files
I have 2 pipe-delimited files, say file1 and file2.
file1 has, say, 33 columns and file2 may have 34/35/36 columns, so let's not worry about the exact number of columns.
What I want to do is compare the values in file1 and file2 (from column 1 to column 32). If all of those values are the same, then take the extra values from file2 and append them to all matching records in file1.
Say the 1st record in file2 has 5 matches in file1; then take the value "|84569|21.5|1" and append it to all matches in file1 (see file3 for the expected result).
Similarly, for the 2nd record in file2 we have 5 matches in file1, so take the value "|0" and append it to all matched records in file1.
The same goes for the 3rd record from file2: there are 3 matches, so take the value "|21457879|12.4" and append it to all 3 matched rows in file1.
If you are wondering where the values to append are taken from in file2: they start at column 34. The start position is fixed but the end position is not.
For example, for "a" we take values from columns 34/35/36, but for "b" we have just column 34, and for "c" we have values in columns 34/35.
I do not know how to format the data in my examples below, so I am giving it as it is.
file1
a|a1|a2|a3|a4|...|a32|acb@sma.com
a|a1|a2|a3|a4|...|a32|acd@sm.com$1553:2015-02-14
a|a1|a2|a3|a4|...|a32|axwer@xi.com30:2015-03-01
a|a1|a2|a3|a4|...|a32|acbw@ma.com$121:2015-01-31
a|a1|a2|a3|a4|...|a32|art@ma.com$293:2015-02-28
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|kasmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in14:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in232:2014:05:24
c|c1|c2|c3|c4|...|c32|Asce@ita.in
c|c1|c2|c3|c4|...|c32|$200:2011:12:06
c|c1|c2|c3|c4|...|c32|kst@gre.in$214:2001:01:31
file2
a|a1|a2|a3|a4|...|a32|acb@sma.com|84569|21.5|1
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
c|c1|c2|c3|c4|...|c32|Asce@ita.in|21457879|12.4
Expected File: File3
a|a1|a2|a3|a4|...|a32|acb@sma.com|84569|21.5|1
a|a1|a2|a3|a4|...|a32|acd@sm.com$1553:2015-02-14|84569|21.5|1
a|a1|a2|a3|a4|...|a32|axwer@xi.com30:2015-03-01|84569|21.5|1
a|a1|a2|a3|a4|...|a32|acbw@ma.com$121:2015-01-31|84569|21.5|1
a|a1|a2|a3|a4|...|a32|art@ma.com$293:2015-02-28|84569|21.5|1
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|kasmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in14:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in232:2014:05:24|0
c|c1|c2|c3|c4|...|c32|Asce@ita.in|21457879|12.4
c|c1|c2|c3|c4|...|c32|$200:2011:12:06|21457879|12.4
c|c1|c2|c3|c4|...|c32|kst@gre.in$214:2001:01:31|21457879|12.4
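A sketch with awk (assumptions: "|" never appears inside a field and the key really is the first 32 columns): store everything from column 34 onward of each file2 record, keyed by its columns 1-32, then append that suffix to every file1 line with the same key.
awk -F'|' -v OFS='|' '
    NR==FNR {
        key = $1; for (i = 2; i <= 32; i++) key = key OFS $i
        suffix = ""; for (i = 34; i <= NF; i++) suffix = suffix OFS $i
        extra[key] = suffix
        next
    }
    {
        key = $1; for (i = 2; i <= 32; i++) key = key OFS $i
        print $0 extra[key]
    }
' file2 file1 > file3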
mdx
(11 rep)
Apr 24, 2015, 10:44 AM
• Last activity: Jan 5, 2023, 08:19 AM
0
votes
1
answers
937
views
Sort and Join with letters and numbers
I have two files (file1 and file2) and I would like to build a third (file3) containing a column that the two do not have. I usually sort the files and then combine them with join.
But it is not working, because the key has values with both letters and numbers, as in the example below:
File1:
ARS-BFGL-BAC-1180 14 20574088
ARS-BFGL-BAC-11805 12 14897445
ARS-BFGL-BAC-11852 1 34309921
ARS-BFGL-BAC-11867 13 12616676
File2
AC16359 ARS-BFGL-BAC-11798 B B
AC16359 ARS-BFGL-BAC-11805 B B
AC16359 ARS-BFGL-BAC-1180 B B
AC16359 ARS-BFGL-BAC-11852 A A
AC16359 ARS-BFGL-BAC-11867 B B
I use the same sort command for both files, yet I notice that in one file the SNP_Name ending in 11805 is above the one ending in 1180, and in the other it is not.
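A sketch of one way to make the two sorts agree (assumption: the discrepancy comes from locale collation and from sorting whole lines rather than just the key field; the .s names are arbitrary):
LC_ALL=C sort -k1,1 file1 > file1.s
LC_ALL=C sort -k2,2 file2 > file2.s
LC_ALL=C join -1 1 -2 2 file1.s file2.s > file3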
Augusto II
(1 rep)
Apr 29, 2021, 01:56 PM
• Last activity: Jan 5, 2023, 08:00 AM
0
votes
2
answers
1879
views
Merge all csv files in directory column wise
Say the directory contained 3 csv files:
The first csv:
Name, John
Age, 18
The second csv:
Name, Jim
Age, 21
The third csv:
Name, Amy
Age, 22
I would want the result to be:
Name, John, Jim, Amy
Age, 18, 21, 22
It's important to know the directory could have any number of CSVs.
I have both bash and a POSIX shell available.
Edit:
This feels like it should work but still has an issue with regards to order:
awk -F, -v OFS="," '{a[FNR]=a[FNR]?a[FNR]FS$2:$1FS$2}END{for(x in a)print x,a[x]}' *.csv > results.csv
This makes no sense to me, as FNR 1 should be first in the array, but it is printed last.
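A sketch of one fix (assumption: every CSV has the same rows in the same order): for (x in a) iterates in an unspecified order, so index the final loop explicitly by line number instead.
awk -F', *' -v OFS=', ' '
    { a[FNR] = (FNR in a) ? a[FNR] OFS $2 : $1 OFS $2; n = (FNR > n) ? FNR : n }
    END { for (i = 1; i <= n; i++) print a[i] }
' *.csv > results.csv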
Tap
(35 rep)
Apr 20, 2022, 05:22 PM
• Last activity: Nov 23, 2022, 10:28 AM
Showing page 1 of 20 total questions