Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
2
votes
7
answers
293
views
merge multiple files using full join by the first column
I have a directory with 200 samples. Each file has two columns. I want to full-join these files and produce a single merged file.
This is an example of two of the files:
$ head 346_T1_A_deduped.bismark.cov_format
1_10525 95.2
1_10526 98.9690721649485
1_10542 80.8
1_10543 87.6288659793814
$ head 346_T1_B_deduped.bismark.cov_format
1_10525 95.7142857142857
1_10526 98.1132075471698
1_10542 78.5714285714286
1_10549 75.4716981132076
I want to join all files based on the first column, and if a file does not have the corresponding first-column value, put NA in the merged file.
Here is the example output:
$ head merged.csv
Prob,346_T1_A_deduped.bismark.cov_format,346_T1_B_deduped.bismark.cov_format
1_10525,95.2,95.7142857142857
1_10526,98.9690721649485,98.1132075471698
1_10542,80.8,78.5714285714286
1_10543,87.6288659793814,NA
1_10549,NA,75.4716981132076
So I want comma-delimited output, where the first column is the join column and the remaining column headers are the corresponding file names. As you can see, NA is placed whenever a probe is not present in a file.
I wrote this:
# Create an empty result file
output_file="full_joined_result.txt"
> "$output_file"
# Loop over all files in the directory
for file in *_format; do
    # If output file is empty, initialize with the first file's data
    if [ ! -s "$output_file" ]; then
        cp "$file" "$output_file"
    else
        # Full join with the next file based on the first column
        awk 'FNR==NR {a[$1]=$0; next} {print a[$1], $0}' "$output_file" "$file" > temp && mv temp "$output_file"
    fi
done
But it does not give the output I want.
I would appreciate it if anyone could help fix this.
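For what it's worth, here is a minimal sketch of one possible approach with GNU awk (an assumption on my part, not a fix of the loop above): read all *_format files in one pass, remember each probe's value per file, and print NA where a probe is missing. ARGIND and PROCINFO["sorted_in"] are gawk-specific.
awk '
    { vals[$1, ARGIND] = $2; keys[$1]; files[ARGIND] = FILENAME }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # iterate probes in sorted order (gawk)
        printf "Prob"
        for (f = 1; f <= ARGIND; f++) printf ",%s", files[f]
        print ""
        for (k in keys) {
            printf "%s", k
            for (f = 1; f <= ARGIND; f++) {
                v = ((k, f) in vals) ? vals[k, f] : "NA"
                printf ",%s", v
            }
            print ""
        }
    }
' *_format > merged.csv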
Anna1364
(1064 rep)
Sep 24, 2024, 04:54 AM
• Last activity: Oct 3, 2024, 12:13 PM
0
votes
1
answers
124
views
How does the auto format of the join command work?
Here is an example:
$ cat file1
hello
there
$ cat file2
some,very,long,line,hello,csv
some,very,long,line,nope,csv
$ join -t, -1 1 -2 5 file1 file2
hello,some,very,long,line,csv
This works as documented: the joined field is pulled out to the left, and the remaining fields are printed.
But my desired output is to retain the CSV (file2) column ordering.
I can get it to print the CSV intact with this command, which lists the CSV field numbers explicitly:
join -t, -1 1 -2 5 -o $(head -n1 file2 | awk -F, '{print NF}' | xargs seq | sed 's/^/2./' | paste -d, -s) file1 file2
And it prints:
some,very,long,line,hello,csv
I _thought_ it's for this that join has an auto format. This is the precious little the man page of join has to say about -o FORMAT:
> If FORMAT is the keyword 'auto', then the first line of each file determines the number of fields output for each line.
However, I don't know what that means or how I'm supposed to use the auto format.
I tried this, but didn't get anything different than I did previously:
$ join -t, -1 1 -2 5 -o auto file1 file2
hello,some,very,long,line,csv
A naive interpretation would be that I should list out the required fields as the first line of each file, but that's wrong, too.
So, my question is: what is auto for, and how am I supposed to use it? Can I use it to retain the column order of the input CSV?
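For context, a sketch of the kind of invocation where auto seems intended to be used, as far as I can tell from the GNU documentation (an assumption, not a confirmed answer): auto fixes the number of output fields contributed by each file based on that file's first line, which mainly matters together with -a and -e so that unpairable lines come out padded to a consistent width. It does not reorder columns.
# Hypothetical example: also print unpairable file2 lines (-a 2); -e NA fills
# fields the other file cannot supply, and -o auto keeps the field count constant.
join -t, -1 1 -2 5 -a 2 -e NA -o auto file1 file2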
Jeenu
(113 rep)
Mar 20, 2024, 01:04 PM
• Last activity: Sep 29, 2024, 02:30 PM
2
votes
1
answers
1127
views
Why do I have "join: extra operand '/dev/fd/62'" error?
I have a script equijoin2:
#! /bin/bash
# default args
delim=","       # CSV by default
outer=""
outerfile=""
# Parse flagged arguments:
while getopts "o:td:" flag
do
    case $flag in
        d) delim=$OPTARG;;
        t) delim="\t";;
        o) outer="-a $OPTARG";;
        ?) exit;;
    esac
done
# Delete the flagged arguments:
shift $(($OPTIND -1))
# two input files
f1="$1"
f2="$2"
# cols from the input files
col1="$3"
col2="$4"
join "$outer" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")
and two files
$ cat file1
c c1
b b1
$ cat file2
a a2
c c2
b b2
Why does the last command fail? Thanks.
$ equijoin2 -o 2 -d " " file1 file2 1 1
a a2
b b1 b2
c c1 c2
$ equijoin2 -o 1 -d " " file1 file2 1 1
b b1 b2
c c1 c2
$ equijoin2 -d " " file1 file2 1 1
join: extra operand '/dev/fd/62'
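A sketch of the likely culprit and one fix (an assumption, not a verified answer): when no -o option is given, the quoted "$outer" expands to an empty-string argument, which join takes as an extra file operand next to the two process substitutions. Building the optional flag in a bash array avoids that, because an empty array expands to zero words:
outer=()
# inside getopts:  o) outer=(-a "$OPTARG");;
join "${outer[@]}" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")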
Tim
(106420 rep)
Jul 24, 2018, 05:40 AM
• Last activity: Aug 31, 2024, 05:07 AM
1
votes
2
answers
1709
views
Join command gives incorrect output?
I'm trying to join two simple files on Solaris 5.8, as shown below:
~/temp/s: cat 1
work1 a 8058 51
work2 b 15336 51
~/temp/s: cat 2
8058 77-11:29:32 /apps/sas
15336 100-12:23:49 /local/hotfix
~/temp/s: join -1 3 -2 1 1 2
8058 work1 a 51 77-11:29:32 /apps/sas (The other line is missing from the output)
The output only contains one record where it should be two. I'm really not sure where it went wrong.
Is there any way we may get all the records in the output?
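A sketch of the usual workaround (assumption: join can silently drop lines when its inputs are not sorted on the join fields; the process substitutions assume a shell such as bash or ksh93):
# Sort each file on its own join field first, then join.
join -1 3 -2 1 <(sort -k3,3 1) <(sort -k1,1 2)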
Avinash
(413 rep)
Jul 31, 2013, 12:59 PM
• Last activity: Aug 31, 2024, 04:57 AM
2
votes
2
answers
3953
views
Join problem: throwing error, join extra operand
I want to join 3 files on a column that has sorted, unique numeric values (each file has only one column), where the file names start with the same prefix, for example "usi".
Now, when I run this:
join -j 1 ../Test_Data/usi* > ../Test_Data/join_output.txt
I get the following error:
> join: extra operand `usi_rtree_lw_100000.txt'
> Try `join --help' for more information.
Any ideas?
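A sketch of the usual workaround (join(1) accepts exactly two file operands, so three files have to be joined pairwise; the file names below are placeholders, not the real ones):
join ../Test_Data/usi_file1.txt ../Test_Data/usi_file2.txt |
    join - ../Test_Data/usi_file3.txt > ../Test_Data/join_output.txt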
N. F.
(2269 rep)
Oct 22, 2012, 03:45 AM
• Last activity: Aug 31, 2024, 04:55 AM
4
votes
3
answers
5478
views
How to sort and join at the same time?
I have 2 files, one having 2 columns, another having 1 column.
The second file is sorted using sort -u.
Now the task is I need to join this column with the first column of the first file, which is not sorted.
So what will be the syntax? Will join -j 1 file2.txt sort -s -n -k 1 file1.txt work?
The output I want is actually the 2nd column of file 2 after joining and the unique entries in it.
File 2
------
1
2
3
File 1
------
2 500
1 5000
1 300
3 3000
3 300
4 450
Output
------
5000
300
500
3000
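A sketch of one way to read the requirement (my interpretation, assuming bash process substitution is available): sort file1 on its first field on the fly, join against the already-sorted file2, then keep file1's value column, de-duplicated while preserving the order it comes out of the join.
join -j 1 file2.txt <(sort -k1,1 file1.txt) | awk '!seen[$2]++ {print $2}'
# Note: the values come out in join (i.e. sorted-key) order, which may differ
# from the example output above.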
N. F.
(2269 rep)
Sep 28, 2012, 08:19 AM
• Last activity: Aug 31, 2024, 04:45 AM
3
votes
1
answers
647
views
The "join" utility reports: file is not sorted, but in fact it is sorted
I have two files, t1 and t2.
root@localhost:~#
root@localhost:~# cat t1
udp UNCONN 0 0 0.0.0.0:68 0.0.0.0:* users:(("dhclient",pid=479,fd=7)) 479
tcp LISTEN 0 128 127.0.0.1:6060 0.0.0.0:* users:(("gggg-ruit",pid=24968,fd=5)) 24968
root@localhost:~#
root@localhost:~# cat t2
root 88 0.0 0.0 0 0 ? I< Jan06 0:00 [scsi_tmf_0]
root 96 0.0 0.0 0 0 ? I< Jan06 0:00 [ipv6_addrconf]
root 24965 0.0 0.2 11592 3004 ? S Jan12 0:00 bash /root/restart_gggg.sh
root 24968 0.7 5.2 112488 53472 ? Sl Jan12 30:52 /usr/local/bin/gggg-ruit -singleInstance :44444
I want to join them on the 8th column of t1 and the 2nd column of t2. I already have them in sorted order. Let's prove it.
root@localhost:~# awk '{print $8}' t1
479
24968
root@localhost:~# awk '{print $2}' t2
88
96
24965
24968
Now when I join them, I get the following error.
root@localhost:~# join -1 8 -2 2 -o 2.1,2.2,1.1,1.2,1.5,1.6,2.11 t1 t2
join: t2:3: is not sorted: root 24965 0.0 0.2 11592 3004 ? S Jan12 0:00 bash /root/restart_gggg.sh
root@localhost:~#
Why does it tell me t2 is not sorted at row 3? As you can see, it is already sorted on the join column.
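A sketch of the usual explanation and workaround (an assumption on my part: join expects the join fields in the collation order of the current locale, i.e. lexicographically, and "24965" sorts before "88" in that order even though it is numerically larger):
join -1 8 -2 2 -o 2.1,2.2,1.1,1.2,1.5,1.6,2.11 \
    <(sort -k8,8 t1) <(sort -k2,2 t2)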
Just a learner
(2022 rep)
Jan 15, 2020, 01:54 PM
• Last activity: Aug 31, 2024, 04:35 AM
0
votes
2
answers
201
views
Sort is not sorting?
I seem to be having the same issue as described in https://unix.stackexchange.com/questions/562256/the-join-utility-reports-file-is-not-sorted-but-in-fact-it-is-sorted , however I have piped BOTH files through sort before attempting to join. I have also tried sort -d and sort -g.
This is running on Amazon Linux 2, using sort from coreutils-8.22-24
The following illustrates the issue:
root@host:/home/user# cat /tmp/db_schema_size | sort
directory 0.000106811523
directory_1 1.059814453265
directory_123 0.564987182688
directory_123123 0.564987182688
directory_1234 0.564987182688
directory_12345 0.564987182688
directory_1234567 0.564987182688
directory_82473 0.934677124123
directory_82475 0.751586914161
directory_82477 0.881881713968
directory_82479 0.751571655373
directory_82481 0.750396728614
directory_82483 0.589370727610
root@host:/home/user# cat /tmp/db_dir_sizes | sort
directory 132
directory_1 1115936
directory_123123 613244
directory_12345 613248
directory_1234567 613248
directory_1234 613244
directory_123 613244
directory_82473 1015140
directory_82475 818764
directory_82477 958628
directory_82479 818756
directory_82481 817500
directory_82483 638820
Both files are the same structure - no lead/trailing whitespace, a single tab char between the values.
Both files are processed by sort but produce output in a different order.
I do see that on Ubuntu 22.04 LTS, the output is consistent (and of the first form above).
What am I missing here?
**update**
For clarification...
On AWS Linux 2 with LANG=en_US.UTF-8, I get output as above - i.e. output differs
On AWS Linux 2 with LANG=C.UTF-8 output is the same
On Ubuntu with both LANG=C.UTF-8 and LANG=en_US.UTF-8, output is the same
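A sketch of one way to take locale collation out of the picture entirely (assumption: the en_US.UTF-8 collation rules treat the underscore differently from the C locale, so sort and join have to agree on one collation; the .sorted file names are arbitrary):
LC_ALL=C sort /tmp/db_schema_size > /tmp/db_schema_size.sorted
LC_ALL=C sort /tmp/db_dir_sizes   > /tmp/db_dir_sizes.sorted
LC_ALL=C join /tmp/db_schema_size.sorted /tmp/db_dir_sizes.sorted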
symcbean
(6301 rep)
Jun 19, 2024, 01:37 PM
• Last activity: Aug 31, 2024, 04:32 AM
4
votes
3
answers
21776
views
Merging 2 files with based on field match
I want to create a file that contains columns from two input files. File1 is like:
aa 32
bb 15
cc 78
File2 is:
fa 19
bc 23
cc 50
de 28
aa 45
bb 31
The task is: read through File1, and if the 1st field of a row exists among the 1st fields of File2, print that line of File2 (both columns) and append the 2nd-column entry of the File1 row that contains that 1st field.
The output should be like:
aa 45 32
bb 31 15
cc 50 78
awk is preferred for the script.
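A minimal awk sketch along those lines (one common pattern, not necessarily the only way): load File1 into an array keyed by its 1st field, then append the stored value to every matching File2 line. The output follows File2's line order, so it may need sorting afterwards.
awk 'NR==FNR {a[$1]=$2; next} $1 in a {print $0, a[$1]}' File1 File2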
hldn
(41 rep)
Oct 8, 2014, 04:16 AM
• Last activity: Mar 22, 2024, 10:06 AM
1
votes
0
answers
35
views
Problems with join on sorted key column but linux says "not in expected order"
File 1. Column1=old_id; Column2=snp_info (50,000 integers, so too large to show; it contains 0, 1, 2, 5, e.g. 020112202010511)
80024979
80024987
80025141
80107980
80922131
81666414
81667586
87021127
87028460
2010112924
2010115513
2010186050
File 2. Column1=old_id; Column2=new_id.
79931168 58155
79944336 58190
79969242 72833
80107980 58150
80922131 58109
2010112924 96821
2010115513 80604
2010186050 47254
2010198857 90190
2010229173 96927
2010229330 67548
I am trying to join on column 1, which was sorted using sort -k 1n file1 (and the same for file2). When joining (on Linux) I get the following error:
join: file 2 is not in expected order
join: file 1 is not in expected order.
If I look at the join, this is where it gives the error:
...200000010000000000000000 58150
join: file 2 is not in expected order -> but continues with the next old_id (file1) and their column2 value ->
80922131 1100000020000...
...001000010 58109 -> and then it stops here, that is again with column2 of file1 and column 2 of file 2 (the new_id column I want to put in).
I looked at the values: the errors start, and the join stops, at the point where in file2 column 1 the old_id goes from 8-digit values to 10-digit values. Note that it is indeed sorted correctly (numerically).
How do I solve this? What is the cause? Pictures included. Thank you in advance.
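A sketch of the usual fix (assumption: join wants the keys in lexicographic collation order, and a numeric sort puts the 10-digit ids after the 8-digit ones, which breaks that order at exactly the point described; the .lex names are arbitrary):
LC_ALL=C sort -k1,1 file1 > file1.lex
LC_ALL=C sort -k1,1 file2 > file2.lex
LC_ALL=C join file1.lex file2.lex > joined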
File 1, column1, (column2 not shown, but = 0101020222015... -> 50 000 integers per id)




Michiel Van Niekerk
(41 rep)
Jan 7, 2024, 04:29 PM
• Last activity: Jan 7, 2024, 04:39 PM
0
votes
0
answers
34
views
I want to join variables (columns) from two different files with a matching column using join - problems with output
Good day. I am new to linux and really hope someone can help me.
I have two files:
File 1 : Column1=id_nr; Column2=snp_info
File 2 : Column1=id_nr; Column2=recoded_id_nr
I want to join the two files so that I end up with a file with Column1=recoded_id_nr and Column2=snp_info for the animals (id_nr) present in both File 1 and File 2. The id_nr is the matching variable and both files are sorted on it.
On Linux I used various forms of the join command, but I get the same result as with the basic form below:
join file1 file2
Result:
It gives the correct animals that are present in both files, but the problem is with the first column. The first integer is the correct recoded_id_nr, but it is fused (concatenated), without any space, with the last 8 digits of the id_nr of the 3 animals that are present in both files. Thus, it cuts off the first 2 digits of the id_nr and replaces them with the recoded_id_nr. The second column (snp_info) is fine.
Any help would be immensely appreciated.
Michiel
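One common cause worth ruling out (an assumption, not a confirmed diagnosis): carriage returns from Windows line endings in one of the files make the terminal print the rest of the line over the beginning of it, which looks exactly like fused or overwritten columns.
file file1 file2                  # reports "with CRLF line terminators" if affected
tr -d '\r' < file2 > file2.unix   # strip carriage returns, then re-run the join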



Michiel Van Niekerk
(41 rep)
Jan 6, 2024, 04:46 PM
-1
votes
1
answers
64
views
Returning lines from a JSON log file which contain IP and timestamps read from a CSV file?
So I'm new to shell scripting and could use some help. In my CSV file, I want to see if the IP and timestamp values exist in some way in the line entries of the JSON log file, and if so, return that specific JSON log entry to another file. I tried to make it universal so it's applicable to all IP addresses. Here's what the sample CSV file would look like:
"clientip,""destip","dest_hostname","timestamp"
"127.0.0.1","0.0.0.0","randomhost","2023-09-09T04:18:22.542Z"
A sample line entry from the JSON log file:
{"log": "09-Sept-2023 rate-limit: info: client @xyz 127.0.0.1, "stream":"stderr", "time": 2023-09-09T04:18:22.542Z"}
It's the lines from the JSON log file that we want to return to the output.txt file when there's a match. The JSON file doesn't have the same fields and organization as the CSV does (with clientip, destip, dest_hostname, timestamp), but I was hoping that I could still at least return the JSON log lines that match on the clientip (like we see here with 127.0.0.1 in "info: client @xyz 127.0.0.1") and maybe the timestamp.
I've tried
join file.csv xyz-json.log > output.txt
I've also tried the awk command, to no avail. I would really appreciate your help with this!
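A rough sketch of one approach (assumptions: the CSV layout is exactly as shown, with quoted fields, and a log line "matches" when it contains both the clientip and the timestamp as plain substrings):
# Read each CSV data row, strip the quotes, and keep log lines containing both values.
while IFS=, read -r clientip destip host ts; do
    clientip=${clientip//\"/}; ts=${ts//\"/}
    grep -F "$clientip" xyz-json.log | grep -F "$ts"
done < <(tail -n +2 file.csv) > output.txt   # tail -n +2 skips the CSV header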
Linuxnoob
(1 rep)
Nov 30, 2023, 02:11 AM
• Last activity: Nov 30, 2023, 11:13 AM
9
votes
4
answers
15183
views
How to join a line with a pattern with the next line with sed?
I can't find this case on the board, so I'm asking the question.
This is the input file:
module
x(a,b,c)
module
y(d,e,f,
g,h,i)
module
z(j,k,l)
And the output file should be:
module x(a,b,c)
module y(d,e,f,
g,h,i)
module z(j,k,l)
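A minimal sed sketch (assumptions: GNU sed, and input.txt as a placeholder file name; BSD sed may want the commands on separate lines): on each line matching the pattern, N appends the next line to the pattern space and the substitution replaces the embedded newline with a space.
sed '/module/{N;s/\n/ /;}' input.txt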
funfun
(119 rep)
Dec 17, 2018, 06:24 AM
• Last activity: Aug 19, 2023, 10:03 PM
0
votes
4
answers
1935
views
Comparing two .csv files and write into 3rd file
On my Mac I have two files. file1.csv (about 4000 rows):
1234,yxds
4352,fsfk
8765,fdgj
7874,hffe
9038,fjro
8297,fhjs
and file2.csv (about 50 rows):
1234,"number","date","OS"
4352,"number","date","OS"
8765,"number","date","OS"
8297,"number","date","OS"
file 3 should look like this:
1234,yxds
4352,fsfk
8765,fdgj
8297,fhjs
So I only want the entries that exist in file2.csv, but with column 2 of file1.csv. Essentially I want to write a script that compares the two files and writes the result to a different file, file3.csv. So far I have tried join and awk, but with no success. This is what I tried so far:
join -t, -12 -o 1.1,2.1,2.2 file1.csv file2.csv > file 3.csv
and
awk 'NR==FNR{a[$1]=$2;next}{if ($1 in a && a[$1] != $2)print;}' FS="|" file1.csv file2.csv > file3.csv
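A sketch of one way to get that intersection (assumption: the key is column 1 in both files and neither file has a header): load file2's keys first, then print the file1 lines whose key is present.
awk -F, 'NR==FNR {keys[$1]; next} $1 in keys' file2.csv file1.csv > file3.csv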
Sotey
(1 rep)
Jun 11, 2019, 09:59 AM
• Last activity: Jul 21, 2023, 07:02 AM
2
votes
3
answers
1681
views
How to do anti-join or inverse join in bash
I want to perform what some data analysis software call an anti-join: remove from one list those lines matching lines in another list. Here is some toy data and the expected output:
$ echo -e "a\nb\nc\nd" > list1
$ echo -e "c\nd\ne\nf" > list2
$ antijoincommand list1 list2
a
b
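Two common sketches (assuming whole-line matching, as in the toy data above):
grep -vxFf list2 list1                 # keep list1 lines that do not appear verbatim in list2
comm -23 <(sort list1) <(sort list2)   # same idea, but needs sorted input and emits sorted output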
Josh
(313 rep)
May 24, 2020, 01:28 PM
• Last activity: Jul 12, 2023, 11:06 PM
-1
votes
1
answers
79
views
Why no one thought of the concept of joins before the 70s?
So, let me see if I have my history right:
- In 1976, the first system sold as an RDBMS was Multics Relational Data Store. I don't know if it had JOIN, but let's assume it did.
- The UNIX command join was released in 1979. It imitates SQL's JOIN but works on plain text files.
So my question is: if join is relatively useful on plain text, why wasn't something like it developed before? Surely operating systems had been dealing with plain text and "tabular data" to some extent since the late 50s. And I'm sure at some point they would have found it useful to have a ready-made abstraction or tool to combine data from two different files in a meaningful way.
So, why didn't the concept emerge before? Are my rose-tinted glasses underestimating the cognitive overhead of joins, with all their Venn diagramness, that is only justifiable when the performance of a RDB is needed? Were people content with merging data in an ad-hoc way? Or was join indeed a stroke of genius?
Sebastian Carlos
(262 rep)
Jun 26, 2023, 04:09 PM
• Last activity: Jun 26, 2023, 04:59 PM
1
votes
2
answers
3316
views
Join two csv files by matching columns, join command
I have two .csv files that I need to match based on column 1.
The two file structures look like this.
FILE1
gopAga1_00004004-RA,1122.825534, -2.497919969, 0.411529843
gopAga1_00010932-RA,440.485381, 1.769511316, 0.312853434
gopAga1_00007012-RA, 13.37565185, -1.973108929, 0.380227982
etc...
FILE2
gopAga1_00004004-RA, ENSACAP00000013845
gopAga1_00009937-RA, ENSACAP00000000905
gopAga1_00010932-RA, ENSACAP00000003279
gopAga1_00000875-RA, ENSACAP00000000296
gopAga1_00010837-RA, ENSACAP00000011919
gopAga1_00007012-RA, ENSACAP00000012682
gopAga1_00017831-RA, ENSACAP00000016147
gopAga1_00005588-RA, ENSACAP00000011117
etc..
This is my current command that I am running using join, formatted from what I have also read in other threads here:
join -1 1 -2 1 -t , -a 1 -e "NA" -o "2.2,1.1,1.2,1.3" output.txt
However, every time I run this command it only writes the first row to the output.
Does anyone know why my code runs like this and does not actually merge the two files based on the GOP ID?
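For comparison, a sketch of a complete invocation (assumption: the command as posted has lost its two input file operands; FILE1 and FILE2 are placeholders for the two CSVs, sorted on the join key on the fly, and the field list mirrors the one in the question):
join -t, -a 1 -e NA -o 2.2,1.1,1.2,1.3 \
    <(sort -t, -k1,1 FILE1) <(sort -t, -k1,1 FILE2) > output.txt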
cdxun
(11 rep)
Jun 26, 2018, 07:54 PM
• Last activity: May 15, 2023, 03:09 AM
0
votes
1
answers
281
views
Merging two Unix files
I have 2 pipe-delimited files, say file1 and file2.
file1 has, say, 33 columns and file2 may have 34/35/36 columns, so let's not worry about the exact number of columns.
What I want to do is compare the values in file1 and file2 (from column 1 to column 32). If all of those values are the same, then take the extra values from file2 and append them to all matching records in file1.
Say the 1st record in file2 has 5 matches in file1; then take the value "|84569|21.5|1" and append it to all matches in file1 (see file3 for the expected result).
Similarly, for the 2nd record in file2 we have 5 matches in file1, so take the value "|0" and append it to all matched records in file1.
The same goes for the 3rd record from file2: there are 3 matches, so take the value "|21457879|12.4" and append it to all 3 matched rows in file1.
If you are wondering where the values to append are taken from in file2: they start at column 34. The start position is fixed but the end position is not.
For example, for "a" we take values from columns 34/35/36, but for "b" we have just column 34, and for "c" we have values in columns 34/35.
I do not know how to format the data in my examples below, so I am giving it as it is.
file1
a|a1|a2|a3|a4|...|a32|acb@sma.com
a|a1|a2|a3|a4|...|a32|acd@sm.com$1553:2015-02-14
a|a1|a2|a3|a4|...|a32|axwer@xi.com30:2015-03-01
a|a1|a2|a3|a4|...|a32|acbw@ma.com$121:2015-01-31
a|a1|a2|a3|a4|...|a32|art@ma.com$293:2015-02-28
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|kasmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in14:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24
b|b1|b2|b3|b4|...|b32|asmi@g.in232:2014:05:24
c|c1|c2|c3|c4|...|c32|Asce@ita.in
c|c1|c2|c3|c4|...|c32|$200:2011:12:06
c|c1|c2|c3|c4|...|c32|kst@gre.in$214:2001:01:31
file2
a|a1|a2|a3|a4|...|a32|acb@sma.com|84569|21.5|1
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
c|c1|c2|c3|c4|...|c32|Asce@ita.in|21457879|12.4
Expected File: File3
a|a1|a2|a3|a4|...|a32|acb@sma.com|84569|21.5|1
a|a1|a2|a3|a4|...|a32|acd@sm.com$1553:2015-02-14|84569|21.5|1
a|a1|a2|a3|a4|...|a32|axwer@xi.com30:2015-03-01|84569|21.5|1
a|a1|a2|a3|a4|...|a32|acbw@ma.com$121:2015-01-31|84569|21.5|1
a|a1|a2|a3|a4|...|a32|art@ma.com$293:2015-02-28|84569|21.5|1
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|kasmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in14:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in$542:2013:05:24|0
b|b1|b2|b3|b4|...|b32|asmi@g.in232:2014:05:24|0
c|c1|c2|c3|c4|...|c32|Asce@ita.in|21457879|12.4
c|c1|c2|c3|c4|...|c32|$200:2011:12:06|21457879|12.4
c|c1|c2|c3|c4|...|c32|kst@gre.in$214:2001:01:31|21457879|12.4
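A sketch with awk (assumptions: "|" never appears inside a field and the key really is the first 32 columns): store everything from column 34 onward of each file2 record, keyed by its columns 1-32, then append that suffix to every file1 line with the same key.
awk -F'|' -v OFS='|' '
    NR==FNR {
        key = $1; for (i = 2; i <= 32; i++) key = key OFS $i
        suffix = ""; for (i = 34; i <= NF; i++) suffix = suffix OFS $i
        extra[key] = suffix
        next
    }
    {
        key = $1; for (i = 2; i <= 32; i++) key = key OFS $i
        print $0 extra[key]
    }
' file2 file1 > file3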
mdx
(11 rep)
Apr 24, 2015, 10:44 AM
• Last activity: Jan 5, 2023, 08:19 AM
0
votes
1
answers
937
views
Sort and Join with letters and numbers
I have two files (file1 and file2) and I would like to build a third (file3) containing a column that the two do not have. I usually sort the files and then combine them with join.
But it is not working, because the key has values with both letters and numbers, as in the example below:
File1:
ARS-BFGL-BAC-1180 14 20574088
ARS-BFGL-BAC-11805 12 14897445
ARS-BFGL-BAC-11852 1 34309921
ARS-BFGL-BAC-11867 13 12616676
File2
AC16359 ARS-BFGL-BAC-11798 B B
AC16359 ARS-BFGL-BAC-11805 B B
AC16359 ARS-BFGL-BAC-1180 B B
AC16359 ARS-BFGL-BAC-11852 A A
AC16359 ARS-BFGL-BAC-11867 B B
I use the same sort command for both files, yet I notice that in one file the SNP_Name ending in 11805 is above the one ending in 1180, and in the other it is not.
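A sketch of one way to make the two sorts agree (assumption: the discrepancy comes from locale collation and from sorting whole lines rather than just the key field; the .s names are arbitrary):
LC_ALL=C sort -k1,1 file1 > file1.s
LC_ALL=C sort -k2,2 file2 > file2.s
LC_ALL=C join -1 1 -2 2 file1.s file2.s > file3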
Augusto II
(1 rep)
Apr 29, 2021, 01:56 PM
• Last activity: Jan 5, 2023, 08:00 AM
0
votes
2
answers
1879
views
Merge all csv files in directory column wise
Say the directory contained 3 csv files:
The first csv:
Name, John
Age, 18
The second csv:
Name, Jim
Age, 21
The third csv:
Name, Amy
Age, 22
I would want the result to be:
Name, John, Jim, Amy
Age, 18, 21, 22
It's important to know the directory could have any number of CSVs.
I have both bash and a POSIX shell available.
Edit:
This feels like it should work but still has an issue with regards to order:
awk -F, -v OFS="," '{a[FNR]=a[FNR]?a[FNR]FS$2:$1FS$2}END{for(x in a)print x,a[x]}' *.csv > results.csv
This makes no sense to me, as FNR 1 should be first in the array, but it is printed last.
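A sketch of one fix (assumption: every CSV has the same rows in the same order): for (x in a) iterates in an unspecified order, so index the final loop explicitly by line number instead.
awk -F', *' -v OFS=', ' '
    { a[FNR] = (FNR in a) ? a[FNR] OFS $2 : $1 OFS $2; n = (FNR > n) ? FNR : n }
    END { for (i = 1; i <= n; i++) print a[i] }
' *.csv > results.csv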
Tap
(35 rep)
Apr 20, 2022, 05:22 PM
• Last activity: Nov 23, 2022, 10:28 AM
Showing page 1 of 20 total questions