Removing duplicate values based on two columns

0 votes
0 answers
189 views
I have a file that I would like to filter for duplicate values based on columns 1 and 6:

    ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022

The final output should look like this:

    ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022

So far this is what I have tried:

    awk '!a[$1 $6]++ { print ;}' input.csv > output.csv

I end up with:

    ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
    1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
    2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022

Any suggestions would be helpful. Thank you.
Asked by nbn (113 rep)
Oct 14, 2022, 03:59 PM
Last activity: Oct 17, 2022, 07:58 AM
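
One thing worth checking in the attempt above: awk splits fields on whitespace by default, so without `-F,` the expression `$6` does not refer to the `appession_id` column of a comma-separated file. The sketch below is one possible approach, assuming the file really is plain comma-separated with no quoted fields containing commas. The shortened sample data and the array name `seen` are only for illustration; the file names follow the question.

```shell
# Recreate a shortened sample of the input (rows copied from the question).
cat > input.csv <<'EOF'
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
EOF

# -F, makes awk split each line on commas, so $1 is ID and $6 is appession_id.
# The key joins both columns with FS so that, for example, "1" and "23"
# cannot collide with "12" and "3".
# A line is printed only the first time its key is seen (default action: print).
awk -F, '!seen[$1 FS $6]++' input.csv > output.csv

cat output.csv
```

On this sample the header line and one row per ID (1 and 2) are kept. If the real file has Windows line endings or stray spaces around the commas, the key columns may need trimming first; that is a guess about the actual data, not something visible in the question.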