Deduplicate CSV rows based on a specific column, with a CSV parser

7 votes

2 answers

515 views

I searched for this task and found the following older questions:

 - https://unix.stackexchange.com/questions/681059/removing-duplicates-from-a-csv-based-on-specified-columns 
 - https://unix.stackexchange.com/questions/444476/identify-unique-records-on-csv-based-on-specific-columns 

But I can't use awk here, because my data is a complex CSV file whose quoted fields contain embedded commas and nested double quotes.

Let's say I want to deduplicate the following (simplified case):

    Ref,xxx,zzz
    ref1,"foo, bar, base",qux
    ref1,"foo, bar, base",bar
    ref2,aaa,bbb

I need the output to keep only the first row for each value of Ref:

    Ref,xxx,zzz
    ref1,"foo, bar, base",qux
    ref2,aaa,bbb
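
To pin down the behavior I mean (keep the first row seen for each value of `Ref`, and keep quoted fields intact), here is a minimal sketch using Python's standard `csv` module. The helper name `dedupe_rows` and its `key_index` parameter are made up for illustration, and the sample is just the simplified data above, so treat it as a description of the goal rather than a finished solution:

```python
import csv
import io


def dedupe_rows(rows, key_index=0):
    """Yield the header, then the first row seen for each value in column key_index.

    `dedupe_rows` and `key_index` are hypothetical names used only for this sketch.
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header
    seen = set()
    for row in rows:
        if not row:  # csv.reader returns [] for blank lines; skip them
            continue
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


# The simplified sample from the question.
sample = '''Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
'''

reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
# The default QUOTE_MINIMAL quoting re-quotes only fields that need it.
writer = csv.writer(out, lineterminator="\n")
writer.writerows(dedupe_rows(reader))
result = out.getvalue()
print(result, end="")
```

Because the parser splits fields according to CSV quoting rules, the comma inside `"foo, bar, base"` is never mistaken for a field separator, which is exactly what a naive awk split gets wrong.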

Please, no awk solutions; I'm looking for an answer that uses a proper CSV parser.

I tried the following:

    mlr --csv uniq -a -g Ref file.csv

But this fails with an error.

Asked by Mévatlavé Kraspek (541 rep)

May 29, 2023, 01:34 AM
Last activity: May 29, 2023, 06:50 PM
