
Deduplicate CSV rows based on a specific column, with a CSV parser

7 votes
2 answers
515 views
I searched for this task and found the following older questions:

- https://unix.stackexchange.com/questions/681059/removing-duplicates-from-a-csv-based-on-specified-columns
- https://unix.stackexchange.com/questions/444476/identify-unique-records-on-csv-based-on-specific-columns

But I can't use awk because my data is a complex CSV file with multiple nested double quotes.

Let's say I want to deduplicate the following (simplified case):

    Ref,xxx,zzz
    ref1,"foo, bar, base",qux
    ref1,"foo, bar, base",bar
    ref2,aaa,bbb

In the output I need it as follows:

    Ref,xxx,zzz
    ref1,"foo, bar, base",qux
    ref2,aaa,bbb

No awk solutions, please; only ones that use a CSV parser.

I tried the following:

    mlr --csv uniq -a -g Ref file.csv

But it fails with an error.
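For illustration, the behaviour being asked for (keep the first row seen for each value of the Ref column, and let a real CSV parser handle the quoted commas) can be sketched with Python's standard csv module. This is only a sketch under assumptions: the function name `dedupe_by_column` is hypothetical, the sample data is the simplified case from the question, and Python is assumed to be an acceptable CSV parser here.

```python
import csv
import io


def dedupe_by_column(src, dst, column):
    """Copy CSV rows from src to dst, keeping only the first row for each value of `column`.

    `dedupe_by_column` is a hypothetical helper name, not part of any library.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()  # values of `column` already written
    for row in reader:
        key = row[column]
        if key not in seen:
            seen.add(key)
            writer.writerow(row)


# Simplified sample from the question; real data would be read from file.csv.
sample = '''Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
'''

out = io.StringIO()
dedupe_by_column(io.StringIO(sample), out, "Ref")
print(out.getvalue(), end="")
```

For a real file, the same function could be called with `open("file.csv", newline="")` as the source and `sys.stdout` as the destination. Note that the writer re-quotes fields only where needed, so quoting in the output may differ from the input even though the parsed values are the same.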
Asked by Mévatlavé Kraspek (541 rep)
May 29, 2023, 01:34 AM
Last activity: May 29, 2023, 06:50 PM