Deduplicate CSV rows based on a specific column, with a CSV parser
7 votes, 2 answers, 515 views
I searched for this task, and found the following older questions:
- https://unix.stackexchange.com/questions/681059/removing-duplicates-from-a-csv-based-on-specified-columns
- https://unix.stackexchange.com/questions/444476/identify-unique-records-on-csv-based-on-specific-columns
But I can't use awk because my data is a complex CSV file with multiple nested double quotes.
Let's say I want to deduplicate the following (simplified case):
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref1,"foo, bar, base",bar
ref2,aaa,bbb
I need the following output, keeping only the first row for each value of Ref:
Ref,xxx,zzz
ref1,"foo, bar, base",qux
ref2,aaa,bbb
No awk solutions, please; only ones that use a real CSV parser.
I tried the following:
mlr --csv uniq -a -g Ref file.csv
But it fails with an error.
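To make the expected behaviour concrete, here is a minimal sketch of the deduplication using Python's standard csv module rather than Miller. It assumes the first line is a header and that the first row seen for each key should be kept; the function name dedupe_csv_text is made up for illustration.

```python
import csv
import io


def dedupe_csv_text(text, column):
    """Return CSV text keeping only the first row for each value of `column`.

    Assumes the first row is a header that contains `column`.
    """
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    key_index = header.index(column)  # raises ValueError if the column is missing
    writer.writerow(header)

    seen = set()
    for row in reader:
        key = row[key_index]
        if key not in seen:
            # First time this key appears: keep the row.
            seen.add(key)
            writer.writerow(row)
    return out.getvalue()
```

Calling dedupe_csv_text(open("file.csv", newline="").read(), "Ref") on the sample above should keep the ref1 row ending in qux and drop the one ending in bar. Fields containing commas are quoted again on output because the writer uses minimal quoting by default.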
Asked by Mévatlavé Kraspek (541 rep)
May 29, 2023, 01:34 AM
Last activity: May 29, 2023, 06:50 PM