Remove duplicates of specific line keeping only the first appearance of each without touching other unspecified duplicates
1 vote, 2 answers, 981 views
I'm trying to edit a text file that contains several duplicate lines. The goal is to keep only the first occurrence of certain lines and remove the later duplicates of those same lines.
In this example file:
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 1
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
I'd like to keep only the first occurrence of each * Title N line
and *keep all other unrelated/unspecified duplicate lines* in the file.
So the result would be:
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
The traditional solutions for removing duplicates like
uniq file.txt
[Useful AWK One-Liners to Keep Handy](https://linoxide.com/useful-awk-one-liners-to-keep-handy/) :
awk '!a[$0]++' contents.txt
[shell - How to delete duplicate lines in a file without sorting it in Unix - Stack Overflow](https://stackoverflow.com/questions/1444406/how-to-delete-duplicate-lines-in-a-file-without-sorting-it-in-unix/32513573#32513573)
perl -ne 'print if ! $x{$_}++' file
delete every duplicate indiscriminately.
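For illustration, here is a minimal sketch of how the awk one-liner above could be restricted to the heading lines only. It assumes the lines to deduplicate are exactly those starting with a single asterisk followed by a space; the sample data and the variable names tmp and result are made up for the example.

```shell
# Sample input in a temporary file, mirroring the layout of the example above.
tmp=$(mktemp)
printf '%s\n' '* Title 1' '** Subtitle 01' 'Line 001' \
              '* Title 1' '** Subtitle 02' 'Line 001' \
              '* Title 2' '** Subtitle 01' 'Line 001' > "$tmp"

# Print a line if it is not a top-level heading,
# or if it is a heading that has not been seen before.
result=$(awk '!/^\* / || !seen[$0]++' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The only change from the generic one-liner is the extra condition: lines that do not match the heading pattern are always printed, so repeated Line 001 or #+begin_src lines are left alone.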
I tried using variations of these solutions and also GNU sed
in a loop like
duplicateLines=$(grep -E "^\* .*" file.org | uniq)
printf '%s\n' "$duplicateLines" | while read -r line; do
sed "s/$line//g2" file.org
done
with no success. I'm not concerned about performance, so doing multiple iterations, like calling sed
inside a loop to remove one specified string at a time, is no problem.
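A possible sketch of the one-string-at-a-time loop, under some assumptions: sed's s command works on one line at a time, so a numeric flag counts matches within a line rather than across the file, which is probably why the attempt above has no effect. The version below swaps in awk with an exact string comparison instead; the sample data and the names tmp, titles and result are hypothetical.

```shell
# Sample input in a temporary file (made-up data for the example).
tmp=$(mktemp)
printf '%s\n' '* Title 1' 'Line 001' '* Title 1' 'Line 001' \
              '* Title 2' 'Line 001' '* Title 2' 'Line 001' > "$tmp"

# Collect each distinct heading once, before the file is rewritten.
titles=$(grep '^\* ' "$tmp" | sort -u)

# Rewrite the file once per heading, dropping every occurrence after the first.
printf '%s\n' "$titles" | while IFS= read -r title; do
  # The heading is passed through the environment and compared as a plain
  # string, so nothing in it is interpreted as a regular expression.
  t="$title" awk '$0 != ENVIRON["t"] || !n++' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"
done

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This is slower than a single pass because the file is rewritten once per heading, which matches the trade-off described above.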
Any insight would be very much appreciated.
It would be nice to be able to do this inside a shell script, but I'm open to alternative solutions in Python, C, Java, etc. Just tell me the name of the function or library and I'll look it up.
Thanks.
Asked by yeyin33455
(13 rep)
Dec 30, 2021, 01:40 AM
Last activity: Jan 2, 2022, 12:44 AM