Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
2 votes | 1 answer | 1989 views
ZSH - PATH duplication: directory added at the end of PATH keeps duplicating when re-opening a terminal session
I have recently installed pipx on a Mac running Big Sur with the zsh shell. During the install it prompted for the following to be added to the `.zshrc` file:
# Created by `pipx` on 2021-03-20 14:22:23
export PATH="$PATH:/Users/xxxx/.local/bin"
eval "$(register-python-argcomplete pipx)"
Running `echo $PATH` showed `/Users/xxxx/.local/bin` added to the end of my PATH variable. However, when I close the terminal and open up a new session, `echo $PATH` shows the location duplicated at the end of PATH: `:/Users/xxxx/.local/bin:/Users/xxxx/.local/bin`
Opening and closing further terminal sessions doesn't create any more additions to PATH; it just stays at these two entries.
I have run `typeset -U PATH path` to remove the duplicate, but each time I open a new terminal session it duplicates again.
Does anybody know how I can stop this from happening? I would really like to keep my PATH variable as clean as possible.
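For reference (not a confirmed diagnosis from the thread), a common zsh-side mitigation is to make the path array unique and guard the append. A minimal sketch for `~/.zshrc`, replacing the lines pipx added, with `$HOME` standing in for `/Users/xxxx`:

```zsh
# Keep the PATH/path array free of duplicate entries in zsh.
typeset -U path PATH

# Append ~/.local/bin only if it is not already present (guarded append).
if [[ ":$PATH:" != *":$HOME/.local/bin:"* ]]; then
  export PATH="$PATH:$HOME/.local/bin"
fi

eval "$(register-python-argcomplete pipx)"
```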
KB_cov
(29 rep)
Mar 21, 2021, 03:27 PM
• Last activity: Apr 25, 2025, 01:03 AM
0 votes | 1 answer | 400 views
iptables duplicate port traffic
I want to clone/duplicate all UDP traffic incoming on port 8500 to port 8600. It is important that the source address is not modified. Also, both ports must be accessible by applications (the packets must still arrive on the original port).
This solution (https://unix.stackexchange.com/questions/704887/nftables-duplicate-udp-packets-for-specific-destination-ipport-to-a-second-d) does work on a newer system; unfortunately, the machine in question is running kernel 3.10 on RHEL 7 and I am not allowed to update it.
mirokai
(43 rep)
Apr 17, 2024, 10:34 AM
• Last activity: Apr 17, 2024, 04:34 PM
3 votes | 2 answers | 2147 views
Duplicate stdin to stdout and stderr, but in a synchronized way
I need to duplicate the stdout of a producer and feed it to two consumers in a **synchronized** fashion.

                           /--> consumer 1
producer --> duplicator --<
                           \--> consumer 2

This can easily be accomplished, for example, via `tee`:
((cat f.txt | tee /dev/stderr | ./cons1.py >&3) 2>&1 | ./cons2.py) 3>&1
or via named pipes:
mkfifo fifo1 fifo2
cat f.txt | tee fifo1 fifo2 >/dev/null &
./cons1.py < fifo1 &
./cons2.py < fifo2
or via a small C program, dup.c:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;

    while (getline(&line, &size, stdin) != -1) {
        fprintf(stdout, "%s", line);
        fprintf(stderr, "%s", line);
        fflush(stdout);   /* stdout to a pipe is block-buffered; flush per line */
    }
    free(line);
    return 0;
}
and then:
((cat f.txt | ./dup | ./cons1.py >&3) 2>&1 | ./cons2.py) 3>&1
However, **if consumer 1 is faster than consumer 2 we have a problem**. E.g., consumer 1 is already at line 50,000 while consumer 2 is only at line 17,000.
For my system **I need both consumers to be at the same line, hence the faster consumer needs to be held back**. I know that this might be impossible with standard Linux tools. However, at least with the dup.c
approach it should somehow be possible. Any suggestions on how to accomplish this? Thanks!
m33x
(33 rep)
Sep 6, 2015, 12:06 PM
• Last activity: Dec 14, 2023, 06:31 PM
0 votes | 0 answers | 25 views
Linux home dir has a "duplicate" under /home/music but when anything is deleted from one it disappears from both -- how can I delete 2nd home dir?
I don't know where the 2nd "phantom" home dir came from. The folder's Properties dialog shows it is 4K. The "real" home directory holds 128,000+ files and 208 GB.
User10489 had the answer: a symlinked directory. In fact, using the `ls -l` he suggested, I found a second directory that was also a symlink. I used rm to remove both. Thank you.
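For anyone landing here with the same symptom, a minimal sketch of the check and fix described above (assuming `/home/music` is the suspect path):

```bash
# Show what /home/music really is; a symlink prints with an "->" arrow.
ls -ld /home/music

# If it is a symlink, remove only the link itself (no -r needed);
# the real home directory and its contents are left untouched.
rm /home/music
```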
R E Brinson
(1 rep)
Dec 2, 2023, 03:06 AM
• Last activity: Dec 2, 2023, 09:59 PM
8 votes | 7 answers | 19353 views
How to find and delete duplicate files within the same directory?
I want to find duplicate files, within a directory, and then delete all but one, to reclaim space. How do I achieve this using a shell script?
For example:
pwd
folder
Files in it are:
log.bkp
log
extract.bkp
extract
I need to compare log.bkp with all the other files and, if a duplicate file is found (by its content), delete it. Similarly, the file 'log' has to be checked against all the files that follow, and so on.
So far I have written this, but it's not giving the desired result.
#!/usr/bin/env ksh
count=`ls -ltrh /folder | grep '^-' | wc -l`

for i in /folder/*
do
    for (( j=i+1; j<=count; j++ ))
    do
        echo "Current two files are $i and $j"
        sdiff -s $i $j
        if [ `echo $?` -eq 0 ]
        then
            echo "Contents of $i and $j are same"
        fi
    done
done
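For comparison, a minimal sketch of a content-based approach (not the asker's script): hash every regular file in `/folder`, then delete every file whose checksum has already been seen. GNU md5sum/xargs are assumed, and filenames must not contain spaces, quotes, or newlines.

```bash
cd /folder || exit 1
md5sum ./* 2>/dev/null \
  | sort \
  | awk 'seen[$1]++ { print $2 }' \
  | xargs -r rm --
```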
Su_scriptingbee
(319 rep)
May 28, 2017, 02:41 PM
• Last activity: Oct 24, 2023, 04:15 PM
0 votes | 3 answers | 34 views
Filter duplicates based on the values of another column
I have the following example of a dataframe, where elements of the 3rd column can be duplicated. I want to keep the entry which has the highest value in column 5.
Meaning that for **AGCCCGGGG** I want to keep the second entry, whose 5th column has the value 49.
A00643:620:HFM7YDSX5:1:1124:7120:12352 ATCAGCCCGGGGCTTGGGCTAGGAC GGGTGTGTG 548476 0 Corynebacterium
A00643:620:HFM7YDSX5:1:1150:15953:12524 CCTATCGTCGCTGGAATTCCCCGGG AGCCCGGGG 1458266 1 Bordetella
A00643:620:HFM7YDSX5:1:1150:15628:12743 CCTATCGTCGCTGGAATTCCCCGGG AGCCCGGGG 1458266 49 Bordetella
A00643:620:HFM7YDSX5:1:1450:4001:4507 GGCGATCGAAATGTCAAGCCCGGGG TCTTGTGGT 585529 0 Corynebacterium
A00643:620:HFM7YDSX5:1:2124:8865:2472 ATCAGCCCGGGGCTTGGGCTAGGAC GGGTGTGTG 548476 0 Corynebacterium
A00643:620:HFM7YDSX5:1:2476:4001:29496 ATTCACCCTATAGGAGCCCGGGGCA TGCCCCGGG 1458266 0 Bordetella
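For reference, a minimal sketch with awk that reads the file twice (the filename `data.txt` is a placeholder): the first pass records the maximum column-5 value per column-3 key, the second pass prints the first line that carries that maximum.

```bash
awk 'NR==FNR { k = $3; if (!(k in max) || $5+0 > max[k]) max[k] = $5+0; next }
     $5+0 == max[$3] && !done[$3]++' data.txt data.txt
```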
Anna Antonatou -Pappaioannou
(1 rep)
May 15, 2023, 10:32 AM
• Last activity: Oct 20, 2023, 09:45 AM
0 votes | 3 answers | 106 views
Need assistance with awk/sed to identify/mark duplicate IP addresses
Good day.
I have a text file which contains pod/node names and their associated IPv6 addresses, of which two pods have the same IP address: the first pod **k8-worker0001c-cif-9d86d6dd4-vf9b9** and the last pod **k8-worker0001c-ctdc-5bc95b699f-xnmrn**, the shared address being **2001:1890:e00f:3900::6**.
$ cat /tmp/dup_ip.txt
k8-worker0001c-cif-9d86d6dd4-vf9b9
2001:1890:e00f:3900::4/64 global nodad
2001:1890:e00f:3900::6/64 global
k8-worker0001c-cifpartner-64c89f8bc8-8p5pq
2001:1890:e00f:3900::10/64 global
k8-worker0001c-ctd-7d759784ff-2gk5d
2001:1890:e00f:3900::a/64 global nodad
2001:1890:e00f:3900::d/64 global
k8-worker0001c-ctd-7d759784ff-hd8jp
2001:1890:e00f:3900::c/64 global
k8-worker0001c-ctd-7d759784ff-qkk4t
2001:1890:e00f:3900::8/64 global nodad
2001:1890:e00f:3900::f/64 global
k8-worker0001c-ctd-7d759784ff-t6lwz
2001:1890:e00f:3900::5/64 global
k8-worker0001c-ctd-7d759784ff-vl8x9
2001:1890:e00f:3900::9/64 global nodad
2001:1890:e00f:3900::b/64 global
k8-worker0001c-ctdc-5bc95b699f-xnmrn
2001:1890:e00f:3900::7/64 global nodad
2001:1890:e00f:3900::6/64 global
All I need is a one-liner to identify the duplicate IP address while retaining the rest, including the pod names. I have tried using **awk**'s **!seen** idiom, but that deletes the duplicates, which I don't want.
Therefore I'd like something like this:
$ cat /tmp/dup_ip.txt
k8-worker0001c-cif-9d86d6dd4-vf9b9
2001:1890:e00f:3900::4/64 global nodad
2001:1890:e00f:3900::6/64 global DUPLICATE!
k8-worker0001c-cifpartner-64c89f8bc8-8p5pq
2001:1890:e00f:3900::10/64 global
k8-worker0001c-ctd-7d759784ff-2gk5d
2001:1890:e00f:3900::a/64 global nodad
2001:1890:e00f:3900::d/64 global
k8-worker0001c-ctd-7d759784ff-hd8jp
2001:1890:e00f:3900::c/64 global
k8-worker0001c-ctd-7d759784ff-qkk4t
2001:1890:e00f:3900::8/64 global nodad
2001:1890:e00f:3900::f/64 global
k8-worker0001c-ctd-7d759784ff-t6lwz
2001:1890:e00f:3900::5/64 global
k8-worker0001c-ctd-7d759784ff-vl8x9
2001:1890:e00f:3900::9/64 global nodad
2001:1890:e00f:3900::b/64 global
k8-worker0001c-ctdc-5bc95b699f-xnmrn
2001:1890:e00f:3900::7/64 global nodad
2001:1890:e00f:3900::6/64 global DUPLICATE!
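For reference, a minimal sketch with awk that reads the file twice: the first pass counts how often each address (the part before the `/`) occurs, the second pass appends `DUPLICATE!` to every address line whose address was seen more than once.

```bash
awk 'NR==FNR { n = split($1, a, "/"); if (n == 2) cnt[a[1]]++; next }
     { n = split($1, a, "/"); if (n == 2 && cnt[a[1]] > 1) $0 = $0 " DUPLICATE!"; print }
    ' /tmp/dup_ip.txt /tmp/dup_ip.txt
```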
Thanks in advance,
Bjoern
Bjoern
(59 rep)
May 8, 2023, 07:16 PM
• Last activity: May 9, 2023, 07:54 PM
0 votes | 2 answers | 146 views
awk is automatically duplicating some lines. Can someone explain?
My data looks like:
A 4 G 1 G 1
C 4 C 2 C 2
T 6 T 5 T 5
A 6 T 2 T 2
C 6 T 2 T 2
T 6 G 2 G 2
I am trying the command:
awk -F " " '$1==$3 {$7=$6; print $0;}
$1==$5 {$7=$4; print $0;}
($1 != $3 && $1 != $5) {$7=$2; print $0}' test.txt
While the data has only 5 lines, the output has 7 lines and certain lines are randomly duplicated.
Somehow it happens only with this dataset and not with the other datasets that I have.
Can someone please help? I don't understand what is happening.
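For reference: awk evaluates every pattern/action pair against each input line, so a line that satisfies two of the conditions (for example one where `$1`, `$3` and `$5` are all equal) is printed once per matching rule. A minimal sketch of the usual fix is to make the rules mutually exclusive with `next`:

```bash
awk '$1 == $3 { $7 = $6; print; next }
     $1 == $5 { $7 = $4; print; next }
               { $7 = $2; print }' test.txt
```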
user563991
(9 rep)
Mar 3, 2023, 03:28 PM
• Last activity: Mar 3, 2023, 03:50 PM
1 vote | 1 answer | 1171 views
Find duplicate IPs for different MACs
I am using arp-scan to get a list of returned duplicate IP addresses. However, arp-scan will list a duplicate IP even when it has the same MAC address. I get a sorted output in asx.txt (shortened for brevity):
~~~
arp-scan 172.16.0.0/16 > as.txt
sort as.txt > as2.txt
cat as2.txt | uniq -D -w 36 > asx.txt
kye-mgmt02:/data # cat asx.txt
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD.
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER (DUP: 2)
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER (DUP: 2)
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD.
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD.
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD. (DUP: 2)
~~~
So as you can see, none of these IPs is really duplicated, because each one appears with the same MAC address.
To really find a duplicate IP with a different MAC, I edited the file and changed the MAC of the last IP.
~~~
kye-mgmt02:/data # cat asx.txt
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD.
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER (DUP: 2)
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER (DUP: 2)
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD.
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD.
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:55 Micro-Star INTL CO., LTD. (DUP: 2)
~~~
I am looking for how to output the duplicate IPs with different MACs.
Expected output:
~~~
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:55 Micro-Star INTL CO., LTD. (DUP: 2)
~~~
I can't seem to find the right options to output the duplicate IPs with different MACs
Help please.
---
**I tried:**
cat asx.txt | uniq -D -s 15 -w 33
cat asx.txt | uniq -D -s 15 -w 17-33
cat asx.txt | uniq -D -f1 -w 33
cat asx.txt | uniq -D -f1 -w 32
cat asx.txt | uniq -D -f1 -w 31
cat asx.txt | uniq -D -f1 -w 30
cat asx.txt | uniq -D -f1
cat asx.txt | uniq -D -s 15
But none gives the desired output.
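For reference, a minimal sketch with awk instead of `uniq`, reading `asx.txt` twice: the first pass counts the distinct MACs seen per IP, the second pass prints only the lines whose IP is paired with more than one MAC.

```bash
awk 'NR==FNR { if (!seen[$1, $2]++) macs[$1]++; next }
     macs[$1] > 1' asx.txt asx.txt
```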
user552826
Dec 12, 2022, 03:58 PM
• Last activity: Dec 12, 2022, 06:18 PM
0 votes | 0 answers | 189 views
Removing duplicate values based on two columns
I have a file in which I would like to filter out duplicate rows based on columns 1 and 6:
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
and the final output should look like
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
So far this is what I have tried
awk '!a[$1 $6]++ { print ;}' input.csv > output.csv
I end up with
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
Any suggestion would be helpful. Thank you
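For reference, one thing to check: without `-F','`, awk splits on whitespace, so `$1` and `$6` are not the intended comma-separated columns. A minimal sketch with the field separator set and a separator inside the key (so keys like `1`+`23` and `12`+`3` don't collide), using the same file names as above:

```bash
awk -F',' '!seen[$1 FS $6]++' input.csv > output.csv
```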
nbn
(113 rep)
Oct 14, 2022, 03:59 PM
• Last activity: Oct 17, 2022, 07:58 AM
1 vote | 1 answer | 347 views
How to sort a list of strings that contain a combination of letters and numbers
I want to sort the following strings by their number and remove duplicates in a file
cat311
celine434
celine434
celine5
jimmy12
john44
john41
to be
celine5
jimmy12
john41
john44
cat311
celine434
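For reference, a minimal sketch (the filename `names.txt` is a placeholder): drop exact duplicate lines first, then sort by the trailing number with a decorate/sort/undecorate pipeline.

```bash
# 1) drop exact duplicate lines, 2) prefix each name with its trailing number,
# 3) sort numerically on that prefix, 4) strip the prefix again
sort -u names.txt \
  | awk '{ n = $0; gsub(/[^0-9]/, "", n); print n "\t" $0 }' \
  | sort -n \
  | cut -f2-
```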
user8090410
(11 rep)
Sep 22, 2022, 06:28 PM
• Last activity: Sep 22, 2022, 06:46 PM
2 votes | 2 answers | 1210 views
Find duplicate files based on first few characters of filename
I am looking for a way, in a Linux shell (preferably bash), to find duplicate files based on the first few letters of their filenames.
**Where this would be useful:**
I build mod packs for Minecraft. As of 1.14.4, Forge no longer errors out if there are duplicate mods of different versions in a pack; it simply stops the older versions from running. A script to help find these duplicates would be very advantageous.
Example listing:
minecolonies-0.13.312-beta-universal.jar
minecolonies-0.13.386-alpha-universal.jar
By quickly being able to identify the dupes I can keep the client pack small.
**More information as requested**
There is no specific format. However, as you can see, there are at least 2 prevailing formats. Further, there is no standard in the community about what kind of characters to use or not use. Some use spaces (ick), some use [] (also ick), some use _'s (more ick), some use -'s (preferred, but what can you do).
https://gist.github.com/be3cc9a77150194476b2000cb8ee16e5 is a sample list of mod filenames. It has been cleaned so there are no dupes in it.
https://gist.github.com/b0ac1e03145e893e880da45cf08ebd7a contains a sample where I deliberately made duplicates. It is an exaggeration of what happens from time to time.
**Deeper Explanation**
I realize this might be resource-heavy to do.
I would like to arbitrarily specify a slice range (start to finish) of all filenames to sample, find duplicates based on that slice, and then highlight the duplicates. I don't need the script to actually delete them.
**Extra Credit**
The script would present a menu for files that it suspects match the duplication criterion allowing for easy deleting or renaming.
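For reference, not the menu-driven script, but a minimal sketch of the core idea: take a character slice of every jar filename (start position and length are arbitrary placeholder values here), group on that slice, and print the groups that have more than one member.

```bash
#!/usr/bin/env bash
start=1   # first character of the slice (placeholder value)
len=12    # length of the slice (placeholder value)

ls -1 -- *.jar |
awk -v s="$start" -v l="$len" '
  { key = substr($0, s, l)                # the filename slice used for grouping
    group[key] = group[key] "\n  " $0
    n[key]++ }
  END { for (k in n) if (n[k] > 1) print "possible duplicates for " key_label(k) }
  function key_label(k) { return "\"" k "\":" group[k] }'
```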
Kreezxil
(75 rep)
Oct 29, 2020, 04:43 PM
• Last activity: Aug 17, 2022, 09:45 AM
17 votes | 12 answers | 41079 views
Remove all duplicate words from a string using a shell script
I have a string like
"aaa,aaa,aaa,bbb,bbb,ccc,bbb,ccc"
I want to remove the duplicate words from the string, so the output will be like
"aaa,bbb,ccc"
I tried this code (Source):
$ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
It works fine with that sample value, but when I give my variable's value it still shows all the duplicate words.
How can I remove the duplicate values?
**UPDATE**
My goal is to concatenate all corresponding values into a single string when the user is the same. I have data like this:
user name | colour
AAA | red
AAA | black
BBB | red
BBB | blue
AAA | blue
AAA | red
CCC | red
CCC | red
AAA | green
AAA | red
AAA | black
BBB | red
BBB | blue
AAA | blue
AAA | red
CCC | red
CCC | red
AAA | green
In my code I fetch all distinct users and then concatenate the colour string successfully. For that I am using this code:
while read the records
if [ "$c" == "" ]; then #$c I defined global
c="$colour1"
else
c="$c,$colour1"
fi
When I print this $c variable I get this output (for user AAA):
"red,black,blue,red,green,red,black,blue,red,green,"
I want to remove the duplicate colours. The desired output should be like
"red,black,blue,green"
For this desired output I used the above code
echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
but it displays the output with duplicate values, like
"red,black,blue,red,green,red,black,blue,red,green,"
Thanks
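For reference, a minimal sketch for deduplicating the comma-separated `$c` while preserving the first-seen order (the sample value below is taken from the question):

```bash
c="red,black,blue,red,green,red,black,blue,red,green"
# Split on commas, keep the first occurrence of each word, rejoin with commas.
c=$(printf '%s\n' "$c" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -)
echo "$c"   # prints: red,black,blue,green
```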
Urvashi
(343 rep)
Mar 23, 2017, 12:41 PM
• Last activity: Aug 3, 2022, 11:27 PM
10 votes | 6 answers | 7548 views
How to delete all duplicate hardlinks to a file?
I've got a directory tree created by `rsnapshot`, which contains multiple snapshots of the same directory structure with all identical files replaced by hardlinks.
I would like to delete all those hardlink duplicates and keep only a single copy of every file (so I can later move all files into a sorted archive without having to touch identical files twice).
Is there a tool that does that?
So far I've only found tools that find duplicates and *create* hardlinks to replace them…
I guess I could list all files and their inode numbers and implement the deduplicating and deleting myself, but I don't want to reinvent the wheel here.
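For reference, a minimal sketch of that inode-based idea with GNU find and awk (the snapshot path is a placeholder; review the list before deleting, and note it breaks on filenames containing newlines):

```bash
# List every extra hardlinked path (all but the first path seen per inode).
find /path/to/rsnapshot-root -type f -printf '%i %p\n' \
  | awk 'seen[$1]++ { sub(/^[0-9]+ /, ""); print }'

# Once the list looks right, feed it to rm.
find /path/to/rsnapshot-root -type f -printf '%i %p\n' \
  | awk 'seen[$1]++ { sub(/^[0-9]+ /, ""); print }' \
  | while IFS= read -r path; do rm -- "$path"; done
```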
n.st
(8378 rep)
May 31, 2016, 02:21 PM
• Last activity: May 19, 2022, 09:25 AM
5 votes | 3 answers | 6385 views
Find and list duplicate directories
I have a directory that has a number of sub-directories, and I would like to find any duplicates. The folder structure looks something like this:
└── Top_Dir
└── Level_1_Dir
├── standard_cat
│ ├── files.txt
├── standard_dog
│ └── files.txt
└── standard_snake
└── files.txt
└── Level_2_Dir
├── standard_moon
│ ├── files.txt
├── standard_sun
│ └── files.txt
└── standard_cat
└── files.txt
└── Level_3_Dir
├── standard_man
│ ├── files.txt
├── standard_woman
│ └── files.txt
└── standard_moon
└── files.txt
With the above example I would like to see an output of:
/top_dir/Level_1_Dir/standard_cat
/top_dir/Level_2_Dir/standard_cat
/top_dir/Level_2_Dir/standard_moon
/top_dir/Level_3_Dir/standard_moon
I have done some searching on how to get this done via bash and have come up with nothing. Does anyone know a way to do this?
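For reference, a minimal sketch with GNU find and awk: list the directories two levels below the top (matching the example layout) and print the full paths of every directory name that occurs more than once.

```bash
find /top_dir -mindepth 2 -maxdepth 2 -type d -printf '%f\t%p\n' \
  | awk -F'\t' '{ n[$1]++; paths[$1] = paths[$1] $2 "\n" }
                END { for (d in n) if (n[d] > 1) printf "%s", paths[d] }' \
  | sort
```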
dino
(51 rep)
Jun 9, 2016, 03:21 AM
• Last activity: Apr 21, 2022, 12:04 AM
0 votes | 0 answers | 64 views
Delete duplicated contents from files
I have many backups of the same file. Is there a way to transform them into an incremental backup?
Those files aren't exactly the same (sometimes different timestamps, sometimes new data appended here and there).
I can't just search for duplicate files, and I can't just delete the old files in favour of the new one, because sometimes the old one holds data that is no longer present elsewhere.
I want a way to delete duplicated content from the files, so that the data across all of them is unique. Ideally that would mean merging, because if I just delete a bunch of data the file would become unopenable, since sometimes there is duplicated formatting data.
The problem is I don't know whether the new data differs purely by lines or sometimes within the same line. It's not just a matter of duplicate lines; sometimes it's part of a line that is duplicated.
Do you have any ideas?
aac
(145 rep)
Mar 3, 2022, 08:31 AM
-1 votes | 1 answer | 124 views
Skip a line from the console if equal to the line before, adding a count (in real time)
Using uniq it is possible to filter out sequential duplicate lines.
while (true) do echo 1; echo 2; echo 2; echo 1; sleep 1; done | uniq
becomes:
1
2
1
Is there a way to have duplicated sequential lines removed, while adding the number of repetitions? E.g. in the example above
1
2 (2)
1
And if a new "1" line arrives, the above should become:
1
2 (2)
1 (2)
This is not for a file but for a stream (such as tail -f), where new lines are being added in real time.
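For reference, a minimal sketch with awk (gawk assumed; with mawk add `-W interactive`) that works on a live stream: repeated lines are redrawn in place with a carriage return and a growing count, and a new output line is started whenever the input changes.

```bash
while true; do echo 1; echo 2; echo 2; echo 1; sleep 1; done | awk '
  { if (NR > 1 && $0 == prev) { n++; printf "\r%s (%d)", prev, n }
    else { if (NR > 1) printf "\n"; prev = $0; n = 1; printf "%s", $0 }
    fflush() }
  END { if (NR) printf "\n" }'
```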
Jose Gómez
(101 rep)
Jan 25, 2022, 01:49 PM
• Last activity: Jan 25, 2022, 06:33 PM
-1 votes | 1 answer | 1912 views
remove duplicate lines across multiple txt files
I have 12 text files, all in one folder, each with about 5 million lines. No file has duplicate lines on its own, but lines are duplicated across the files. I want to remove the duplicate lines from each file but still save the files separately. I have tried many Linux sort commands, but they keep merging the files together. I have Windows, Linux, and Mac. Is there any code or application that can do this?
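For reference, a minimal sketch with awk (run from the folder holding the twelve files): keep only the first occurrence of every line across all of them, writing each input file to a separate `.dedup` copy. Note that the `seen` array has to hold every distinct line, so tens of millions of lines can need several GB of RAM.

```bash
awk '!seen[$0]++ { print > (FILENAME ".dedup") }' *.txt
```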
Surprise Awofemi
(41 rep)
Jan 14, 2022, 05:52 PM
• Last activity: Jan 15, 2022, 06:24 AM
1 vote | 2 answers | 981 views
Remove duplicates of a specific line, keeping only the first appearance, without touching other unspecified duplicates
I'm trying to edit a text file containing several duplicates. The goal is to keep only the first match of a string and remove the remaining duplicate lines of the same string.
In the example file
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 1
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
I'd like to keep only one of each * Title N
line and *keep all other unrelated/unspecified duplicate lines* in the file.
So the result would be:
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
The traditional solutions for removing duplicates like
uniq file.txt
[Useful AWK One-Liners to Keep Handy](https://linoxide.com/useful-awk-one-liners-to-keep-handy/) :
awk '!a[$0]++' contents.txt
[shell - How to delete duplicate lines in a file without sorting it in Unix - Stack Overflow](https://stackoverflow.com/questions/1444406/how-to-delete-duplicate-lines-in-a-file-without-sorting-it-in-unix/32513573#32513573)
perl -ne 'print if ! $x{$_}++' file
delete every duplicate indiscriminately.
I tried using variations of these solutions and also GNU sed
in a loop format like
duplicateLines=$(grep -E "^\* .*" file.org | uniq)
printf '%s\n' "$duplicateLines" | while read -r line; do
sed "s/$line//g2" file.org
done
with no success. I don't mind absolute performance, so doing multiple iterations, like calling sed
inside a loop to remove one specified string at a time, is no problem.
Any insight would be very much appreciated.
It would be nice to be able to do this inside a shell script but I'm open to alternative solutions like Python, C, Java, etc., just tell me what the function/library name is and I'm searching for it there.
Thanks.
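For reference, a minimal sketch with awk that restricts the usual `!seen[$0]++` trick to the top-level heading lines and passes every other line (including its duplicates) through untouched:

```bash
awk '/^\* / { if (seen[$0]++) next } { print }' file.org
```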
yeyin33455
(13 rep)
Dec 30, 2021, 01:40 AM
• Last activity: Jan 2, 2022, 12:44 AM
0 votes | 0 answers | 179 views
I have a `raspivid` stream which I'm piping to `ffmpeg`; now I'd like to also stream a raw version of it to a socket
I have a process outputting an MJPEG video stream, which I pipe into `ffmpeg` to reduce the framerate and then to a socket:
raspivid -t 999999 -cd MJPEG -w 1920 -h 1080 -o - | ffmpeg -i - -f mjpeg -r 2 - | nc -l 9010
Now I also need to split the original raw stream to another socket. I've tried the `tee` command, including with named fifos, but I can't seem to make it work.
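For reference, a minimal sketch of one way to do it with `tee` and a named fifo (the second port, 9011, is an arbitrary choice): the raw stream is copied into the fifo for a second `nc` listener while the original pipeline continues unchanged. Caveat: if nothing ever reads the second socket, the fifo fills up and the whole pipeline stalls.

```bash
mkfifo /tmp/raw_mjpeg

# Serve the raw copy on a second port (placeholder port 9011).
nc -l 9011 < /tmp/raw_mjpeg &

raspivid -t 999999 -cd MJPEG -w 1920 -h 1080 -o - \
  | tee /tmp/raw_mjpeg \
  | ffmpeg -i - -f mjpeg -r 2 - \
  | nc -l 9010
```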
Ivan Koshelev
(131 rep)
Feb 7, 2021, 12:06 AM