Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
2 votes | 1 answer | 1989 views
ZSH - PATH duplication: directory added at the end of PATH keeps duplicating when re-opening a terminal session
I have recently installed pipx on a Mac running Big Sur with the zsh shell. During the install it prompted for the following to be added to the `.zshrc` file:
# Created by `pipx` on 2021-03-20 14:22:23
export PATH="$PATH:/Users/xxxx/.local/bin"
eval "$(register-python-argcomplete pipx)"
Running `echo $PATH` showed `/Users/xxxx/.local/bin` added to the end of my PATH variable. However, when I close the terminal and open up a new session, `echo $PATH` shows the location duplicated at the end of PATH: `:/Users/xxxx/.local/bin:/Users/xxxx/.local/bin`
Opening and closing further terminal sessions doesn't create any more additions to PATH; it just stays at these two entries.
I have run `typeset -U PATH path` to remove the duplicate, but each time I open a new terminal session it duplicates again.
Does anybody know how I can stop this from happening? I would really like to keep my PATH variable as clean as possible.
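For reference (not a confirmed diagnosis from the thread), a common zsh-side mitigation is to make the path array unique and guard the append. A minimal sketch for `~/.zshrc`, replacing the lines pipx added, with `$HOME` standing in for `/Users/xxxx`:

```zsh
# Keep the PATH/path array free of duplicate entries in zsh.
typeset -U path PATH

# Append ~/.local/bin only if it is not already present (guarded append).
if [[ ":$PATH:" != *":$HOME/.local/bin:"* ]]; then
  export PATH="$PATH:$HOME/.local/bin"
fi

eval "$(register-python-argcomplete pipx)"
```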
KB_cov
(29 rep)
Mar 21, 2021, 03:27 PM
• Last activity: Apr 25, 2025, 01:03 AM
0 votes | 1 answer | 400 views
iptables duplicate port traffic
I want to clone/duplicate all UDP traffic incoming on port 8500 to port 8600. It is important that the source address is not modified. Also, both ports must be accessible by applications (the packets must still arrive on the original port).
This solution (https://unix.stackexchange.com/questions/704887/nftables-duplicate-udp-packets-for-specific-destination-ipport-to-a-second-d) does work on a newer system; unfortunately, the machine in question is running kernel 3.10 on RHEL 7 and I am not allowed to update it.
mirokai
(43 rep)
Apr 17, 2024, 10:34 AM
• Last activity: Apr 17, 2024, 04:34 PM
3 votes | 2 answers | 2147 views
Duplicate stdin to stdout and stderr, but in a synchronized way
I need to duplicate the stdout of a producer and feed it to two consumers in a **synchronized** fashion.

                           /--> consumer 1
producer --> duplicator --<
                           \--> consumer 2

This can easily be accomplished, for example, via `tee`:
((cat f.txt | tee /dev/stderr | ./cons1.py >&3) 2>&1 | ./cons2.py) 3>&1
or via named pipes:
mkfifo fifo1 fifo2
cat f.txt | tee fifo1 fifo2 >/dev/null &
./cons1.py < fifo1 &
./cons2.py < fifo2
or via a small C program, dup.c:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;
    size_t size = 0;

    while (getline(&line, &size, stdin) != -1) {
        fprintf(stdout, "%s", line);
        fprintf(stderr, "%s", line);
        fflush(stdout);   /* stdout to a pipe is block-buffered; flush per line */
    }
    free(line);
    return 0;
}
and then:
((cat f.txt | ./dup | ./cons1.py >&3) 2>&1 | ./cons2.py) 3>&1
However, **if consumer 1 is faster than consumer 2 we have a problem**. E.g., consumer 1 is already at line 50,000 while consumer 2 is only at line 17,000.
For my system **I need both consumers to be at the same line, hence the faster consumer needs to be held back**. I know that this might be impossible with standard Linux tools. However, at least with the dup.c
approach it should somehow be possible. Any suggestions on how to accomplish this? Thanks!
m33x
(33 rep)
Sep 6, 2015, 12:06 PM
• Last activity: Dec 14, 2023, 06:31 PM
0 votes | 0 answers | 25 views
Linux home dir has a "duplicate" under /home/music but when anything is deleted from one it disappears from both -- how can I delete 2nd home dir?
I don't know where the 2nd "phantom" home dir came from. The folder's Properties dialog shows it is 4K. The "real" home directory holds 128,000+ files and 208 GB.
User10489 had the answer: a symlinked directory. In fact, using the `ls -l` he suggested, I found a second directory that was also a symlink. I used rm to remove both. Thank you.
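For anyone landing here with the same symptom, a minimal sketch of the check and fix described above (assuming `/home/music` is the suspect path):

```bash
# Show what /home/music really is; a symlink prints with an "->" arrow.
ls -ld /home/music

# If it is a symlink, remove only the link itself (no -r needed);
# the real home directory and its contents are left untouched.
rm /home/music
```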
R E Brinson
(1 rep)
Dec 2, 2023, 03:06 AM
• Last activity: Dec 2, 2023, 09:59 PM
8 votes | 7 answers | 19353 views
How to find and delete duplicate files within the same directory?
I want to find duplicate files, within a directory, and then delete all but one, to reclaim space. How do I achieve this using a shell script?
For example:
pwd
folder
Files in it are:
log.bkp
log
extract.bkp
extract
I need to compare log.bkp with all the other files and, if a duplicate file is found (by its content), delete it. Similarly, the file 'log' has to be checked against all the files that follow, and so on.
So far I have written this, but it's not giving the desired result.
#!/usr/bin/env ksh
count=`ls -ltrh /folder | grep '^-' | wc -l`

for i in /folder/*
do
    for (( j=i+1; j<=count; j++ ))
    do
        echo "Current two files are $i and $j"
        sdiff -s $i $j
        if [ `echo $?` -eq 0 ]
        then
            echo "Contents of $i and $j are same"
        fi
    done
done
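For comparison, a minimal sketch of a content-based approach (not the asker's script): hash every regular file in `/folder`, then delete every file whose checksum has already been seen. GNU md5sum/xargs are assumed, and filenames must not contain spaces, quotes, or newlines.

```bash
cd /folder || exit 1
md5sum ./* 2>/dev/null \
  | sort \
  | awk 'seen[$1]++ { print $2 }' \
  | xargs -r rm --
```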
Su_scriptingbee
(319 rep)
May 28, 2017, 02:41 PM
• Last activity: Oct 24, 2023, 04:15 PM
0 votes | 3 answers | 34 views
Filter duplicates based on the values of another column
I have the following example of a dataframe, where elements of the 3rd column can be duplicated. I want to keep the entry which has the highest value in column 5.
Meaning that for **AGCCCGGGG** I want to keep the second entry, whose 5th column has the value 49.
A00643:620:HFM7YDSX5:1:1124:7120:12352 ATCAGCCCGGGGCTTGGGCTAGGAC GGGTGTGTG 548476 0 Corynebacterium
A00643:620:HFM7YDSX5:1:1150:15953:12524 CCTATCGTCGCTGGAATTCCCCGGG AGCCCGGGG 1458266 1 Bordetella
A00643:620:HFM7YDSX5:1:1150:15628:12743 CCTATCGTCGCTGGAATTCCCCGGG AGCCCGGGG 1458266 49 Bordetella
A00643:620:HFM7YDSX5:1:1450:4001:4507 GGCGATCGAAATGTCAAGCCCGGGG TCTTGTGGT 585529 0 Corynebacterium
A00643:620:HFM7YDSX5:1:2124:8865:2472 ATCAGCCCGGGGCTTGGGCTAGGAC GGGTGTGTG 548476 0 Corynebacterium
A00643:620:HFM7YDSX5:1:2476:4001:29496 ATTCACCCTATAGGAGCCCGGGGCA TGCCCCGGG 1458266 0 Bordetella
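For reference, a minimal sketch with awk that reads the file twice (the filename `data.txt` is a placeholder): the first pass records the maximum column-5 value per column-3 key, the second pass prints the first line that carries that maximum.

```bash
awk 'NR==FNR { k = $3; if (!(k in max) || $5+0 > max[k]) max[k] = $5+0; next }
     $5+0 == max[$3] && !done[$3]++' data.txt data.txt
```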
Anna Antonatou -Pappaioannou
(1 rep)
May 15, 2023, 10:32 AM
• Last activity: Oct 20, 2023, 09:45 AM
0 votes | 3 answers | 106 views
Need assistance with awk/sed to identify/mark duplicate IP addresses
Good day.
I have a text file which contains pod/node names and their associated IPv6 addresses, of which two pods have the same IP address: the first pod **k8-worker0001c-cif-9d86d6dd4-vf9b9** and the last pod **k8-worker0001c-ctdc-5bc95b699f-xnmrn**, the shared address being **2001:1890:e00f:3900::6**.
$ cat /tmp/dup_ip.txt
k8-worker0001c-cif-9d86d6dd4-vf9b9
2001:1890:e00f:3900::4/64 global nodad
2001:1890:e00f:3900::6/64 global
k8-worker0001c-cifpartner-64c89f8bc8-8p5pq
2001:1890:e00f:3900::10/64 global
k8-worker0001c-ctd-7d759784ff-2gk5d
2001:1890:e00f:3900::a/64 global nodad
2001:1890:e00f:3900::d/64 global
k8-worker0001c-ctd-7d759784ff-hd8jp
2001:1890:e00f:3900::c/64 global
k8-worker0001c-ctd-7d759784ff-qkk4t
2001:1890:e00f:3900::8/64 global nodad
2001:1890:e00f:3900::f/64 global
k8-worker0001c-ctd-7d759784ff-t6lwz
2001:1890:e00f:3900::5/64 global
k8-worker0001c-ctd-7d759784ff-vl8x9
2001:1890:e00f:3900::9/64 global nodad
2001:1890:e00f:3900::b/64 global
k8-worker0001c-ctdc-5bc95b699f-xnmrn
2001:1890:e00f:3900::7/64 global nodad
2001:1890:e00f:3900::6/64 global
All I need is a one-liner to identify the duplicate IP address while retaining the rest, including the pod names. I have tried using **awk**'s **!seen** idiom, but that deletes the duplicates, which I don't want.
Therefore I'd like something like this:
$ cat /tmp/dup_ip.txt
k8-worker0001c-cif-9d86d6dd4-vf9b9
2001:1890:e00f:3900::4/64 global nodad
2001:1890:e00f:3900::6/64 global DUPLICATE!
k8-worker0001c-cifpartner-64c89f8bc8-8p5pq
2001:1890:e00f:3900::10/64 global
k8-worker0001c-ctd-7d759784ff-2gk5d
2001:1890:e00f:3900::a/64 global nodad
2001:1890:e00f:3900::d/64 global
k8-worker0001c-ctd-7d759784ff-hd8jp
2001:1890:e00f:3900::c/64 global
k8-worker0001c-ctd-7d759784ff-qkk4t
2001:1890:e00f:3900::8/64 global nodad
2001:1890:e00f:3900::f/64 global
k8-worker0001c-ctd-7d759784ff-t6lwz
2001:1890:e00f:3900::5/64 global
k8-worker0001c-ctd-7d759784ff-vl8x9
2001:1890:e00f:3900::9/64 global nodad
2001:1890:e00f:3900::b/64 global
k8-worker0001c-ctdc-5bc95b699f-xnmrn
2001:1890:e00f:3900::7/64 global nodad
2001:1890:e00f:3900::6/64 global DUPLICATE!
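For reference, a minimal sketch with awk that reads the file twice: the first pass counts how often each address (the part before the `/`) occurs, the second pass appends `DUPLICATE!` to every address line whose address was seen more than once.

```bash
awk 'NR==FNR { n = split($1, a, "/"); if (n == 2) cnt[a[1]]++; next }
     { n = split($1, a, "/"); if (n == 2 && cnt[a[1]] > 1) $0 = $0 " DUPLICATE!"; print }
    ' /tmp/dup_ip.txt /tmp/dup_ip.txt
```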
Thanks in advance,
Bjoern
Bjoern
(59 rep)
May 8, 2023, 07:16 PM
• Last activity: May 9, 2023, 07:54 PM
0 votes | 2 answers | 146 views
awk is automatically duplicating some lines. Can someone explain?
My data looks like:
A 4 G 1 G 1
C 4 C 2 C 2
T 6 T 5 T 5
A 6 T 2 T 2
C 6 T 2 T 2
T 6 G 2 G 2
I am trying the command:
awk -F " " '$1==$3 {$7=$6; print $0;}
$1==$5 {$7=$4; print $0;}
($1 != $3 && $1 != $5) {$7=$2; print $0}' test.txt
While the data has only 5 lines, the output has 7 lines and certain lines are randomly duplicated.
Somehow it happens only with this dataset and not with the other datasets that I have.
Can someone please help? I don't understand what is happening.
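For reference: awk evaluates every pattern/action pair against each input line, so a line that satisfies two of the conditions (for example one where `$1`, `$3` and `$5` are all equal) is printed once per matching rule. A minimal sketch of the usual fix is to make the rules mutually exclusive with `next`:

```bash
awk '$1 == $3 { $7 = $6; print; next }
     $1 == $5 { $7 = $4; print; next }
               { $7 = $2; print }' test.txt
```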
user563991
(9 rep)
Mar 3, 2023, 03:28 PM
• Last activity: Mar 3, 2023, 03:50 PM
1 vote | 1 answer | 1171 views
Find duplicate IPs for different MACs
I am using arp-scan to get a list of returned duplicate IP addresses. However, arp-scan will list a duplicate IP even when it has the same MAC address. I get a sorted output in asx.txt (shortened for brevity):
~~~
arp-scan 172.16.0.0/16 > as.txt
sort as.txt > as2.txt
cat as2.txt | uniq -D -w 36 > asx.txt
kye-mgmt02:/data # cat asx.txt
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD.
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER (DUP: 2)
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER (DUP: 2)
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD.
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD.
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD. (DUP: 2)
~~~
So as you can see, none of these IPs is really duplicated, because each one appears with the same MAC address.
To really find a duplicate IP with a different MAC, I edited the file and changed the MAC of the last IP.
~~~
kye-mgmt02:/data # cat asx.txt
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD.
172.16.150.68 d8:cb:8a:b0:6a:12 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER
172.16.150.69 00:23:24:9e:3d:32 G-PRO COMPUTER (DUP: 2)
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER
172.16.150.70 00:23:24:9e:3d:82 G-PRO COMPUTER (DUP: 2)
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD.
172.16.150.71 d8:cb:8a:86:2f:56 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD.
172.16.150.72 d8:cb:8a:cf:f1:e8 Micro-Star INTL CO., LTD. (DUP: 2)
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:55 Micro-Star INTL CO., LTD. (DUP: 2)
~~~
I am looking for how to output the duplicate IPs with different MACs.
Expected output:
~~~
172.16.150.73 d8:cb:8a:cf:f1:5d Micro-Star INTL CO., LTD.
172.16.150.73 d8:cb:8a:cf:f1:55 Micro-Star INTL CO., LTD. (DUP: 2)
~~~
I can't seem to find the right options to output the duplicate IPs with different MACs
Help please.
---
**I tried:**
cat asx.txt | uniq -D -s 15 -w 33
cat asx.txt | uniq -D -s 15 -w 17-33
cat asx.txt | uniq -D -f1 -w 33
cat asx.txt | uniq -D -f1 -w 32
cat asx.txt | uniq -D -f1 -w 31
cat asx.txt | uniq -D -f1 -w 30
cat asx.txt | uniq -D -f1
cat asx.txt | uniq -D -s 15
But none gives the desired output.
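For reference, a minimal sketch with awk instead of `uniq`, reading `asx.txt` twice: the first pass counts the distinct MACs seen per IP, the second pass prints only the lines whose IP is paired with more than one MAC.

```bash
awk 'NR==FNR { if (!seen[$1, $2]++) macs[$1]++; next }
     macs[$1] > 1' asx.txt asx.txt
```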
user552826
Dec 12, 2022, 03:58 PM
• Last activity: Dec 12, 2022, 06:18 PM
0 votes | 0 answers | 189 views
Removing duplicate values based on two columns
I have a file in which I would like to filter out duplicate rows based on columns 1 and 6:
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
and the final output should look like
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
So far this is what I have tried
awk '!a[$1 $6]++ { print ;}' input.csv > output.csv
I end up with
ID,sample,NAME,reference,app_name,appession_id,workflow,execution_status,status,date_created
1,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
2,ABC,XYZ,DOP,2022-08-18 13:31:09Z,28997974,same,Complete,PASS,18/08/2022
Any suggestion would be helpful. Thank you
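For reference, one thing to check: without `-F','`, awk splits on whitespace, so `$1` and `$6` are not the intended comma-separated columns. A minimal sketch with the field separator set and a separator inside the key (so keys like `1`+`23` and `12`+`3` don't collide), using the same file names as above:

```bash
awk -F',' '!seen[$1 FS $6]++' input.csv > output.csv
```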
nbn
(113 rep)
Oct 14, 2022, 03:59 PM
• Last activity: Oct 17, 2022, 07:58 AM
1 vote | 1 answer | 347 views
How to sort a list of strings that contain a combination of letters and numbers
I want to sort the following strings by their number and remove duplicates in a file
cat311
celine434
celine434
celine5
jimmy12
john44
john41
to be
celine5
jimmy12
john41
john44
cat311
celine434
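For reference, a minimal sketch (the filename `names.txt` is a placeholder): drop exact duplicate lines first, then sort by the trailing number with a decorate/sort/undecorate pipeline.

```bash
# 1) drop exact duplicate lines, 2) prefix each name with its trailing number,
# 3) sort numerically on that prefix, 4) strip the prefix again
sort -u names.txt \
  | awk '{ n = $0; gsub(/[^0-9]/, "", n); print n "\t" $0 }' \
  | sort -n \
  | cut -f2-
```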
user8090410
(11 rep)
Sep 22, 2022, 06:28 PM
• Last activity: Sep 22, 2022, 06:46 PM
2 votes | 2 answers | 1210 views
Find duplicate files based on first few characters of filename
I am looking for a way, in a Linux shell (preferably bash), to find duplicate files based on the first few letters of their filenames.
**Where this would be useful:**
I build mod packs for Minecraft. As of 1.14.4, Forge no longer errors out if there are duplicate mods of different versions in a pack; it simply stops the older versions from running. A script to help find these duplicates would be very advantageous.
Example listing:
minecolonies-0.13.312-beta-universal.jar
minecolonies-0.13.386-alpha-universal.jar
By quickly being able to identify the dupes I can keep the client pack small.
**More information as requested**
There is no specific format. However, as you can see, there are at least 2 prevailing formats. Further, there is no standard in the community about what kind of characters to use or not use. Some use spaces (ick), some use [] (also ick), some use _'s (more ick), some use -'s (preferred, but what can you do).
https://gist.github.com/be3cc9a77150194476b2000cb8ee16e5 is a sample list of mod filenames. It has been cleaned so there are no dupes in it.
https://gist.github.com/b0ac1e03145e893e880da45cf08ebd7a contains a sample where I deliberately made duplicates. It is an exaggeration of what happens from time to time.
**Deeper Explanation**
I realize this might be resource-heavy to do.
I would like to arbitrarily specify a slice range (start to finish) of all filenames to sample, find duplicates based on that slice, and then highlight the duplicates. I don't need the script to actually delete them.
**Extra Credit**
The script would present a menu for files that it suspects match the duplication criterion allowing for easy deleting or renaming.
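For reference, not the menu-driven script, but a minimal sketch of the core idea: take a character slice of every jar filename (start position and length are arbitrary placeholder values here), group on that slice, and print the groups that have more than one member.

```bash
#!/usr/bin/env bash
start=1   # first character of the slice (placeholder value)
len=12    # length of the slice (placeholder value)

ls -1 -- *.jar |
awk -v s="$start" -v l="$len" '
  { key = substr($0, s, l)                # the filename slice used for grouping
    group[key] = group[key] "\n  " $0
    n[key]++ }
  END { for (k in n) if (n[k] > 1) print "possible duplicates for " key_label(k) }
  function key_label(k) { return "\"" k "\":" group[k] }'
```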
Kreezxil
(75 rep)
Oct 29, 2020, 04:43 PM
• Last activity: Aug 17, 2022, 09:45 AM
17 votes | 12 answers | 41079 views
Remove all duplicate words from a string using a shell script
I have a string like
"aaa,aaa,aaa,bbb,bbb,ccc,bbb,ccc"
I want to remove the duplicate words from the string, so the output will be like
"aaa,bbb,ccc"
I tried this code (Source):
$ echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
It works fine with that sample value, but when I give my variable's value it still shows all the duplicate words.
How can I remove the duplicate values?
**UPDATE**
My goal is to concatenate all corresponding values into a single string when the user is the same. I have data like this:
user name | colour
AAA | red
AAA | black
BBB | red
BBB | blue
AAA | blue
AAA | red
CCC | red
CCC | red
AAA | green
AAA | red
AAA | black
BBB | red
BBB | blue
AAA | blue
AAA | red
CCC | red
CCC | red
AAA | green
In my code I fetch all distinct users and then concatenate the colour string successfully. For that I am using this code:
while read the records
if [ "$c" == "" ]; then #$c I defined global
c="$colour1"
else
c="$c,$colour1"
fi
When I print this $c variable I get this output (for user AAA):
"red,black,blue,red,green,red,black,blue,red,green,"
I want to remove the duplicate colours. The desired output should be like
"red,black,blue,green"
For this desired output I used the above code
echo "zebra ant spider spider ant zebra ant" | xargs -n1 | sort -u | xargs
but it displays the output with duplicate values, like
"red,black,blue,red,green,red,black,blue,red,green,"
Thanks
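For reference, a minimal sketch for deduplicating the comma-separated `$c` while preserving the first-seen order (the sample value below is taken from the question):

```bash
c="red,black,blue,red,green,red,black,blue,red,green"
# Split on commas, keep the first occurrence of each word, rejoin with commas.
c=$(printf '%s\n' "$c" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -)
echo "$c"   # prints: red,black,blue,green
```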
Urvashi
(343 rep)
Mar 23, 2017, 12:41 PM
• Last activity: Aug 3, 2022, 11:27 PM
10 votes | 6 answers | 7548 views
How to delete all duplicate hardlinks to a file?
I've got a directory tree created by `rsnapshot`, which contains multiple snapshots of the same directory structure with all identical files replaced by hardlinks.
I would like to delete all those hardlink duplicates and keep only a single copy of every file (so I can later move all files into a sorted archive without having to touch identical files twice).
Is there a tool that does that?
So far I've only found tools that find duplicates and *create* hardlinks to replace them…
I guess I could list all files and their inode numbers and implement the deduplicating and deleting myself, but I don't want to reinvent the wheel here.
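For reference, a minimal sketch of that inode-based idea with GNU find and awk (the snapshot path is a placeholder; review the list before deleting, and note it breaks on filenames containing newlines):

```bash
# List every extra hardlinked path (all but the first path seen per inode).
find /path/to/rsnapshot-root -type f -printf '%i %p\n' \
  | awk 'seen[$1]++ { sub(/^[0-9]+ /, ""); print }'

# Once the list looks right, feed it to rm.
find /path/to/rsnapshot-root -type f -printf '%i %p\n' \
  | awk 'seen[$1]++ { sub(/^[0-9]+ /, ""); print }' \
  | while IFS= read -r path; do rm -- "$path"; done
```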
n.st
(8378 rep)
May 31, 2016, 02:21 PM
• Last activity: May 19, 2022, 09:25 AM
5 votes | 3 answers | 6385 views
Find and list duplicate directories
I have a directory that has a number of sub-directories, and I would like to find any duplicates. The folder structure looks something like this:
└── Top_Dir
└── Level_1_Dir
├── standard_cat
│ ├── files.txt
├── standard_dog
│ └── files.txt
└── standard_snake
└── files.txt
└── Level_2_Dir
├── standard_moon
│ ├── files.txt
├── standard_sun
│ └── files.txt
└── standard_cat
└── files.txt
└── Level_3_Dir
├── standard_man
│ ├── files.txt
├── standard_woman
│ └── files.txt
└── standard_moon
└── files.txt
With the above example I would like to see an output of:
/top_dir/Level_1_Dir/standard_cat
/top_dir/Level_2_Dir/standard_cat
/top_dir/Level_2_Dir/standard_moon
/top_dir/Level_3_Dir/standard_moon
I have done some searching on how to get this done via bash and have come up with nothing. Does anyone know a way to do this?
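For reference, a minimal sketch with GNU find and awk: list the directories two levels below the top (matching the example layout) and print the full paths of every directory name that occurs more than once.

```bash
find /top_dir -mindepth 2 -maxdepth 2 -type d -printf '%f\t%p\n' \
  | awk -F'\t' '{ n[$1]++; paths[$1] = paths[$1] $2 "\n" }
                END { for (d in n) if (n[d] > 1) printf "%s", paths[d] }' \
  | sort
```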
dino
(51 rep)
Jun 9, 2016, 03:21 AM
• Last activity: Apr 21, 2022, 12:04 AM
0 votes | 0 answers | 64 views
Delete duplicated contents from files
I have many backups of the same file. Is there a way to transform them into an incremental backup?
Those files aren't exactly the same (sometimes different timestamps, sometimes new data appended here and there).
I can't just search for duplicate files, and I can't just delete the old files in favour of the new one, because sometimes the old one holds data that is no longer present elsewhere.
I want a way to delete duplicated content from the files, so that the data across all of them is unique. Ideally that would mean merging, because if I just delete a bunch of data the file would become unopenable, since sometimes there is duplicated formatting data.
The problem is I don't know whether the new data differs purely by lines or sometimes within the same line. It's not just a matter of duplicate lines; sometimes it's part of a line that is duplicated.
Do you have any ideas?
aac
(145 rep)
Mar 3, 2022, 08:31 AM
-1 votes | 1 answer | 124 views
Skip a line from the console if equal to the line before, adding a count (in real time)
Using uniq it is possible to filter out sequential duplicate lines.
while (true) do echo 1; echo 2; echo 2; echo 1; sleep 1; done | uniq
becomes:
1
2
1
Is there a way to have duplicated sequential lines removed, while adding the number of repetitions? E.g. in the example above
1
2 (2)
1
And if a new "1" line arrives, the above should become:
1
2 (2)
1 (2)
This is not for a file but for a stream (such as tail -f), where new lines are being added in real time.
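For reference, a minimal sketch with awk (gawk assumed; with mawk add `-W interactive`) that works on a live stream: repeated lines are redrawn in place with a carriage return and a growing count, and a new output line is started whenever the input changes.

```bash
while true; do echo 1; echo 2; echo 2; echo 1; sleep 1; done | awk '
  { if (NR > 1 && $0 == prev) { n++; printf "\r%s (%d)", prev, n }
    else { if (NR > 1) printf "\n"; prev = $0; n = 1; printf "%s", $0 }
    fflush() }
  END { if (NR) printf "\n" }'
```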
Jose Gómez
(101 rep)
Jan 25, 2022, 01:49 PM
• Last activity: Jan 25, 2022, 06:33 PM
-1 votes | 1 answer | 1912 views
remove duplicate lines across multiple txt files
I have 12 text files, all in one folder, each with about 5 million lines. No file has duplicate lines on its own, but lines are duplicated across the files. I want to remove the duplicate lines from each file but still save the files separately. I have tried many Linux sort commands, but they keep merging the files together. I have Windows, Linux, and Mac. Is there any code or application that can do this?
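For reference, a minimal sketch with awk (run from the folder holding the twelve files): keep only the first occurrence of every line across all of them, writing each input file to a separate `.dedup` copy. Note that the `seen` array has to hold every distinct line, so tens of millions of lines can need several GB of RAM.

```bash
awk '!seen[$0]++ { print > (FILENAME ".dedup") }' *.txt
```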
Surprise Awofemi
(41 rep)
Jan 14, 2022, 05:52 PM
• Last activity: Jan 15, 2022, 06:24 AM
1 vote | 2 answers | 981 views
Remove duplicates of a specific line, keeping only the first appearance, without touching other unspecified duplicates
I'm trying to edit a text file containing several duplicates. The goal is to keep only the first match of a string and remove the remaining duplicate lines of the same string.
In the example file
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 1
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
I'd like to keep only one of each * Title N
line and *keep all other unrelated/unspecified duplicate lines* in the file.
So the result would be:
* Title 1
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
* Title 2
** Subtitle 01
#+begin_src
Line 001
Line 002
#+end_src
** Subtitle 02
#+begin_src
Line 001
Line 002
#+end_src
The traditional solutions for removing duplicates like
uniq file.txt
[Useful AWK One-Liners to Keep Handy](https://linoxide.com/useful-awk-one-liners-to-keep-handy/) :
awk '!a[$0]++' contents.txt
[shell - How to delete duplicate lines in a file without sorting it in Unix - Stack Overflow](https://stackoverflow.com/questions/1444406/how-to-delete-duplicate-lines-in-a-file-without-sorting-it-in-unix/32513573#32513573)
perl -ne 'print if ! $x{$_}++' file
delete every duplicate indiscriminately.
I tried using variations of these solutions and also GNU sed
in a loop format like
duplicateLines=$(grep -E "^\* .*" file.org | uniq)
printf '%s\n' "$duplicateLines" | while read -r line; do
sed "s/$line//g2" file.org
done
with no success. I don't mind absolute performance, so doing multiple iterations, like calling sed
inside a loop to remove one specified string at a time, is no problem.
Any insight would be very much appreciated.
It would be nice to be able to do this inside a shell script but I'm open to alternative solutions like Python, C, Java, etc., just tell me what the function/library name is and I'm searching for it there.
Thanks.
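For reference, a minimal sketch with awk that restricts the usual `!seen[$0]++` trick to the top-level heading lines and passes every other line (including its duplicates) through untouched:

```bash
awk '/^\* / { if (seen[$0]++) next } { print }' file.org
```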
yeyin33455
(13 rep)
Dec 30, 2021, 01:40 AM
• Last activity: Jan 2, 2022, 12:44 AM
0 votes | 0 answers | 179 views
I have a `raspivid` stream which I'm piping to `ffmpeg`; now I'd like to also stream a raw version of it to a socket
I have a process outputting an MJPEG video stream, which I pipe into `ffmpeg` to reduce the framerate and then to a socket:
raspivid -t 999999 -cd MJPEG -w 1920 -h 1080 -o - | ffmpeg -i - -f mjpeg -r 2 - | nc -l 9010
Now I also need to split the original raw stream to another socket. I've tried the `tee` command, including with named fifos, but I can't seem to make it work.
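For reference, a minimal sketch of one way to do it with `tee` and a named fifo (the second port, 9011, is an arbitrary choice): the raw stream is copied into the fifo for a second `nc` listener while the original pipeline continues unchanged. Caveat: if nothing ever reads the second socket, the fifo fills up and the whole pipeline stalls.

```bash
mkfifo /tmp/raw_mjpeg

# Serve the raw copy on a second port (placeholder port 9011).
nc -l 9011 < /tmp/raw_mjpeg &

raspivid -t 999999 -cd MJPEG -w 1920 -h 1080 -o - \
  | tee /tmp/raw_mjpeg \
  | ffmpeg -i - -f mjpeg -r 2 - \
  | nc -l 9010
```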
Ivan Koshelev
(131 rep)
Feb 7, 2021, 12:06 AM