Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
-1
votes
1
answers
1066
views
How to get unique occurrence of words from a very large file?
I have been asked to write a word frequency analysis program using the unix/ shell-scripts with the following requirements: - Input is a text file with one word per line - Input words are drawn from the Compact Oxford English Dictionary New Edition - Character encoding is UTF-8 - Input file is 1 Peb...
I have been asked to write a word frequency analysis program using unix shell scripts, with the following requirements:
- Input is a text file with one word per line
- Input words are drawn from the Compact Oxford English Dictionary New Edition
- Character encoding is UTF-8
- Input file is 1 Pebibyte (PiB) in length
- Output is of the format “Word occurred N times”
I am aware of one way to begin, as below:
cat filename | xargs -n1 | sort | uniq -c > newfilename
What would be the optimal way to do this, considering performance?
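For reference, a minimal sketch of one common approach (assuming GNU sort; the buffer size and temp directory are placeholders): since the input is already one word per line, the `xargs -n1` stage can be dropped, and GNU `sort` spills to temporary files on disk, so it copes with inputs far larger than RAM. `LC_ALL=C` makes sort compare raw bytes, which is faster and still correct here because identical UTF-8 words are identical byte sequences.
```
LC_ALL=C sort --buffer-size=8G --parallel="$(nproc)" -T /big/tmp filename \
  | uniq -c \
  | awk '{ printf "%s occurred %d times\n", $2, $1 }' > newfilename
```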
Pratik Barjatiya
(23 rep)
Dec 29, 2017, 06:27 AM
• Last activity: Apr 15, 2025, 03:40 PM
2
votes
4
answers
9322
views
finding all non-unique lines in a file
I'm trying to use uniq to find all non-unique lines in a file. By non-unique, I mean any line that I have already seen on the previous line. I thought that the "-D" option would do this: -D print all duplicate lines But instead of just printing the duplicate lines, it prints *all* the lines when the...
I'm trying to use uniq to find all non-unique lines in a file. By non-unique, I mean any line that I have already seen on the previous line. I thought that the "-D" option would do this:
-D print all duplicate lines
But instead of just printing the duplicate lines, it prints *all* copies of a line when there is more than one. I want to print only the second and subsequent copies of a line.
How can I do this?
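One way to get exactly that (a sketch using awk instead of `uniq`; like `uniq`, it only compares adjacent lines): print a line only when it equals the previous one, so the first copy of each run is suppressed and the second and subsequent copies come through.
```
awk '$0 == prev { print } { prev = $0 }' file
```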
Michael
(544 rep)
Nov 7, 2019, 10:18 PM
• Last activity: Apr 8, 2025, 06:45 PM
0
votes
2
answers
57
views
pipe to uniq from a variable not showing the desired output
I have a pipeline using array jobs and need to change the number of inputs for some steps. I thought about testing `uniq` since the only part changing in my folders are the last four characters (the *hap* part in the example). So, all my paths look something like: ``` /mnt/nvme/user/something1/hap1...
I have a pipeline using array jobs and need to change the number of inputs for some steps. I thought about testing `uniq`, since the only part changing in my folders is the last four characters (the *hap* part in the example). So all my paths look something like:
/mnt/nvme/user/something1/hap1
/mnt/nvme/user/something1/hap2
/mnt/nvme/user/something2/hap1
/mnt/nvme/user/something2/hap2
and what I'm doing is the following:
DIR=( "/mnt/nvme/ungaro/something1/hap1" "/mnt/nvme/ungaro/something1/hap2" "/mnt/nvme/ungaro/something2/hap1" "/mnt/nvme/ungaro/something2/hap2" )
for dir in "${DIR[@]}"; do echo $dir | sed 's#/hap[0-9]##' | uniq; done
But the resulting output always displays all the elements in the variable without collapsing the duplicate rows after removing the *hap* part of each one of them.
Probably I'm missing something; could it be that the `for` loop forces all lines to be printed anyway? If so, is there a way to attain the desired result in a single-line command?
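A likely explanation, with a sketch of a single pipeline: each loop iteration starts a fresh `sed | uniq` pipeline that sees exactly one line, so `uniq` never has a previous line to compare against. Feeding the whole array through one pipeline avoids that (`sort -u` also covers duplicates that are not adjacent):
```
printf '%s\n' "${DIR[@]}" | sed 's#/hap[0-9]##' | sort -u
```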
Matteo
(209 rep)
Feb 10, 2025, 04:25 PM
• Last activity: Feb 10, 2025, 05:16 PM
121
votes
15
answers
57558
views
How can I remove duplicates in my .bash_history, preserving order?
I really enjoying using `control+r` to recursively search my command history. I've found a few good options I like to use with it: # ignore duplicate commands, ignore commands starting with a space export HISTCONTROL=erasedups:ignorespace # keep the last 5000 entries export HISTSIZE=5000 # append to...
I really enjoy using `control+r` to reverse-search my command history. I've found a few good options I like to use with it:
# ignore duplicate commands, ignore commands starting with a space
export HISTCONTROL=erasedups:ignorespace
# keep the last 5000 entries
export HISTSIZE=5000
# append to the history instead of overwriting (good for multiple connections)
shopt -s histappend
The only problem for me is that `erasedups` only erases sequential duplicates - so that with this string of commands:
ls
cd ~
ls
The `ls` command will actually be recorded twice. I've thought about periodically running w/ cron:
cat .bash_history | sort | uniq > temp.txt
mv temp.txt .bash_history
This would achieve removing the duplicates, but unfortunately the order would not be preserved. If I don't `sort` the file first, I don't believe `uniq` can work properly.
How can I remove duplicates in my .bash_history, preserving order?
### Extra Credit:
Are there any problems with overwriting the `.bash_history` file via a script? For example, if you remove an apache log file, I think you need to send a nohup/reset signal with `kill` to have it flush its connection to the file. If that is the case with the `.bash_history` file, perhaps I could somehow use `ps` to check and make sure there are no connected sessions before the filtering script is run?
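For reference, a sketch of the usual order-preserving dedup (assumes GNU coreutils for `tac`; the temp filename is a placeholder). Reversing first means "first seen" is the most recent use of each command, which is usually what you want in a history file:
```
tac ~/.bash_history | awk '!seen[$0]++' | tac > /tmp/bash_history.dedup &&
mv /tmp/bash_history.dedup ~/.bash_history
```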
cwd
(46887 rep)
Sep 20, 2012, 02:55 PM
• Last activity: Feb 3, 2025, 01:47 PM
1
votes
2
answers
825
views
How to sort or uniq a live feed
I'm looking to sort and isolate IP from a `tcpdump` live feed. tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3} works just fine but when I try to add the `uniq`program it fails: tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,...
I'm looking to sort and isolate IPs from a `tcpdump` live feed.
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"
works just fine, but when I try to add the `uniq` program it fails:
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | uniq -u
returns nothing. Same with `sort -u`.
Any idea how to fix this?
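A sketch of what is likely going on: `sort` can never emit anything on an endless stream (it must read all of its input first), `uniq -u` suppresses every line that repeats instead of collapsing repeats, and `grep` block-buffers its output when writing into a pipe. Plain `uniq` together with GNU grep's `--line-buffered` addresses all three:
```
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" \
  | grep --line-buffered -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" \
  | uniq
```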
ChiseledAbs
(2301 rep)
Jul 8, 2016, 10:29 AM
• Last activity: Jul 26, 2024, 05:54 AM
1
votes
1
answers
98
views
How can I find duplicate lines among files?
I have a software module which contains some files with same pattern. ` private static final long serialVersionUID = \dL;` How can I find files with the same value? ```shell $ grep -R serialVersionUID ./path/to/Some.java: private static final long serialVersionUID = 111L; ./path/to/Other.java: priva...
I have a software module which contains some files with the same pattern:
private static final long serialVersionUID = \dL;
How can I find files with the same value?
$ grep -R serialVersionUID
./path/to/Some.java: private static final long serialVersionUID = 111L;
./path/to/Other.java: private static final long serialVersionUID = 222L;
./path/to/Another.java: private static final long serialVersionUID = 111L;
Note that the indentation preceding the columns differs between files.
Now I want to find the files with the same value in the second part (`private static final ...`):
$ grep -R serialVersionUID | .....
./path/to/Some.java: private static final long serialVersionUID = 111L;
./path/to/Another.java: private static final long serialVersionUID = 111L;
Thanks.
This is all I could find, so far...
$ grep -R serialVersionUID | sed 's/[ ][ ]*/ /g' | sort -k 2
I have an improvement, yet it prints the second column only.
$ grep -R serialVersionUID | sed 's/[ ][ ]*/ /g' | sort -k 2 | uniq -f 2 -d
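A sketch of a two-pass awk alternative that keeps whole lines (it assumes the value always follows `= `, as in the output above): first count each value, then print only the lines whose value occurred more than once.
```
grep -R serialVersionUID \
  | awk -F'= ' '{ count[$2]++; line[NR] = $0; key[NR] = $2 }
                END { for (i = 1; i <= NR; i++)
                        if (count[key[i]] > 1) print line[i] }'
```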
Jin Kwon
(564 rep)
Jul 25, 2024, 06:44 AM
• Last activity: Jul 26, 2024, 02:17 AM
1
votes
1
answers
177
views
Why is sorted uniq -c command showing duplicates
I am trying to count how many times I use a certain version of a library on my computer. For some reason, `uniq -c` is outputing duplicates, despite sorting it, and despite the sort order seeming in order. Any ideas or feedback? Thanks for your time. ### With `uniq -c` Input: ``` rg --no-line-number...
I am trying to count how many times I use a certain version of a library on my computer.
For some reason, `uniq -c` is outputting duplicates, despite sorting the input, and despite the sort order appearing correct.
Any ideas or feedback?
Thanks for your time.
### With uniq -c
Input:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n | uniq -c
Output:
3 GitVersion.MsBuild 5.10.1
1 GitVersion.MsBuild 5.10.1
3 GitVersion.MsBuild 5.10.3
11 GitVersion.MsBuild 5.11.1
5 GitVersion.MsBuild 5.11.1
25 GitVersion.MsBuild 5.12.0
2 GitVersion.MsBuild 5.12.0
1 GitVersion.MsBuild 5.6.11
2 GitVersion.MsBuild 5.7.0
4 GitVersion.MsBuild 5.8.1
### Without uniq -c
Input:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n
Output:
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.6.11
GitVersion.MsBuild 5.7.0
GitVersion.MsBuild 5.7.0
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
---
I've updated my command to pipe to `xxd`, as per @kos's suggestion. That helped in comparing:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n | uniq -c | xxd
That yielded the following (sorry for the screenshot, but it helps having the colors):
[screenshot: xxd hex dump of the output, with colors highlighting the differing hidden bytes]
I then revised the regex slightly (sorry all, I didn't take all the suggestions on board, since one tiny tweak made it work, but I do have to say I learnt a lot by this, including using `xxd`). I simply added `.*` after the `>`:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;/.*$/\1 \2/g' | sort | uniq -c
And it now yields the correct (or satisfactory anyway) output:
4 GitVersion.MsBuild 5.10.1
3 GitVersion.MsBuild 5.10.3
16 GitVersion.MsBuild 5.11.1
27 GitVersion.MsBuild 5.12.0
1 GitVersion.MsBuild 5.6.11
2 GitVersion.MsBuild 5.7.0
4 GitVersion.MsBuild 5.8.1
Thanks team!
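For anyone hitting the same symptom: when a sorted `uniq -c` still shows "duplicate" groups, the lines almost always differ in invisible characters (carriage returns, tabs, trailing spaces). A sketch for exposing them without a screenshot (`cat -A` is the GNU spelling; `sed -n l` is the portable equivalent):
```
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" \
  | sort | uniq -c | cat -A    # ^M, ^I, and trailing spaces become visible
```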
Albert
(171 rep)
May 16, 2024, 03:53 AM
• Last activity: May 17, 2024, 01:47 AM
-1
votes
2
answers
160
views
how to de-duplicate block (timestamp+command) from bash history?
I'm working with bash_history file containing blocks with the following format: `#unixtimestamp\ncommand\n` here's sample of the bash_history file: ``` #1713308636 cat > ./initramfs/init ./initramfs/init << "EOF" #!/bin/sh /bin/sh EOF #1713308642 file initramfs/init #1713308686 cpio -v -t -F init.cp...
I'm working with a bash_history file containing blocks of the following format: `#unixtimestamp\ncommand\n`. Here's a sample of the file:
#1713308636
cat > ./initramfs/init ./initramfs/init << "EOF"
#!/bin/sh
/bin/sh
EOF
#1713308642
file initramfs/init
#1713308686
cpio -v -t -F init.cpio
#1713308690
ls
As a workaround, I added the delete functionality to this program, but I'm still open to other approaches that use existing commands.
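A sketch with plain awk, keeping the first occurrence of each (possibly multi-line) command; it assumes timestamp lines match `^#[0-9]+$` and that no line inside a command matches that pattern. The output filename is a placeholder.
```
awk '
  /^#[0-9]+$/ {
    if (block != "" && !seen[cmd]++) printf "%s", block
    block = $0 "\n"; cmd = ""
    next
  }
  { block = block $0 "\n"; cmd = cmd $0 "\n" }
  END { if (block != "" && !seen[cmd]++) printf "%s", block }
' ~/.bash_history > bash_history.dedup
```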
ReYuki
(33 rep)
May 13, 2024, 06:43 AM
• Last activity: May 15, 2024, 04:42 PM
62
votes
5
answers
78521
views
How to get only the unique results without having to sort data?
$ cat data.txt aaaaaa aaaaaa cccccc aaaaaa aaaaaa bbbbbb $ cat data.txt | uniq aaaaaa cccccc aaaaaa bbbbbb $ cat data.txt | sort | uniq aaaaaa bbbbbb cccccc $ The result that I need is to **display all the lines from the original file removing all the duplicates (not just the consecutive ones), whil...
$ cat data.txt
aaaaaa
aaaaaa
cccccc
aaaaaa
aaaaaa
bbbbbb
$ cat data.txt | uniq
aaaaaa
cccccc
aaaaaa
bbbbbb
$ cat data.txt | sort | uniq
aaaaaa
bbbbbb
cccccc
$
The result that I need is to **display all the lines from the original file removing all the duplicates (not just the consecutive ones), while maintaining the original order of statements in the file**.
In this example, the result I was actually looking for is
aaaaaa
cccccc
bbbbbb
How can I perform this generalized `uniq` operation?
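The widely used order-preserving idiom is a one-line awk filter: `seen[$0]++` is zero (false) only the first time a line appears, so each distinct line is printed exactly once, in its original order. On the sample above it prints `aaaaaa`, `cccccc`, `bbbbbb`.
```
awk '!seen[$0]++' data.txt
```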
Lazer
(36085 rep)
Apr 24, 2011, 08:23 PM
• Last activity: Jan 28, 2024, 07:06 AM
0
votes
3
answers
68
views
Does a command exist that lists all the directories where a word appears in a file or directory name?
When I don't remember where a file or a folder is, I sometime use the `locate` command (that finds more occurrences, allow more candidates than a `find`, in my mind. But maybe I'm mistaking). But then there's a lot of responses, of course: ```bash locate clang ``` ```log /data/sauvegardes/dev/Java/E...
When I don't remember where a file or a folder is, I sometimes use the `locate` command (which, to my mind, finds more occurrences and allows more candidates than `find`. But maybe I'm mistaken). But then there are a lot of responses, of course:
locate clang
/data/sauvegardes/dev/Java/Experimentations/Angular4/bikes/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/Angular4/bikes/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/Angular6/ng6-proj/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/Angular6/ng6-proj/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/ol-ext-angular/.metadata/.plugins/ts.eclipse.ide.server.nodejs.embed.win32.win32.x86_64/node-v6.9.4-win-x64/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
(201 responses)
I piped this command through `dirname`, `sort` and `uniq` to list only the directories that have such a word in their name, or that contain one or more files having it:
locate clang | xargs -L1 dirname | sort | uniq
It works:
/home/lebihan/dev/Java/comptes-france/metier-et-gestion/AdapterInboundWebEtude/etude/node_modules/node-gyp/gyp/tools/Xcode/Specifications
/home/lebihan/dev/Java/comptes-france/metier-et-gestion/AdapterInboundWebEtude/etude/node/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications
/usr/include/boost/align/detail
/usr/include/boost/config/compiler
/usr/include/boost/predef/compiler
/usr/lib/linux-kbuild-6.1/scripts
/usr/lib/llvm-14/lib
/usr/lib/postgresql/14/lib/bitcode/postgres/commands
/usr/lib/x86_64-linux-gnu
/usr/local/go/misc/ios
/usr/local/go/src/debug/dwarf/testdata
/usr/local/go/src/debug/elf/testdata
/usr/local/go/src/debug/macho/testdata
/usr/share/doc
/usr/share/doc/libclang1-14
/usr/share/doc/libclang-cpp14
(108 responses)
But does _Linux_ have a command that does the same, more easily?
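Not a single dedicated command that I know of, but a lighter sketch of the same pipeline (assumes GNU tools: `xargs -d` and coreutils' multi-argument `dirname`, so one `dirname` process handles every path, and `sort -u` replaces `sort | uniq`):
```
locate clang | xargs -d '\n' dirname | sort -u
```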
Marc Le Bihan
(2353 rep)
Oct 31, 2023, 07:34 AM
• Last activity: Oct 31, 2023, 10:20 AM
1
votes
5
answers
485
views
Find and delete partially duplicate lines
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html https://www.domain2.com/dl/G5SPNDOF/JHCGTS/AAA-1080p.mp4.html https://www.domain.com/files/ZQWL80BG/AAA-1080p.mp4.html https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html https://www.domain.com/files/UCIONEMA/BBB-1080p.mp4.html Given a fi...
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html
https://www.domain2.com/dl/G5SPNDOF/JHCGTS/AAA-1080p.mp4.html
https://www.domain.com/files/ZQWL80BG/AAA-1080p.mp4.html
https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html
https://www.domain.com/files/UCIONEMA/BBB-1080p.mp4.html
Given a file with the above lines, how do I delete the ones that reference a duplicate filename, even though the links as a whole differ, to end up with:
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html
https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html
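A sketch keyed on the last path component (the input filename `links.txt` is a placeholder): splitting fields on `/` makes `$NF` the trailing filename, and the `!seen[...]++` idiom keeps only the first URL seen for each filename.
```
awk -F/ '!seen[$NF]++' links.txt
```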
Bogdan Nicolae Stoian
(27 rep)
Oct 11, 2022, 07:27 AM
• Last activity: Oct 9, 2023, 03:17 AM
11
votes
1
answers
1030
views
Use uniq to filter adjacent lines in pipeline
I'm trying to monitor theme changes using this command: ```lang-shell dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." ``` Output right now looks like this: ``` uint32 0 uint32 0 uint32 1 uint32 1 uint32 0 uint32 0 uint32 1 uint32 1 ```...
I'm trying to monitor theme changes using this command:
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ."
Output right now looks like this:
uint32 0
uint32 0
uint32 1
uint32 1
uint32 0
uint32 0
uint32 1
uint32 1
This output comes from theme toggling. The theme notification shows up twice for some reason. Now I want to pipe it to `uniq` so that I am left with only a single entry, like so:
uint32 0
uint32 1
uint32 0
uint32 1
However, appending `uniq` at the end does not produce any output anymore.
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." | uniq
From `man uniq`:
> Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
`uniq` needs to buffer at least the last output line to be able to detect adjacent lines; I don't see any reason why it could not buffer it and pass it along the pipeline. I've tried tweaking line buffering as suggested [here](https://unix.stackexchange.com/questions/295814/uniq-is-not-realtime-when-piped) , but the results are still the same for me.
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." | stdbuf -oL -i0 uniq
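A plausible fix, for reference: the buffering stage here is usually `grep`, which block-buffers when its stdout is a pipe, so applying `stdbuf` to `uniq` changes nothing. GNU grep's own `--line-buffered` flag makes it flush every match immediately:
```
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" \
  | grep --line-buffered -o "uint32 ." \
  | uniq
```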
Pavel Skipenes
(235 rep)
Jun 18, 2023, 11:31 AM
• Last activity: Jun 20, 2023, 10:57 PM
11
votes
2
answers
43675
views
sort and uniq in awk
I know there are "sort" and "uniq" out there, however, today's question is about how to utilise AWK to do that kind of a job. Say if I have a list of anything really (ips, names, or numbers) and I want to sort them; Here is an example I am taking the IP numbers from a mail log: awk 'match($0,/\[[[:d...
I know there are "sort" and "uniq" out there, however, today's question is about how to utilise AWK to do that kind of a job. Say if I have a list of anything really (ips, names, or numbers) and I want to sort them;
Here is an example I am taking the IP numbers from a mail log:
awk 'match($0,/\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\]/) { if ( NF == 8 && $6 == "connect" ) {print substr($0, RSTART+1,RLENGTH-2)} }' maillog
Is it possible to sort them, the IPs, "on the go" within the same awk command? I do not require a complete answer to my question, just some hints on where to start.
Cheers!
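As a starting hint, a sketch that works in GNU awk (gawk) only: array keys de-duplicate for free, and `asorti()` sorts the keys in the `END` block. Note this is a string sort, so `10.` sorts before `9.`; a numeric IP ordering would need a custom comparison or a pipe to `sort`.
```
awk 'match($0, /\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\]/) {
       if (NF == 8 && $6 == "connect")
         seen[substr($0, RSTART + 1, RLENGTH - 2)]++   # keys are the unique IPs
     }
     END { n = asorti(seen, sorted)
           for (i = 1; i <= n; i++) print sorted[i] }' maillog
```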
Peter
(309 rep)
Mar 30, 2015, 08:10 AM
• Last activity: May 22, 2023, 12:01 PM
52
votes
2
answers
116169
views
Common lines between two files
I have the following code that I run on my Terminal. LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed This doesn't give me the common lines between the two files. What am I missing there?
I have the following code that I run on my Terminal.
LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed
This doesn't give me the common lines between the two files. What am I missing there?
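Two things worth checking, sketched below: a standalone `LC_ALL=C &&` only sets an unexported shell variable that never reaches `grep` (it must prefix the command itself), and for a literal line-by-line intersection `comm` is often the clearer tool (it requires both inputs sorted):
```
LC_ALL=C grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed   # variable applied to grep itself
comm -12 <(sort genename2.txt) <(sort hg38.hgnc.bed)                  # whole lines present in both files
```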
Marwah Soliman
(713 rep)
Oct 14, 2017, 06:46 PM
• Last activity: May 16, 2023, 07:00 AM
0
votes
0
answers
26
views
numeric sort with unique option does not show 0!
I have a file with many redundant numbers in each row. Imagine something like the below: ``` echo "10 9 5 6 4 cell 3 2 0 7 0 1" > test ``` When I use `sort -un test` I get the following output: ``` cell 1 2 3 4 5 6 7 9 10 ``` while I expect the below (I mean `0` as a first row of the output): ``` 0...
I have a file with many redundant numbers, one per row. Imagine something like the below:
echo "10
9
5
6
4
cell
3
2
0
7
0
1" > test
When I use `sort -un test` I get the following output:
cell
1
2
3
4
5
6
7
9
10
while I expect the below (I mean `0` as the first row of the output):
0
1
2
3
4
5
6
7
9
10
Applying `sort -n` and then piping to `uniq` doesn't make such a mess; however, it shows the non-numeric lines.
Is there any way to just use `sort` with `-nu` and get `0` on the first line instead of an alphanumeric token?
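An explanation with a sketch: under `-n`, a line with no leading number (`cell`) has the numeric key 0, the same key as the line `0`, and `-u` keeps only one line per key, so one of the two is discarded. Filtering out non-numeric lines first avoids the collision:
```
grep -E '^[0-9]+$' test | sort -nu
```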
javadr
(131 rep)
Aug 29, 2022, 12:14 PM
• Last activity: Aug 29, 2022, 12:21 PM
21
votes
3
answers
25738
views
Uniq won't remove duplicate
I was using the following command curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq when I wondered why `uniq` wouldn't remove the duplicates. Any idea why ?
I was using the following command
curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq
when I wondered why `uniq` wouldn't remove the duplicates. Any idea why?
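The usual cause, sketched below: `uniq` only collapses *adjacent* duplicates, so unsorted input passes through largely unchanged; `sort -u` handles both steps at once. (Also note curl's long flag is `--silent` with two dashes; `-silent` is parsed as a bundle of short options.)
```
curl -s http://api.openstreetmap.org/api/0.6/relation/2919627 \
        http://api.openstreetmap.org/api/0.6/relation/2919628 \
  | grep node | awk '{print $3}' | sort -u
```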
Matthieu Riegler
(535 rep)
Feb 8, 2014, 02:41 AM
• Last activity: Aug 1, 2022, 07:29 PM
0
votes
2
answers
307
views
tar processing files multiple times with find -newer
I'm trying to use tar(1) to create an archive of files newer than a specific file (`fileA`). However, when I use find(1) to obtain the list of files to pass to tar, some files are listed multiple times: ``` $ touch fileA $ mkdir test $ touch test/{fileB,fileC} $ tar -c -v $(find test -newer fileA) >...
I'm trying to use tar(1) to create an archive of files newer than a specific file (`fileA`). However, when I use find(1) to obtain the list of files to pass to tar, some files are listed multiple times:
$ touch fileA
$ mkdir test
$ touch test/{fileB,fileC}
$ tar -c -v $(find test -newer fileA) > test.tar
test/
test/fileC
test/fileB
test/fileC
test/fileB
Using xargs(1) to pass the list of files to tar results in similar behavior:
$ find test -newer fileA | xargs tar -c -v > test.tar
test/
test/fileC
test/fileB
test/fileC
test/fileB
Using sort(1) and uniq(1) to remove duplicates doesn't work either:
$ find test -newer fileA | sort | uniq | xargs tar -c -v > test.tar
test/
test/fileC
test/fileB
test/fileB
test/fileC
Is there a way for tar to only include each file newer than `fileA` once?
**Edit:** I'm specifically looking for a solution that doesn't involve GNU extensions to tar (for example, one that would work with suckless tar).
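One POSIX-only sketch: the duplicates come from `find` printing the directory `test/` itself, which tar recurses into before also adding the explicitly listed files. Excluding directories from the list avoids the double add (this assumes filenames without whitespace, since `xargs` splits on it, and that no empty directories need archiving):
```
find test -newer fileA ! -type d | xargs tar -c -v > test.tar
```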
Vilinkameni
(1639 rep)
Jul 6, 2022, 02:19 PM
• Last activity: Jul 6, 2022, 03:00 PM
179
votes
5
answers
293243
views
What is the difference between "sort -u" and "sort | uniq"?
Everywhere I see someone needing to get a sorted, unique list, they always pipe to `sort | uniq`. I've never seen any examples where someone uses `sort -u` instead. Why not? What's the difference, and why is it better to use uniq than the unique flag to sort?
Everywhere I see someone needing to get a sorted, unique list, they always pipe to `sort | uniq`. I've never seen any examples where someone uses `sort -u` instead. Why not? What's the difference, and why is it better to use uniq than the unique flag to sort?
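For plain whole-line deduplication the two are equivalent; `uniq` earns its keep through options that `sort -u` has no counterpart for:
```
sort file | uniq        # same output as:
sort -u file            # one process fewer
sort file | uniq -c     # count occurrences of each line
sort file | uniq -d     # print only the lines that are duplicated
```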
Benubird
(6082 rep)
May 16, 2013, 11:22 AM
• Last activity: May 31, 2022, 04:15 AM
4
votes
1
answers
10296
views
Difference between sort -u and uniq -u
I always have been using `sort -u` to get rid of duplicates until now. But I am having a real doubt about a list generated by a software tool. The question is: is the output of `sort -u |wc` the same as `uniq -u |wc`? Because they don't yield the same results. The manual for `uniq` specifies: > -u,...
I have always been using `sort -u` to get rid of duplicates until now. But I am having a real doubt about a list generated by a software tool. The question is: is the output of `sort -u | wc` the same as that of `uniq -u | wc`? Because they don't yield the same results. The manual for `uniq` specifies:
> -u, --unique only print unique lines
My output consists of 1110 words, for which `sort -u` keeps 1020 lines and `uniq -u` keeps 1110 lines, the correct amount.
The issue is that I cannot visually spot any duplicates in the list (which is generated by using `>` at the end of the command line), and that there IS an issue with the total of cracked passwords (in the context of customizing john the ripper).
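The two options answer different questions, which a three-line sketch makes visible: `sort -u` keeps one copy of *every* line, while `uniq -u` drops *all* copies of any line that repeats, and on unsorted input it only compares neighbours:
```
printf 'a\na\nb\n' | sort -u    # -> a b     (one copy of everything)
printf 'a\na\nb\n' | uniq -u    # -> b       (only lines that never repeat)
printf 'a\nb\na\n' | uniq -u    # -> a b a   (nothing repeats adjacently)
```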
Yvain
(248 rep)
May 30, 2022, 07:03 PM
• Last activity: May 30, 2022, 07:28 PM
0
votes
3
answers
77
views
de-duplicate list but group parts of it
I am compiling some access rules from failed logins and after some piping I arrived at this: ```shell cat <<INPUT | sort -k 3,3 --unique Deny from 13.42.98.142 # demo Deny from 13.42.98.142 # test Deny from 13.42.98.142 # user Deny from 133.142.200.152 # admin INPUT ``` Just out of interest, I would...
I am compiling some access rules from failed logins, and after some piping I arrived at this:
cat <<INPUT | sort -k 3,3 --unique
Deny from 13.42.98.142 # demo
Deny from 13.42.98.142 # test
Deny from 13.42.98.142 # user
Deny from 133.142.200.152 # admin
INPUT
Just out of interest, I would like to keep the tried login names (the last field).
My test code would output:
Deny from 13.42.98.142 # demo
Deny from 133.142.200.152 # admin
I am looking for an output like:
Deny from 13.42.98.142 # demo, test, user
Deny from 133.142.200.152 # admin
or even better (because it would be valid .htaccess
syntax):
# demo, test, user
Deny from 13.42.98.142
# admin
Deny from 133.142.200.152
**Note**:
The input is just how I made it now - I am not stubborn about it and can change it if that suits an elegant solution better. I'll also accept general answers on how grouping in lists can be achieved in shell.
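A sketch of the grouping with awk (field 3 is the IP and the last field the login name; `rules.txt` stands for the compiled list). Note that awk's `for (ip in names)` iterates in unspecified order, so pipe through `sort` if ordering matters:
```
awk '{ names[$3] = ($3 in names) ? names[$3] ", " $NF : $NF }
     END { for (ip in names) printf "# %s\nDeny from %s\n", names[ip], ip }' rules.txt
```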
Jonas Eberle
(513 rep)
May 1, 2022, 11:10 AM
• Last activity: May 2, 2022, 06:01 AM
Showing page 1 of 20 total questions