Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
-1
votes
1
answers
1066
views
How to get unique occurrence of words from a very large file?
I have been asked to write a word frequency analysis program using the unix/ shell-scripts with the following requirements: - Input is a text file with one word per line - Input words are drawn from the Compact Oxford English Dictionary New Edition - Character encoding is UTF-8 - Input file is 1 Peb...
I have been asked to write a word frequency analysis program using unix shell scripts, with the following requirements:
- Input is a text file with one word per line
- Input words are drawn from the Compact Oxford English Dictionary New Edition
- Character encoding is UTF-8
- Input file is 1 Pebibyte (PiB) in length
- Output is of the format “Word occurred N times”
I am aware of one way to begin, as below:
cat filename | xargs -n1 | sort | uniq -c > newfilename
What would be the optimal way to do this, considering performance?
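For reference, a minimal sketch of one common approach (assuming GNU sort; the buffer size and temp directory are placeholders): since the input is already one word per line, the `xargs -n1` stage can be dropped, and GNU `sort` spills to temporary files on disk, so it copes with inputs far larger than RAM. `LC_ALL=C` makes sort compare raw bytes, which is faster and still correct here because identical UTF-8 words are identical byte sequences.
```
LC_ALL=C sort --buffer-size=8G --parallel="$(nproc)" -T /big/tmp filename \
  | uniq -c \
  | awk '{ printf "%s occurred %d times\n", $2, $1 }' > newfilename
```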
Pratik Barjatiya
(23 rep)
Dec 29, 2017, 06:27 AM
• Last activity: Apr 15, 2025, 03:40 PM
2
votes
4
answers
9322
views
finding all non-unique lines in a file
I'm trying to use uniq to find all non-unique lines in a file. By non-unique, I mean any line that I have already seen on the previous line. I thought that the "-D" option would do this: -D print all duplicate lines But instead of just printing the duplicate lines, it prints *all* the lines when the...
I'm trying to use uniq to find all non-unique lines in a file. By non-unique, I mean any line that I have already seen on the previous line. I thought that the "-D" option would do this:
-D print all duplicate lines
But instead of just printing the duplicate lines, it prints *all* copies of a line when there is more than one. I want to print only the second and subsequent copies of a line.
How can I do this?
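One way to get exactly that (a sketch using awk instead of `uniq`; like `uniq`, it only compares adjacent lines): print a line only when it equals the previous one, so the first copy of each run is suppressed and the second and subsequent copies come through.
```
awk '$0 == prev { print } { prev = $0 }' file
```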
Michael
(544 rep)
Nov 7, 2019, 10:18 PM
• Last activity: Apr 8, 2025, 06:45 PM
0
votes
2
answers
57
views
pipe to uniq from a variable not showing the desired output
I have a pipeline using array jobs and need to change the number of inputs for some steps. I thought about testing `uniq` since the only part changing in my folders are the last four characters (the *hap* part in the example). So, all my paths look something like: ``` /mnt/nvme/user/something1/hap1...
I have a pipeline using array jobs and need to change the number of inputs for some steps. I thought about testing `uniq`, since the only part changing in my folders is the last four characters (the *hap* part in the example). So all my paths look something like:
/mnt/nvme/user/something1/hap1
/mnt/nvme/user/something1/hap2
/mnt/nvme/user/something2/hap1
/mnt/nvme/user/something2/hap2
and what I'm doing is the following:
DIR=( "/mnt/nvme/ungaro/something1/hap1" "/mnt/nvme/ungaro/something1/hap2" "/mnt/nvme/ungaro/something2/hap1" "/mnt/nvme/ungaro/something2/hap2" )
for dir in "${DIR[@]}"; do echo $dir | sed 's#/hap[0-9]##' | uniq; done
But the resulting output always displays all the elements in the variable without collapsing the duplicate rows after removing the *hap* part of each one of them.
Probably I'm missing something; could it be that the `for` loop forces all lines to be printed anyway? If so, is there a way to attain the desired result in a single-line command?
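A likely explanation, with a sketch of a single pipeline: each loop iteration starts a fresh `sed | uniq` pipeline that sees exactly one line, so `uniq` never has a previous line to compare against. Feeding the whole array through one pipeline avoids that (`sort -u` also covers duplicates that are not adjacent):
```
printf '%s\n' "${DIR[@]}" | sed 's#/hap[0-9]##' | sort -u
```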
Matteo
(209 rep)
Feb 10, 2025, 04:25 PM
• Last activity: Feb 10, 2025, 05:16 PM
121
votes
15
answers
57558
views
How can I remove duplicates in my .bash_history, preserving order?
I really enjoying using `control+r` to recursively search my command history. I've found a few good options I like to use with it: # ignore duplicate commands, ignore commands starting with a space export HISTCONTROL=erasedups:ignorespace # keep the last 5000 entries export HISTSIZE=5000 # append to...
I really enjoy using `control+r` to reverse-search my command history. I've found a few good options I like to use with it:
# ignore duplicate commands, ignore commands starting with a space
export HISTCONTROL=erasedups:ignorespace
# keep the last 5000 entries
export HISTSIZE=5000
# append to the history instead of overwriting (good for multiple connections)
shopt -s histappend
The only problem for me is that `erasedups` only erases sequential duplicates - so that with this string of commands:
ls
cd ~
ls
The `ls` command will actually be recorded twice. I've thought about periodically running w/ cron:
cat .bash_history | sort | uniq > temp.txt
mv temp.txt .bash_history
This would achieve removing the duplicates, but unfortunately the order would not be preserved. If I don't `sort` the file first, I don't believe `uniq` can work properly.
How can I remove duplicates in my .bash_history, preserving order?
### Extra Credit:
Are there any problems with overwriting the `.bash_history` file via a script? For example, if you remove an apache log file, I think you need to send a nohup/reset signal with `kill` to have it flush its connection to the file. If that is the case with the `.bash_history` file, perhaps I could somehow use `ps` to check and make sure there are no connected sessions before the filtering script is run?
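For reference, a sketch of the usual order-preserving dedup (assumes GNU coreutils for `tac`; the temp filename is a placeholder). Reversing first means "first seen" is the most recent use of each command, which is usually what you want in a history file:
```
tac ~/.bash_history | awk '!seen[$0]++' | tac > /tmp/bash_history.dedup &&
mv /tmp/bash_history.dedup ~/.bash_history
```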
cwd
(46887 rep)
Sep 20, 2012, 02:55 PM
• Last activity: Feb 3, 2025, 01:47 PM
1
votes
2
answers
825
views
How to sort or uniq a live feed
I'm looking to sort and isolate IP from a `tcpdump` live feed. tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3} works just fine but when I try to add the `uniq`program it fails: tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,...
I'm looking to sort and isolate IPs from a `tcpdump` live feed.
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}"
works just fine, but when I try to add the `uniq` program it fails:
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" | grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" | uniq -u
returns nothing. Same with `sort -u`.
Any idea how to fix this?
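A sketch of what is likely going on: `sort` can never emit anything on an endless stream (it must read all of its input first), `uniq -u` suppresses every line that repeats instead of collapsing repeats, and `grep` block-buffers its output when writing into a pipe. Plain `uniq` together with GNU grep's `--line-buffered` addresses all three:
```
tcpdump -n -i tun0 "tcp[tcpflags] & (tcp-syn) != 0" \
  | grep --line-buffered -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" \
  | uniq
```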
ChiseledAbs
(2301 rep)
Jul 8, 2016, 10:29 AM
• Last activity: Jul 26, 2024, 05:54 AM
1
votes
1
answers
98
views
How can I find duplicate lines among files?
I have a software module which contains some files with same pattern. ` private static final long serialVersionUID = \dL;` How can I find files with the same value? ```shell $ grep -R serialVersionUID ./path/to/Some.java: private static final long serialVersionUID = 111L; ./path/to/Other.java: priva...
I have a software module which contains some files with the same pattern:
private static final long serialVersionUID = \dL;
How can I find files with the same value?
$ grep -R serialVersionUID
./path/to/Some.java: private static final long serialVersionUID = 111L;
./path/to/Other.java: private static final long serialVersionUID = 222L;
./path/to/Another.java: private static final long serialVersionUID = 111L;
Note that the indentation preceding the columns differs between files.
Now I want to find the files with the same value in the second part (`private static final ...`):
$ grep -R serialVersionUID | .....
./path/to/Some.java: private static final long serialVersionUID = 111L;
./path/to/Another.java: private static final long serialVersionUID = 111L;
Thanks.
This is all I could find, so far...
$ grep -R serialVersionUID | sed 's/[ ][ ]*/ /g' | sort -k 2
I have an improvement, yet it prints the second column only.
$ grep -R serialVersionUID | sed 's/[ ][ ]*/ /g' | sort -k 2 | uniq -f 2 -d
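A sketch of a two-pass awk alternative that keeps whole lines (it assumes the value always follows `= `, as in the output above): first count each value, then print only the lines whose value occurred more than once.
```
grep -R serialVersionUID \
  | awk -F'= ' '{ count[$2]++; line[NR] = $0; key[NR] = $2 }
                END { for (i = 1; i <= NR; i++)
                        if (count[key[i]] > 1) print line[i] }'
```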
Jin Kwon
(564 rep)
Jul 25, 2024, 06:44 AM
• Last activity: Jul 26, 2024, 02:17 AM
1
votes
1
answers
177
views
Why is sorted uniq -c command showing duplicates
I am trying to count how many times I use a certain version of a library on my computer. For some reason, `uniq -c` is outputing duplicates, despite sorting it, and despite the sort order seeming in order. Any ideas or feedback? Thanks for your time. ### With `uniq -c` Input: ``` rg --no-line-number...
I am trying to count how many times I use a certain version of a library on my computer.
For some reason, `uniq -c` is outputting duplicates, despite sorting the input, and despite the sort order appearing correct.
Any ideas or feedback?
Thanks for your time.
### With uniq -c
Input:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n | uniq -c
Output:
3 GitVersion.MsBuild 5.10.1
1 GitVersion.MsBuild 5.10.1
3 GitVersion.MsBuild 5.10.3
11 GitVersion.MsBuild 5.11.1
5 GitVersion.MsBuild 5.11.1
25 GitVersion.MsBuild 5.12.0
2 GitVersion.MsBuild 5.12.0
1 GitVersion.MsBuild 5.6.11
2 GitVersion.MsBuild 5.7.0
4 GitVersion.MsBuild 5.8.1
### Without uniq -c
Input:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n
Output:
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.1
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.10.3
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.11.1
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.12.0
GitVersion.MsBuild 5.6.11
GitVersion.MsBuild 5.7.0
GitVersion.MsBuild 5.7.0
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
GitVersion.MsBuild 5.8.1
---
I've updated my command to pipe to `xxd`, as per @kos's suggestion. That helped in comparing:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;//\1 \2/g' | sort -n | uniq -c | xxd
That yielded the following (sorry for the screenshot, but it helps having the colors):
[screenshot: xxd hex dump of the output, with colors highlighting the differing hidden bytes]
I then revised the regex slightly (sorry all, I didn't take all the suggestions on board, since one tiny tweak made it work, but I do have to say I learnt a lot by this, including using `xxd`). I simply added `.*` after the `>`:
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" | sed -E '/GitVersion\.MsBuild" Version/!d;s/^\s\+//g;/.*$/\1 \2/g' | sort | uniq -c
And it now yields the correct (or satisfactory anyway) output:
4 GitVersion.MsBuild 5.10.1
3 GitVersion.MsBuild 5.10.3
16 GitVersion.MsBuild 5.11.1
27 GitVersion.MsBuild 5.12.0
1 GitVersion.MsBuild 5.6.11
2 GitVersion.MsBuild 5.7.0
4 GitVersion.MsBuild 5.8.1
Thanks team!
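For anyone hitting the same symptom: when a sorted `uniq -c` still shows "duplicate" groups, the lines almost always differ in invisible characters (carriage returns, tabs, trailing spaces). A sketch for exposing them without a screenshot (`cat -A` is the GNU spelling; `sed -n l` is the portable equivalent):
```
rg --no-line-number --no-filename -g '*.csproj' "GitVersion.MsBuild" \
  | sort | uniq -c | cat -A    # ^M, ^I, and trailing spaces become visible
```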
Albert
(171 rep)
May 16, 2024, 03:53 AM
• Last activity: May 17, 2024, 01:47 AM
-1
votes
2
answers
160
views
how to de-duplicate block (timestamp+command) from bash history?
I'm working with bash_history file containing blocks with the following format: `#unixtimestamp\ncommand\n` here's sample of the bash_history file: ``` #1713308636 cat > ./initramfs/init ./initramfs/init << "EOF" #!/bin/sh /bin/sh EOF #1713308642 file initramfs/init #1713308686 cpio -v -t -F init.cp...
I'm working with a bash_history file containing blocks of the following format: `#unixtimestamp\ncommand\n`. Here's a sample of the file:
#1713308636
cat > ./initramfs/init ./initramfs/init << "EOF"
#!/bin/sh
/bin/sh
EOF
#1713308642
file initramfs/init
#1713308686
cpio -v -t -F init.cpio
#1713308690
ls
As a workaround, I added the delete functionality to this program, but I'm still open to other approaches that use existing commands.
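A sketch with plain awk, keeping the first occurrence of each (possibly multi-line) command; it assumes timestamp lines match `^#[0-9]+$` and that no line inside a command matches that pattern. The output filename is a placeholder.
```
awk '
  /^#[0-9]+$/ {
    if (block != "" && !seen[cmd]++) printf "%s", block
    block = $0 "\n"; cmd = ""
    next
  }
  { block = block $0 "\n"; cmd = cmd $0 "\n" }
  END { if (block != "" && !seen[cmd]++) printf "%s", block }
' ~/.bash_history > bash_history.dedup
```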
ReYuki
(33 rep)
May 13, 2024, 06:43 AM
• Last activity: May 15, 2024, 04:42 PM
62
votes
5
answers
78521
views
How to get only the unique results without having to sort data?
$ cat data.txt aaaaaa aaaaaa cccccc aaaaaa aaaaaa bbbbbb $ cat data.txt | uniq aaaaaa cccccc aaaaaa bbbbbb $ cat data.txt | sort | uniq aaaaaa bbbbbb cccccc $ The result that I need is to **display all the lines from the original file removing all the duplicates (not just the consecutive ones), whil...
$ cat data.txt
aaaaaa
aaaaaa
cccccc
aaaaaa
aaaaaa
bbbbbb
$ cat data.txt | uniq
aaaaaa
cccccc
aaaaaa
bbbbbb
$ cat data.txt | sort | uniq
aaaaaa
bbbbbb
cccccc
$
The result that I need is to **display all the lines from the original file removing all the duplicates (not just the consecutive ones), while maintaining the original order of statements in the file**.
In this example, the result I was actually looking for is
aaaaaa
cccccc
bbbbbb
How can I perform this generalized `uniq` operation?
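The widely used order-preserving idiom is a one-line awk filter: `seen[$0]++` is zero (false) only the first time a line appears, so each distinct line is printed exactly once, in its original order. On the sample above it prints `aaaaaa`, `cccccc`, `bbbbbb`.
```
awk '!seen[$0]++' data.txt
```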
Lazer
(36085 rep)
Apr 24, 2011, 08:23 PM
• Last activity: Jan 28, 2024, 07:06 AM
0
votes
3
answers
68
views
Does a command exist that lists all the directories where a word appears in a file or directory name?
When I don't remember where a file or a folder is, I sometime use the `locate` command (that finds more occurrences, allow more candidates than a `find`, in my mind. But maybe I'm mistaking). But then there's a lot of responses, of course: ```bash locate clang ``` ```log /data/sauvegardes/dev/Java/E...
When I don't remember where a file or a folder is, I sometimes use the `locate` command (which, to my mind, finds more occurrences and allows more candidates than `find`. But maybe I'm mistaken). But then there are a lot of responses, of course:
locate clang
/data/sauvegardes/dev/Java/Experimentations/Angular4/bikes/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/Angular4/bikes/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/Angular6/ng6-proj/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/Angular6/ng6-proj/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node_modules/blocking-proxy/.clang-format
/data/sauvegardes/dev/Java/Experimentations/blog-demo/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
/data/sauvegardes/dev/Java/Experimentations/ol-ext-angular/.metadata/.plugins/ts.eclipse.ide.server.nodejs.embed.win32.win32.x86_64/node-v6.9.4-win-x64/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications/gyp.xclangspec
(201 responses)
I piped this command through `dirname`, `sort` and `uniq` to list only the directories that have such a word in their name, or that contain one or more files having it:
locate clang | xargs -L1 dirname | sort | uniq
It works:
/home/lebihan/dev/Java/comptes-france/metier-et-gestion/AdapterInboundWebEtude/etude/node_modules/node-gyp/gyp/tools/Xcode/Specifications
/home/lebihan/dev/Java/comptes-france/metier-et-gestion/AdapterInboundWebEtude/etude/node/node_modules/npm/node_modules/node-gyp/gyp/tools/Xcode/Specifications
/usr/include/boost/align/detail
/usr/include/boost/config/compiler
/usr/include/boost/predef/compiler
/usr/lib/linux-kbuild-6.1/scripts
/usr/lib/llvm-14/lib
/usr/lib/postgresql/14/lib/bitcode/postgres/commands
/usr/lib/x86_64-linux-gnu
/usr/local/go/misc/ios
/usr/local/go/src/debug/dwarf/testdata
/usr/local/go/src/debug/elf/testdata
/usr/local/go/src/debug/macho/testdata
/usr/share/doc
/usr/share/doc/libclang1-14
/usr/share/doc/libclang-cpp14
(108 responses)
But does _Linux_ have a command that does the same, more easily?
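Not a single dedicated command that I know of, but a lighter sketch of the same pipeline (assumes GNU tools: `xargs -d` and coreutils' multi-argument `dirname`, so one `dirname` process handles every path, and `sort -u` replaces `sort | uniq`):
```
locate clang | xargs -d '\n' dirname | sort -u
```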
Marc Le Bihan
(2353 rep)
Oct 31, 2023, 07:34 AM
• Last activity: Oct 31, 2023, 10:20 AM
1
votes
5
answers
485
views
Find and delete partially duplicate lines
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html https://www.domain2.com/dl/G5SPNDOF/JHCGTS/AAA-1080p.mp4.html https://www.domain.com/files/ZQWL80BG/AAA-1080p.mp4.html https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html https://www.domain.com/files/UCIONEMA/BBB-1080p.mp4.html Given a fi...
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html
https://www.domain2.com/dl/G5SPNDOF/JHCGTS/AAA-1080p.mp4.html
https://www.domain.com/files/ZQWL80BG/AAA-1080p.mp4.html
https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html
https://www.domain.com/files/UCIONEMA/BBB-1080p.mp4.html
Given a file with the above lines, how do I delete the ones that reference a duplicate filename, even though the links as a whole differ, to end up with:
https://www.domain.com/files/G5SPNDOF/AAA-1080p.mp4.html
https://www.domain.com/files/SVSRS0AD/BBB-1080p.mp4.html
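A sketch keyed on the last path component (the input filename `links.txt` is a placeholder): splitting fields on `/` makes `$NF` the trailing filename, and the `!seen[...]++` idiom keeps only the first URL seen for each filename.
```
awk -F/ '!seen[$NF]++' links.txt
```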
Bogdan Nicolae Stoian
(27 rep)
Oct 11, 2022, 07:27 AM
• Last activity: Oct 9, 2023, 03:17 AM
11
votes
1
answers
1030
views
Use uniq to filter adjacent lines in pipeline
I'm trying to monitor theme changes using this command: ```lang-shell dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." ``` Output right now looks like this: ``` uint32 0 uint32 0 uint32 1 uint32 1 uint32 0 uint32 0 uint32 1 uint32 1 ```...
I'm trying to monitor theme changes using this command:
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ."
Output right now looks like this:
uint32 0
uint32 0
uint32 1
uint32 1
uint32 0
uint32 0
uint32 1
uint32 1
This output comes from theme toggling. The theme notification shows up twice for some reason. Now I want to pipe it to `uniq` so that I am left with only a single entry, like so:
uint32 0
uint32 1
uint32 0
uint32 1
However, appending `uniq` at the end does not produce any output anymore.
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." | uniq
From `man uniq`:
> Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
`uniq` needs to buffer at least the last output line to be able to detect adjacent lines; I don't see any reason why it could not buffer it and pass it along the pipeline. I've tried tweaking line buffering as suggested [here](https://unix.stackexchange.com/questions/295814/uniq-is-not-realtime-when-piped) , but the results are still the same for me.
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" | grep -o "uint32 ." | stdbuf -oL -i0 uniq
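A plausible fix, for reference: the buffering stage here is usually `grep`, which block-buffers when its stdout is a pipe, so applying `stdbuf` to `uniq` changes nothing. GNU grep's own `--line-buffered` flag makes it flush every match immediately:
```
dbus-monitor --session "interface='org.freedesktop.portal.Settings', member=SettingChanged" \
  | grep --line-buffered -o "uint32 ." \
  | uniq
```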
Pavel Skipenes
(235 rep)
Jun 18, 2023, 11:31 AM
• Last activity: Jun 20, 2023, 10:57 PM
11
votes
2
answers
43675
views
sort and uniq in awk
I know there are "sort" and "uniq" out there, however, today's question is about how to utilise AWK to do that kind of a job. Say if I have a list of anything really (ips, names, or numbers) and I want to sort them; Here is an example I am taking the IP numbers from a mail log: awk 'match($0,/\[[[:d...
I know there are "sort" and "uniq" out there, however, today's question is about how to utilise AWK to do that kind of a job. Say if I have a list of anything really (ips, names, or numbers) and I want to sort them;
Here is an example I am taking the IP numbers from a mail log:
awk 'match($0,/\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\]/) { if ( NF == 8 && $6 == "connect" ) {print substr($0, RSTART+1,RLENGTH-2)} }' maillog
Is it possible to sort them, the IPs, "on the go" within the same awk command? I do not require a complete answer to my question, just some hints on where to start.
Cheers!
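As a starting hint, a sketch that works in GNU awk (gawk) only: array keys de-duplicate for free, and `asorti()` sorts the keys in the `END` block. Note this is a string sort, so `10.` sorts before `9.`; a numeric IP ordering would need a custom comparison or a pipe to `sort`.
```
awk 'match($0, /\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\]/) {
       if (NF == 8 && $6 == "connect")
         seen[substr($0, RSTART + 1, RLENGTH - 2)]++   # keys are the unique IPs
     }
     END { n = asorti(seen, sorted)
           for (i = 1; i <= n; i++) print sorted[i] }' maillog
```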
Peter
(309 rep)
Mar 30, 2015, 08:10 AM
• Last activity: May 22, 2023, 12:01 PM
52
votes
2
answers
116169
views
Common lines between two files
I have the following code that I run on my Terminal. LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed This doesn't give me the common lines between the two files. What am I missing there?
I have the following code that I run on my Terminal.
LC_ALL=C && grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed
This doesn't give me the common lines between the two files. What am I missing there?
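Two things worth checking, sketched below: a standalone `LC_ALL=C &&` only sets an unexported shell variable that never reaches `grep` (it must prefix the command itself), and for a literal line-by-line intersection `comm` is often the clearer tool (it requires both inputs sorted):
```
LC_ALL=C grep -F -f genename2.txt hg38.hgnc.bed > hg38.hgnc.goi.bed   # variable applied to grep itself
comm -12 <(sort genename2.txt) <(sort hg38.hgnc.bed)                  # whole lines present in both files
```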
Marwah Soliman
(713 rep)
Oct 14, 2017, 06:46 PM
• Last activity: May 16, 2023, 07:00 AM
0
votes
0
answers
26
views
numeric sort with unique option does not show 0!
I have a file with many redundant numbers in each row. Imagine something like the below: ``` echo "10 9 5 6 4 cell 3 2 0 7 0 1" > test ``` When I use `sort -un test` I get the following output: ``` cell 1 2 3 4 5 6 7 9 10 ``` while I expect the below (I mean `0` as a first row of the output): ``` 0...
I have a file with many redundant numbers, one per row. Imagine something like the below:
echo "10
9
5
6
4
cell
3
2
0
7
0
1" > test
When I use `sort -un test` I get the following output:
cell
1
2
3
4
5
6
7
9
10
while I expect the below (I mean `0` as the first row of the output):
0
1
2
3
4
5
6
7
9
10
Applying `sort -n` and then piping to `uniq` doesn't make such a mess; however, it shows the non-numeric lines.
Is there any way to just use `sort` with `-nu` and get `0` on the first line instead of an alphanumeric token?
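An explanation with a sketch: under `-n`, a line with no leading number (`cell`) has the numeric key 0, the same key as the line `0`, and `-u` keeps only one line per key, so one of the two is discarded. Filtering out non-numeric lines first avoids the collision:
```
grep -E '^[0-9]+$' test | sort -nu
```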
javadr
(131 rep)
Aug 29, 2022, 12:14 PM
• Last activity: Aug 29, 2022, 12:21 PM
21
votes
3
answers
25738
views
Uniq won't remove duplicate
I was using the following command curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq when I wondered why `uniq` wouldn't remove the duplicates. Any idea why ?
I was using the following command
curl -silent http://api.openstreetmap.org/api/0.6/relation/2919627 http://api.openstreetmap.org/api/0.6/relation/2919628 | grep node | awk '{print $3}' | uniq
when I wondered why `uniq` wouldn't remove the duplicates. Any idea why?
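The usual cause, sketched below: `uniq` only collapses *adjacent* duplicates, so unsorted input passes through largely unchanged; `sort -u` handles both steps at once. (Also note curl's long flag is `--silent` with two dashes; `-silent` is parsed as a bundle of short options.)
```
curl -s http://api.openstreetmap.org/api/0.6/relation/2919627 \
        http://api.openstreetmap.org/api/0.6/relation/2919628 \
  | grep node | awk '{print $3}' | sort -u
```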
Matthieu Riegler
(535 rep)
Feb 8, 2014, 02:41 AM
• Last activity: Aug 1, 2022, 07:29 PM
0
votes
2
answers
307
views
tar processing files multiple times with find -newer
I'm trying to use tar(1) to create an archive of files newer than a specific file (`fileA`). However, when I use find(1) to obtain the list of files to pass to tar, some files are listed multiple times: ``` $ touch fileA $ mkdir test $ touch test/{fileB,fileC} $ tar -c -v $(find test -newer fileA) >...
I'm trying to use tar(1) to create an archive of files newer than a specific file (`fileA`). However, when I use find(1) to obtain the list of files to pass to tar, some files are listed multiple times:
$ touch fileA
$ mkdir test
$ touch test/{fileB,fileC}
$ tar -c -v $(find test -newer fileA) > test.tar
test/
test/fileC
test/fileB
test/fileC
test/fileB
Using xargs(1) to pass the list of files to tar results in similar behavior:
$ find test -newer fileA | xargs tar -c -v > test.tar
test/
test/fileC
test/fileB
test/fileC
test/fileB
Using sort(1) and uniq(1) to remove duplicates doesn't work either:
$ find test -newer fileA | sort | uniq | xargs tar -c -v > test.tar
test/
test/fileC
test/fileB
test/fileB
test/fileC
Is there a way for tar to only include each file newer than `fileA` once?
**Edit:** I'm specifically looking for a solution that doesn't involve GNU extensions to tar (for example, one that would work with suckless tar).
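One POSIX-only sketch: the duplicates come from `find` printing the directory `test/` itself, which tar recurses into before also adding the explicitly listed files. Excluding directories from the list avoids the double add (this assumes filenames without whitespace, since `xargs` splits on it, and that no empty directories need archiving):
```
find test -newer fileA ! -type d | xargs tar -c -v > test.tar
```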
Vilinkameni
(1639 rep)
Jul 6, 2022, 02:19 PM
• Last activity: Jul 6, 2022, 03:00 PM
179
votes
5
answers
293243
views
What is the difference between "sort -u" and "sort | uniq"?
Everywhere I see someone needing to get a sorted, unique list, they always pipe to `sort | uniq`. I've never seen any examples where someone uses `sort -u` instead. Why not? What's the difference, and why is it better to use uniq than the unique flag to sort?
Everywhere I see someone needing to get a sorted, unique list, they always pipe to `sort | uniq`. I've never seen any examples where someone uses `sort -u` instead. Why not? What's the difference, and why is it better to use uniq than the unique flag to sort?
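For plain whole-line deduplication the two are equivalent; `uniq` earns its keep through options that `sort -u` has no counterpart for:
```
sort file | uniq        # same output as:
sort -u file            # one process fewer
sort file | uniq -c     # count occurrences of each line
sort file | uniq -d     # print only the lines that are duplicated
```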
Benubird
(6082 rep)
May 16, 2013, 11:22 AM
• Last activity: May 31, 2022, 04:15 AM
4
votes
1
answers
10296
views
Difference between sort -u and uniq -u
I always have been using `sort -u` to get rid of duplicates until now. But I am having a real doubt about a list generated by a software tool. The question is: is the output of `sort -u |wc` the same as `uniq -u |wc`? Because they don't yield the same results. The manual for `uniq` specifies: > -u,...
I have always been using `sort -u` to get rid of duplicates until now. But I am having a real doubt about a list generated by a software tool. The question is: is the output of `sort -u | wc` the same as that of `uniq -u | wc`? Because they don't yield the same results. The manual for `uniq` specifies:
> -u, --unique only print unique lines
My output consists of 1110 words, for which `sort -u` keeps 1020 lines and `uniq -u` keeps 1110 lines, the correct amount.
The issue is that I cannot visually spot any duplicates in the list (which is generated by using `>` at the end of the command line), and that there IS an issue with the total of cracked passwords (in the context of customizing john the ripper).
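The two options answer different questions, which a three-line sketch makes visible: `sort -u` keeps one copy of *every* line, while `uniq -u` drops *all* copies of any line that repeats, and on unsorted input it only compares neighbours:
```
printf 'a\na\nb\n' | sort -u    # -> a b     (one copy of everything)
printf 'a\na\nb\n' | uniq -u    # -> b       (only lines that never repeat)
printf 'a\nb\na\n' | uniq -u    # -> a b a   (nothing repeats adjacently)
```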
Yvain
(248 rep)
May 30, 2022, 07:03 PM
• Last activity: May 30, 2022, 07:28 PM
0
votes
3
answers
77
views
de-duplicate list but group parts of it
I am compiling some access rules from failed logins and after some piping I arrived at this: ```shell cat <<INPUT | sort -k 3,3 --unique Deny from 13.42.98.142 # demo Deny from 13.42.98.142 # test Deny from 13.42.98.142 # user Deny from 133.142.200.152 # admin INPUT ``` Just out of interest, I would...
I am compiling some access rules from failed logins, and after some piping I arrived at this:
cat <<INPUT | sort -k 3,3 --unique
Deny from 13.42.98.142 # demo
Deny from 13.42.98.142 # test
Deny from 13.42.98.142 # user
Deny from 133.142.200.152 # admin
INPUT
Just out of interest, I would like to keep the tried login names (the last field).
My test code would output:
Deny from 13.42.98.142 # demo
Deny from 133.142.200.152 # admin
I am looking for an output like:
Deny from 13.42.98.142 # demo, test, user
Deny from 133.142.200.152 # admin
or even better (because it would be valid .htaccess
syntax):
# demo, test, user
Deny from 13.42.98.142
# admin
Deny from 133.142.200.152
**Note**:
The input is just how I made it now - I am not stubborn about it and can change it if that suits an elegant solution better. I'll also accept general answers on how grouping in lists can be achieved in shell.
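A sketch of the grouping with awk (field 3 is the IP and the last field the login name; `rules.txt` stands for the compiled list). Note that awk's `for (ip in names)` iterates in unspecified order, so pipe through `sort` if ordering matters:
```
awk '{ names[$3] = ($3 in names) ? names[$3] ", " $NF : $NF }
     END { for (ip in names) printf "# %s\nDeny from %s\n", names[ip], ip }' rules.txt
```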
Jonas Eberle
(513 rep)
May 1, 2022, 11:10 AM
• Last activity: May 2, 2022, 06:01 AM
Showing page 1 of 20 total questions