Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
32 votes, 2 answers, 37403 views
Where do the words in /usr/share/dict/words come from?
[`/usr/share/dict/words`](https://en.wikipedia.org/wiki/Words_(Unix)) contains lots of words. How is this list generated? Are its contents the same across different Unices? Is there any standard dictating what it must contain?
All I've been able to turn up so far is that on Ubuntu/Debian the list comes from the [wordlist](https://packages.debian.org/sid/wordlist) packages, but their descriptions offer no clue on how the lists were actually generated.
Mark Amery
(3220 rep)
Jul 2, 2015, 07:23 PM
• Last activity: Jul 29, 2025, 09:25 PM
0 votes, 0 answers, 40 views
How to convert a list of words separated by commas to a column?
How do I convert a list of words separated by commas to a column?
I have this:
"word, other word, another one"
I want:
word
other word
another one
How do I do that?
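A minimal sketch of one way to do this, assuming GNU sed (which understands `\n` in the replacement text):

```shell
# Replace each ", " separator with a newline (GNU sed syntax).
printf '%s\n' "word, other word, another one" | sed 's/, /\n/g'
```

With a strictly POSIX sed, `tr ',' '\n'` followed by trimming the leading spaces achieves the same result.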
elbarna
(13690 rep)
Feb 25, 2025, 08:44 PM
• Last activity: Feb 25, 2025, 08:48 PM
0 votes, 1 answer, 166 views
Does /usr/share/dict/words contain personal information?
I am considering including a copy of my `/usr/share/dict/words` file in a public GitHub repository for a project that requires dictionaries. Is this a bad idea, and if so, why?
I'm particularly interested in the privacy/security (or even legal?) aspects. Are there common programs that add words to this dictionary, for example if I choose "Add to Dictionary" in a spell checker? Is the file likely to contain any sensitive information, such as my username? (I checked that, and it doesn't, but there could be similar things I didn't think to check.) It'd be impractical to look through all 104,334 words. Perhaps it's just the `usr` in the path making me unnecessarily concerned.
I've read over these questions about where the words come from. However, is it probable that any words have since been added or removed?
I suppose if nothing has changed, I could just get the source. But if some programs added helpful (non-personal) words, I'd want to keep those.
In case it's important, I am running Ubuntu 23.10. But I'd prefer a slightly more general answer, if possible.
### Note
I am fully aware that
- it would be possible to point to the file path in code rather than "hard coding" it into the repo, and
- this may not be the best free English word list.
However, I'm not interested in using a different list *instead* of this one (in such a case, I'd rather just use both). And if I use a list, it's necessary that I can include the actual file.
kviLL
(103 rep)
Jun 3, 2024, 08:20 PM
• Last activity: Jun 4, 2024, 09:14 AM
3 votes, 2 answers, 1119 views
I want to find lines where a specific word appears in a file along with the line number, and store the line numbers in an array. How to do that in bash?
This returns line numbers but they are in a string:
grep -n -F -w $word $file | cut -d : -f 1
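One way to capture that output in a real array, assuming bash 4+ for `mapfile` (the filename and sample data below are illustrative):

```shell
#!/usr/bin/env bash
# Collect matching line numbers into a bash array.
word='foo'
file='data.txt'
printf '%s\n' 'foo bar' 'baz' 'qux foo' > "$file"   # sample data

# mapfile reads one array element per line of input (bash 4+).
mapfile -t linenos < <(grep -nFw -- "$word" "$file" | cut -d: -f1)

echo "matches on lines: ${linenos[*]}"
```

Quoting `"$word"` and `"$file"` also avoids the globbing and word-splitting problems in the original snippet.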
Arpan Koley
(31 rep)
Aug 24, 2023, 05:01 AM
• Last activity: Aug 26, 2023, 04:11 AM
25 votes, 17 answers, 14011 views
Bash script: split word on each letter
How can I split a word's letters, with each letter in a separate line?
For example, given
"StackOver"
I would like to see
S
t
a
c
k
O
v
e
r
I'm new to bash so I have no clue where to start.
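One common approach is `grep -o .`, which prints every single-character match on its own line (a sketch; `fold -w1` is an alternative):

```shell
word='StackOver'
# -o prints only the matched text, one match per line;
# "." matches any single character.
printf '%s' "$word" | grep -o .
```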
Sijaan Hallak
(529 rep)
Jan 4, 2016, 11:41 PM
• Last activity: May 1, 2023, 12:43 PM
0 votes, 1 answer, 713 views
Creating word list from document
I'm trying to find an efficient way of learning vocabulary in new languages. I'd like to be able to create word lists using files that contain books.
I'm new to Linux, any help much appreciated.
I would like to:
* have a command that will take as input a text file (txt format for example) and output another file that contains a list of all individual words in the first file.
* The new file should be ordered alphabetically and contain no duplicates (each word should be included only once.)
* Ideally, the command should also be able to check against a second file and avoid repeating any words contained in that file. (So that I can create a file of words I already know, which are then not repeated.)
Is there a suitable command to do this?
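A sketch of such a pipeline, using the hypothetical filenames `book.txt` and `known.txt`:

```shell
# Sample data (stand-ins for a real book and a known-words list).
printf 'The cat sat. The dog ran.\n' > book.txt
printf 'the\ncat\n' > known.txt

# 1. Split into one word per line, lowercase, sort, de-duplicate.
tr -cs '[:alpha:]' '\n' < book.txt | tr '[:upper:]' '[:lower:]' | sort -u > all.txt

# 2. comm -23 keeps lines found only in the first (sorted) file.
sort -u known.txt > known.sorted
comm -23 all.txt known.sorted > new-words.txt
cat new-words.txt
```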
Bronze
(3 rep)
Oct 5, 2022, 10:39 AM
• Last activity: Oct 5, 2022, 12:41 PM
4 votes, 1 answer, 866 views
Inverse grep does not find all matching lines
**EDIT:** I am making this too complicated. It's not about inverse grep. I get the same results using just `grep -x -f stop.txt < in.txt`. If `who` comes before `whose` in the stop word file, the result is just `who`. When the order is reversed in the stop word file, both lines in `in.txt` are found. I have the feeling that I fundamentally do not get grep.
***
I cannot get inverse grep to work like I expect in order to remove lines containing a stop word from a file. The order in which the stop words are given affects the result.
Suppose I have two files. An input file `in.txt`:
who
whose
And a file with a list of stop words `stop.txt`:
who
whose
If I "filter" `in.txt` with an inverse grep search on the stop words in `stop.txt`, I get:
$ grep -vx -f stop.txt < in.txt
whose
$
Only if I change `stop.txt` to
whose
who
do I get:
$ grep -vx -f stop.txt < in.txt
$
I do not understand why the order of words in the file with stop words is of importance.
Till A. Heilmann
(141 rep)
Sep 9, 2022, 12:27 PM
• Last activity: Sep 9, 2022, 09:25 PM
0 votes, 0 answers, 142 views
Terminal Emulator Blinking Words in VIM
I am starting to write LaTeX documents in VIM and here is my code:
\documentclass{article}
\begin{document}
\frac{2x}{2} *
\end{document}
For some reason, the following characters are blinking in the terminal:
- {article}
- {document}
- {}{}
- {document}
This forum seems to be related:
https://superuser.com/questions/449335/vi-editor-text-is-flashing-and-unusable/450302
There was another post (EDIT: I found it: https://forums.fedoraforum.org/archive/index.php/t-291190.html) from 2013 saying to do:
export TERM=linux
and I don't want that because text gets bolded and in vifm the selected file blinks.
I just want to turn blinking completely off - I can't see what I am typing when the screen is constantly blinking. It is very distracting.
- OS: Arch Linux
- Terminal emulator: xterm
wgm
(1 rep)
Oct 27, 2021, 03:37 PM
• Last activity: Oct 28, 2021, 04:04 AM
-5 votes, 1 answer, 300 views
Who is responsible for /usr/share/dict/words ? (slurs found)
I am writing an application that makes use of /usr/share/dict/words file to generate session keys.
I was appalled to find the most egregious of ethnic slurs - the 'N-word' (in different spellings) - in the file.
Who is responsible for the maintenance of this file, and why are these words in the file?
LinGreenspan
(1 rep)
Oct 16, 2021, 05:53 PM
• Last activity: Oct 16, 2021, 06:12 PM
1 vote, 2 answers, 2191 views
crunch wordlist generation with all combinations
I'm trying to generate a wordlist in order to use it to bruteforce my own Truecrypt container. I do know parts of the password: it's built up using blocks of other known passwords to increase length, but I forgot in which order the blocks were used and if some blocks weren't used at all.
Example "blocks" separated with space:
dog cat bird xyz cow1 lion8
What I would like to do is create a wordlist containing each possible combination of these blocks. E.g
dog
cat
dogcat
catdog
bird
dogbird
catbird
birdcat
birddog
dogcatbird
catdogbird
xyz
dogcatbirdxyz
cow1
xyzcow1dogcat
xyzcow1dogcatbird
catdogbirdxyzcow8
lion8
catdogbirdxyzcow1lion8
lion8catdogbirdxyzcow1
dogcatbirdxyzcow1lion8
cow1birddogcatxyzlion8
cow1lion8birddogcatxyz
...
So far I've tried to utilize a tool called crunch: http://www.irongeek.com/i.php?page=backtrack-r1-man-pages/crunch
But the challenge seems to be how one should generate combinations of shorter combinations, not including all known blocks; for example, `dogcat` only includes 2 blocks.
Perhaps someone knows `crunch` better than me, or maybe I should use another tool or combination of tools?
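If `crunch` can't express this, a small recursive shell function can: emit every ordered arrangement of every non-empty subset of the blocks. This is a bash sketch (it assumes all blocks are distinct):

```shell
#!/usr/bin/env bash
blocks=(dog cat bird xyz cow1 lion8)

# perms PREFIX REMAINING... prints PREFIX, then recurses with each
# remaining block appended. 6 distinct blocks yield 1956 candidates.
perms() {
  local prefix=$1; shift
  [ -n "$prefix" ] && printf '%s\n' "$prefix"
  local pick rest r
  for pick in "$@"; do
    rest=()
    for r in "$@"; do [ "$r" != "$pick" ] && rest+=("$r"); done
    perms "$prefix$pick" "${rest[@]}"
  done
}

perms '' "${blocks[@]}" > wordlist.txt
wc -l < wordlist.txt
```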
Niklas J. MacDowall
(163 rep)
Apr 19, 2018, 06:36 AM
• Last activity: Dec 28, 2020, 02:24 PM
1 vote, 2 answers, 503 views
Most Frequent Word in a Text
# Task
The parameter here is a filename! The file contains a text. The task of the script is to decide which word is contained most frequently in other words.
***
# Example Input And Output
(e.g. the text is: play ball football basketball snowball - therefore ball is the winner because it is part of three other words).
***
# My code so far
I did this code so far, but it doesn't work for every input:
#!/bin/sh
awk '{for(i=2;i
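The attempt above is cut off in the post; here is one complete sketch in plain sh (my own approach, not a reconstruction of the awk script): for each distinct word, count how many other words contain it as a substring, and keep the best.

```shell
#!/bin/sh
printf 'play ball football basketball snowball\n' > text.txt   # sample input

# One distinct word per line.
tr -s '[:space:]' '\n' < text.txt | sort -u > words.txt

best=''; bestn=0
while IFS= read -r w; do
  n=$(grep -cF -- "$w" words.txt)   # lines containing $w, itself included
  n=$((n - 1))                      # exclude the word itself
  if [ "$n" -gt "$bestn" ]; then best=$w; bestn=$n; fi
done < words.txt

printf '%s (contained in %d other words)\n' "$best" "$bestn"
```

This is quadratic in the number of words, which is fine for a short text but slow for a whole book.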
user405815
Apr 12, 2020, 09:44 PM
• Last activity: May 10, 2020, 10:09 AM
1 vote, 2 answers, 1597 views
How to get a random adjective or noun?
I did find a list of words in `/usr/share/dict/words` but I don't know if there's a way (an already existing way?) to split them up into their corresponding parts of speech. Alternatively, I'm fine with any other suggestions; `/usr/share/dict/words` was only the first list of words I found.
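`/usr/share/dict/words` carries no part-of-speech information, so the split would need a tagged source (the WordNet data files, for instance, keep separate per-part-of-speech index files; I am not assuming any particular install path here). Picking a random entry from whatever list you end up with is simple with GNU `shuf`:

```shell
# A tiny stand-in adjective list; substitute your real tagged list.
printf '%s\n' quick lazy bright > adjectives.txt
shuf -n 1 adjectives.txt   # prints one random line
```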
finefoot
(3554 rep)
Nov 23, 2019, 10:09 AM
• Last activity: Nov 24, 2019, 06:25 PM
0 votes, 3 answers, 247 views
extract specific words and their data from an html/xml file
sample input is
output should be
tid="8390500116294391399"
ts="N/A"
ets="2019-02-22T00:21:41.228Z"
trxn="smaple data with spaces 2 record"
trxn="smaple data with spaces 3rd record"
trxn="smaple data with spaces 5th record"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"
trxn="other data with spaces"
trxn="another record data"
trxn="smaple data with spaces record"
trxn="data with spaces"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"
I tried like below
sed -e 's/trxn=/\ntrxn=/g' -e 's/tid=/\ntid=/g' -e 's/ts=/\nts=/g'
while IFS= read -r var
do
if grep -Fxq "$trxn" temp2.txt
then
awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /trxn/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
else
awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
fi
done < "$input"
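If the goal is simply one `attr="value"` pair per line, a single `grep -o` pass is much simpler than the sed/awk loop above (a sketch against a sample line resembling the question's data; for real XML an XML-aware tool such as xmlstarlet is more robust):

```shell
# Sample one-line input resembling the question's data.
cat > sample.xml <<'EOF'
<t tid="8390500116294391399" ts="N/A" ets="2019-02-22T00:21:41.228Z"><x trxn="smaple data with spaces 2 record"/></t>
EOF

# -o prints only the matched part, one match per line.
grep -oE '(tid|ts|ets|trxn)="[^"]*"' sample.xml
```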
BNRINBOX
(29 rep)
Apr 15, 2019, 04:45 AM
• Last activity: Sep 19, 2019, 01:19 AM
0 votes, 0 answers, 93 views
Remove words from dictionary which have two or more capital letters at any place in the word
Basically, I have a dictionary from which I would like to remove words that contain two or more capital letters anywhere in the word, for example:
bold
leTTer
reMoveMe
quote
lanGuaGe
spaces
Output should be like this
bold
quote
spaces
Is it possible to do it with `sed`?
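Assuming `sed` was meant, deleting every line that contains two capital letters does it (a sketch; `grep -v` with the same pattern is equivalent):

```shell
printf '%s\n' bold leTTer reMoveMe quote lanGuaGe spaces > dict.txt
# ([A-Z].*){2} = a capital letter, anything, then another capital letter.
sed -E '/([A-Z].*){2}/d' dict.txt
```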
Slobodan Vidovic
(185 rep)
May 26, 2019, 10:13 PM
3 votes, 1 answer, 8742 views
How to split files with numeric names?
I'm trying to split a text file into files of 1024 lines, so I ran `split` with the `-d` switch:
split -d -l 300 ./list.lst
I get some weird names: they start with `x` and the file names jump from `x89` to `x9000`. I want the files to be named like that:
1.lst
2.lst
3.lst
...
Thanks.
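The jump from `x89` to `x9000` is GNU `split` widening the numeric suffix when it runs out of two-digit names; a fixed suffix width (`-a`) avoids it, and `--additional-suffix` (GNU coreutils) adds the extension. A sketch, with a rename loop for the exact `1.lst` style:

```shell
seq 1 2500 > list.lst                                        # sample input
# Decimal suffixes of fixed width 3, plus a .lst extension.
split -d -a 3 -l 1024 --additional-suffix=.lst list.lst part
# Rename part000.lst, part001.lst, ... to 1.lst, 2.lst, ...
n=1
for f in part*.lst; do mv -- "$f" "$n.lst"; n=$((n+1)); done
ls [0-9]*.lst
```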
Adel M.
(358 rep)
Jan 6, 2019, 01:01 AM
• Last activity: Jan 6, 2019, 06:04 AM
0 votes, 4 answers, 4877 views
finding words that contain only 3 characters using sed in a file
I need to print only words that consist of 3 characters, however the word document is a numbered list.
Here's the exact question that I have to answer:
> Using the `sed` command with the `[[:lower:]]` character class on the `animals` file, find all the animal names that are only three characters long _(3 marks)_.
This is what I have tried:
cat animals | sed '/{[:lower:]].../d'
cat animals | sed '/{[:lower:]]/d'
sed '/[[:lower:]]{3}/d' animals
This is the file I am trying to find the words from (the `animals` file):
01. aardvark
02. badger
03. cow
04. dog
05. elephant
06. fox
07. goose
08. horse
09. iguana
10. jackal
11. koala
12. lamb
13. mongoose
14. narwhal
15. onyx
16. pig
17. quail
18. rat
19. snake
20. tiger
21. umbrellabird
22. vulture
23. walrus
24. xerus
25. yak
26. zebra
I have just found out the code cannot have `[[:lower:]]` in it more than once. Is there a way to do this?
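One way that uses `[[:lower:]]` only once is to anchor the numbered-list layout and repeat the class with `{3}` (a sketch against an excerpt of the file):

```shell
cat > animals <<'EOF'
01. aardvark
02. badger
03. cow
04. dog
06. fox
16. pig
18. rat
25. yak
EOF

# Number, dot, space, then exactly three lowercase letters to end of line;
# -n plus the p flag prints only the substituted (i.e. matching) lines.
sed -nE 's/^[0-9]+\. ([[:lower:]]{3})$/\1/p' animals
```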
ItCouldBeMe
(53 rep)
Oct 20, 2018, 07:57 PM
• Last activity: Oct 24, 2018, 02:36 AM
3 votes, 4 answers, 2014 views
sort letters in a single word - use it to find permutations (or anagrams)
I have some dictionary for myspell in `file.dic`. Let's say:
abc
aword
bword
cab
worda
wordzzz
and I'm looking for different words that are **permutations (or anagrams)** of each other.
If there was a command "letter-sort" I'd do it more or less like that:
cat file.dic | letter-sort | paste - file.dic | sort
That gives me:
abc abc
abc cab
adorw aword
adorw worda
bdorw bword
dorwzzz wordzzz
so now I clearly see the anagrams in the file. Is there such a `letter-sort` command, or how can I obtain such a result in some other way?
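There is no standard `letter-sort` utility, but it can be sketched as a tiny shell filter: split each word into characters with `fold -w1`, `sort` them, and glue them back together:

```shell
#!/bin/sh
# Read words on stdin; print each word's letters in sorted order.
lettersort() {
  while IFS= read -r w; do
    printf '%s\n' "$(printf '%s' "$w" | fold -w1 | sort | tr -d '\n')"
  done
}

printf '%s\n' abc aword bword cab worda wordzzz > file.dic
lettersort < file.dic | paste - file.dic | sort
```

This forks several processes per word, so it is slow on big dictionaries; an awk or perl one-liner would scale better.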
sZpak
(511 rep)
Nov 17, 2016, 11:27 PM
• Last activity: Aug 29, 2017, 12:53 PM
1 vote, 1 answer, 2009 views
How do I use this regex with grep?
I'm new to regex and found a command on a regex tutorial/test site that will allow me to search for 3 consecutive consonants. The only problem is I can't figure out how to use it with grep. Would someone help me out? I'm trying to search a word list text file using:
(?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}
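The `(?!...)` lookahead is Perl-compatible syntax, which grep's default BRE/ERE engines reject; GNU grep's `-P` switch evaluates the pattern with PCRE instead (availability depends on how grep was built). A sketch against a tiny sample list:

```shell
printf '%s\n' strength aaa banana > list.txt
# -P enables Perl-compatible regexes, so the lookahead works as-is.
grep -P '(?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}' list.txt
```

If `-P` is unavailable and the no-repeat condition isn't needed, plain `grep -E '[bcdfghjklmnpqrstvwxzy]{3}'` finds three consecutive consonants.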
Austin
(231 rep)
Oct 31, 2016, 05:53 AM
• Last activity: Oct 31, 2016, 06:03 AM
4 votes, 2 answers, 881 views
Does command substitution within arithmetic substitution get word split?
I seem to recall from comments on this site that the contents of arithmetic expansion **may** be word split, but I can't find the comment again.
Consider the following code:
printf '%d\n' "$(($(sed -n '/my regex/{=;q;}' myfile)-1))"
If the `sed` command outputs a multi-digit number and `$IFS` contains digits, will the command substitution get word split before the arithmetic occurs?
(I've already tested using extra double quotes:
printf '%d\n' "$(("$(sed -n '/my regex/{=;q;}' myfile)"-1))"
and this doesn't work.)
-----
Incidentally the example code above is a reduced-to-simplest-form alteration of [this function](https://stackoverflow.com/a/35882480/5419599) that I just posted on Stack Overflow.
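A quick experiment suggests the answer is no, at least in bash and dash: the command substitution's result is consumed by the arithmetic parser, not field-split, so this prints 24 even with a hostile `IFS` (a sketch; I'm not asserting what every POSIX shell does):

```shell
IFS=0123456789   # an IFS that would shred any number if splitting applied
printf '%d\n' "$(( $(printf 25) - 1 ))"
unset IFS
```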
Wildcard
(37446 rep)
Mar 9, 2016, 03:52 AM
• Last activity: Mar 11, 2016, 01:46 AM