Unix & Linux Stack Exchange
Q&A for users of Linux, FreeBSD and other Unix-like operating systems
Latest Questions
32 votes, 2 answers, 37403 views
Where do the words in /usr/share/dict/words come from?
[`/usr/share/dict/words`](https://en.wikipedia.org/wiki/Words_(Unix)) contains lots of words. How is this list generated? Are its contents the same across different Unices? Is there any standard dictating what it must contain?
All I've been able to turn up so far is that on Ubuntu/Debian the list comes from the [wordlist](https://packages.debian.org/sid/wordlist) packages, but their descriptions offer no clue on how the lists were actually generated.
Mark Amery
(3220 rep)
Jul 2, 2015, 07:23 PM
• Last activity: Jul 29, 2025, 09:25 PM
0 votes, 0 answers, 40 views
How to convert a list of words separated by commas to a column?
How do I convert a list of words separated by commas to a column?
I have this:
"word, other word, another one"
I want:
word
other word
another one
How do I do that?
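A minimal sketch of one way to do this, assuming GNU sed (which understands `\n` in the replacement text):

```shell
# Replace each ", " separator with a newline (GNU sed syntax).
printf '%s\n' "word, other word, another one" | sed 's/, /\n/g'
```

With a strictly POSIX sed, `tr ',' '\n'` followed by trimming the leading spaces achieves the same result.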
elbarna
(13690 rep)
Feb 25, 2025, 08:44 PM
• Last activity: Feb 25, 2025, 08:48 PM
0 votes, 1 answer, 166 views
Does /usr/share/dict/words contain personal information?
I am considering including a copy of my `/usr/share/dict/words` file in a public GitHub repository for a project that requires dictionaries. Is this a bad idea, and if so, why?
I'm particularly interested in the privacy/security (or even legal?) aspects. Are there common programs that add words to this dictionary, for example if I choose "Add to Dictionary" in a spell checker? Is the file likely to contain any sensitive information, such as my username? (I checked that, and it doesn't, but there could be similar things I didn't think to check.) It'd be impractical to look through all 104,334 words. Perhaps it's just the `usr` in the path making me unnecessarily concerned.
I've read over these questions about where the words come from. However, is it probable that any words have since been added or removed?
I suppose if nothing has changed, I could just get the source. But if some programs added helpful (non-personal) words, I'd want to keep those.
In case it's important, I am running Ubuntu 23.10. But I'd prefer a slightly more general answer, if possible.
### Note
I am fully aware that
- it would be possible to point to the file path in code rather than "hard coding" it into the repo, and
- this may not be the best free English word list.
However, I'm not interested in using a different list *instead* of this one (in such a case, I'd rather just use both). And if I use a list, it's necessary that I can include the actual file.
kviLL
(103 rep)
Jun 3, 2024, 08:20 PM
• Last activity: Jun 4, 2024, 09:14 AM
3 votes, 2 answers, 1119 views
I want to find lines where a specific word appears in a file along with the line number, and store the line numbers in an array. How to do that in bash?
This returns line numbers but they are in a string:
grep -n -F -w $word $file | cut -d : -f 1
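One way to capture that output in a real array, assuming bash 4+ for `mapfile` (the filename and sample data below are illustrative):

```shell
#!/usr/bin/env bash
# Collect matching line numbers into a bash array.
word='foo'
file='data.txt'
printf '%s\n' 'foo bar' 'baz' 'qux foo' > "$file"   # sample data

# mapfile reads one array element per line of input (bash 4+).
mapfile -t linenos < <(grep -nFw -- "$word" "$file" | cut -d: -f1)

echo "matches on lines: ${linenos[*]}"
```

Quoting `"$word"` and `"$file"` also avoids the globbing and word-splitting problems in the original snippet.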
Arpan Koley
(31 rep)
Aug 24, 2023, 05:01 AM
• Last activity: Aug 26, 2023, 04:11 AM
25 votes, 17 answers, 14011 views
Bash script: split word on each letter
How can I split a word's letters, with each letter in a separate line?
For example, given
"StackOver"
I would like to see
S
t
a
c
k
O
v
e
r
I'm new to bash so I have no clue where to start.
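One common approach is `grep -o .`, which prints every single-character match on its own line (a sketch; `fold -w1` is an alternative):

```shell
word='StackOver'
# -o prints only the matched text, one match per line;
# "." matches any single character.
printf '%s' "$word" | grep -o .
```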
Sijaan Hallak
(529 rep)
Jan 4, 2016, 11:41 PM
• Last activity: May 1, 2023, 12:43 PM
0 votes, 1 answer, 713 views
Creating word list from document
I'm trying to find an efficient way of learning vocabulary in new languages. I'd like to be able to create word lists using files that contain books.
I'm new to Linux, any help much appreciated.
I would like to:
* have a command that will take as input a text file (txt format for example) and output another file that contains a list of all individual words in the first file.
* The new file should be ordered alphabetically and contain no duplicates (each word should be included only once.)
* Ideally, the command should also be able to check against a second file and avoid repeating any words contained in that file. (So that I can create a file of words I already know, which are then not repeated.)
Is there a suitable command to do this?
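A sketch of such a pipeline, using the hypothetical filenames `book.txt` and `known.txt`:

```shell
# Sample data (stand-ins for a real book and a known-words list).
printf 'The cat sat. The dog ran.\n' > book.txt
printf 'the\ncat\n' > known.txt

# 1. Split into one word per line, lowercase, sort, de-duplicate.
tr -cs '[:alpha:]' '\n' < book.txt | tr '[:upper:]' '[:lower:]' | sort -u > all.txt

# 2. comm -23 keeps lines found only in the first (sorted) file.
sort -u known.txt > known.sorted
comm -23 all.txt known.sorted > new-words.txt
cat new-words.txt
```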
Bronze
(3 rep)
Oct 5, 2022, 10:39 AM
• Last activity: Oct 5, 2022, 12:41 PM
4 votes, 1 answer, 866 views
Inverse grep does not find all matching lines
**EDIT:** I am making this too complicated. It's not about inverse grep. I get the same results using just `grep -x -f stop.txt < in.txt`. If `who` comes before `whose` in the stop word file, the result is just `who`. When the order is reversed in the stop word file, both lines in `in.txt` are found. I have the feeling that I fundamentally do not get grep.
***
I cannot get inverse grep to work like I expect in order to remove lines containing a stop word from a file. The order in which the stop words are given affects the result.
Suppose I have two files. An input file `in.txt`:
who
whose
And a file with a list of stop words `stop.txt`:
who
whose
If I "filter" `in.txt` with an inverse grep search on the stop words in `stop.txt`, I get:
$ grep -vx -f stop.txt < in.txt
whose
$
Only if I change `stop.txt` to
whose
who
do I get:
$ grep -vx -f stop.txt < in.txt
$
I do not understand why the order of words in the file with stop words is of importance.
Till A. Heilmann
(141 rep)
Sep 9, 2022, 12:27 PM
• Last activity: Sep 9, 2022, 09:25 PM
0 votes, 0 answers, 142 views
Terminal Emulator Blinking Words in VIM
I am starting to write LaTeX documents in VIM and here is my code:
\documentclass{article}
\begin{document}
\frac{2x}{2} *
\end{document}
For some reason, the following characters are blinking in the terminal:
- {article}
- {document}
- {}{}
- {document}
This forum seems to be related:
https://superuser.com/questions/449335/vi-editor-text-is-flashing-and-unusable/450302
There was another post (EDIT: I found it: https://forums.fedoraforum.org/archive/index.php/t-291190.html) from 2013 saying to do:
export TERM=linux
and I don't want that because text gets bolded and in vifm the selected file blinks.
I just want to turn blinking completely off - I can't see what I am typing when the screen is constantly blinking. It is very distracting.
- OS: Arch Linux
- Terminal emulator: xterm
wgm
(1 rep)
Oct 27, 2021, 03:37 PM
• Last activity: Oct 28, 2021, 04:04 AM
-5 votes, 1 answer, 300 views
Who is responsible for /usr/share/dict/words ? (slurs found)
I am writing an application that makes use of /usr/share/dict/words file to generate session keys.
I was appalled to find the most egregious of ethnic slurs - the 'N-word' (in different spellings) - in the file.
Who is responsible for the maintenance of this file, and why are these words in the file?
LinGreenspan
(1 rep)
Oct 16, 2021, 05:53 PM
• Last activity: Oct 16, 2021, 06:12 PM
1 vote, 2 answers, 2191 views
crunch wordlist generation with all combinations
I'm trying to generate a wordlist in order to use it to bruteforce my own Truecrypt container. I do know parts of the password: it's built up using blocks of other known passwords to increase length, but I forgot in which order the blocks were used and if some blocks weren't used at all.
Example "blocks" separated with space:
dog cat bird xyz cow1 lion8
What I would like to do is create a wordlist containing each possible combination of these blocks. E.g
dog
cat
dogcat
catdog
bird
dogbird
catbird
birdcat
birddog
dogcatbird
catdogbird
xyz
dogcatbirdxyz
cow1
xyzcow1dogcat
xyzcow1dogcatbird
catdogbirdxyzcow8
lion8
catdogbirdxyzcow1lion8
lion8catdogbirdxyzcow1
dogcatbirdxyzcow1lion8
cow1birddogcatxyzlion8
cow1lion8birddogcatxyz
...
So far I've tried to utilize a tool called crunch: http://www.irongeek.com/i.php?page=backtrack-r1-man-pages/crunch
But the challenge seems to be how one should generate combinations of shorter combinations, not including all known blocks; for example, `dogcat` only includes 2 blocks.
Perhaps someone knows `crunch` better than me, or maybe I should use another tool or combination of tools?
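If `crunch` can't express this, a small recursive shell function can: emit every ordered arrangement of every non-empty subset of the blocks. This is a bash sketch (it assumes all blocks are distinct):

```shell
#!/usr/bin/env bash
blocks=(dog cat bird xyz cow1 lion8)

# perms PREFIX REMAINING... prints PREFIX, then recurses with each
# remaining block appended. 6 distinct blocks yield 1956 candidates.
perms() {
  local prefix=$1; shift
  [ -n "$prefix" ] && printf '%s\n' "$prefix"
  local pick rest r
  for pick in "$@"; do
    rest=()
    for r in "$@"; do [ "$r" != "$pick" ] && rest+=("$r"); done
    perms "$prefix$pick" "${rest[@]}"
  done
}

perms '' "${blocks[@]}" > wordlist.txt
wc -l < wordlist.txt
```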
Niklas J. MacDowall
(163 rep)
Apr 19, 2018, 06:36 AM
• Last activity: Dec 28, 2020, 02:24 PM
1 vote, 2 answers, 503 views
Most Frequent Word in a Text
# Task
The parameter here is a filename! The file contains a text. The task of the script is to decide which word is contained most frequently in other words.
***
# Example Input And Output
(e.g. the text is: play ball football basketball snowball - therefore ball is the winner because it is part of three other words).
***
# My code so far
I did this code so far, but it doesn't work for every input:
#!/bin/sh
awk '{for(i=2;i
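The attempt above is cut off in the post; here is one complete sketch in plain sh (my own approach, not a reconstruction of the awk script): for each distinct word, count how many other words contain it as a substring, and keep the best.

```shell
#!/bin/sh
printf 'play ball football basketball snowball\n' > text.txt   # sample input

# One distinct word per line.
tr -s '[:space:]' '\n' < text.txt | sort -u > words.txt

best=''; bestn=0
while IFS= read -r w; do
  n=$(grep -cF -- "$w" words.txt)   # lines containing $w, itself included
  n=$((n - 1))                      # exclude the word itself
  if [ "$n" -gt "$bestn" ]; then best=$w; bestn=$n; fi
done < words.txt

printf '%s (contained in %d other words)\n' "$best" "$bestn"
```

This is quadratic in the number of words, which is fine for a short text but slow for a whole book.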
user405815
Apr 12, 2020, 09:44 PM
• Last activity: May 10, 2020, 10:09 AM
1 vote, 2 answers, 1597 views
How to get a random adjective or noun?
I did find a list of words in `/usr/share/dict/words` but I don't know if there's a way (an already existing way?) to split them up into their corresponding parts of speech. Alternatively, I'm fine with any other suggestions; `/usr/share/dict/words` was only the first list of words I found.
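`/usr/share/dict/words` carries no part-of-speech information, so the split would need a tagged source (the WordNet data files, for instance, keep separate per-part-of-speech index files; I am not assuming any particular install path here). Picking a random entry from whatever list you end up with is simple with GNU `shuf`:

```shell
# A tiny stand-in adjective list; substitute your real tagged list.
printf '%s\n' quick lazy bright > adjectives.txt
shuf -n 1 adjectives.txt   # prints one random line
```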
finefoot
(3554 rep)
Nov 23, 2019, 10:09 AM
• Last activity: Nov 24, 2019, 06:25 PM
0 votes, 3 answers, 247 views
extract specific words and their data from an html/xml file
sample input is
output should be
tid="8390500116294391399"
ts="N/A"
ets="2019-02-22T00:21:41.228Z"
trxn="smaple data with spaces 2 record"
trxn="smaple data with spaces 3rd record"
trxn="smaple data with spaces 5th record"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"
trxn="other data with spaces"
trxn="another record data"
trxn="smaple data with spaces record"
trxn="data with spaces"
tid="2345500116294391399"
ts="NA"
ets="2017-02-22T00:21:41.228Z"
I tried like below
sed -e 's/trxn=/\ntrxn=/g' -e 's/tid=/\ntid=/g' -e 's/ts=/\nts=/g'
while IFS= read -r var
do
if grep -Fxq "$trxn" temp2.txt
then
awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /trxn/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
else
awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
fi
done < "$input"
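If the goal is simply one `attr="value"` pair per line, a single `grep -o` pass is much simpler than the sed/awk loop above (a sketch against a sample line resembling the question's data; for real XML an XML-aware tool such as xmlstarlet is more robust):

```shell
# Sample one-line input resembling the question's data.
cat > sample.xml <<'EOF'
<t tid="8390500116294391399" ts="N/A" ets="2019-02-22T00:21:41.228Z"><x trxn="smaple data with spaces 2 record"/></t>
EOF

# -o prints only the matched part, one match per line.
grep -oE '(tid|ts|ets|trxn)="[^"]*"' sample.xml
```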
BNRINBOX
(29 rep)
Apr 15, 2019, 04:45 AM
• Last activity: Sep 19, 2019, 01:19 AM
0 votes, 0 answers, 93 views
Remove words from dictionary which have two or more capital letters at any place in the word
Basically, I have a dictionary from which I would like to remove words that contain two or more capital letters anywhere in the word, for example:
bold
leTTer
reMoveMe
quote
lanGuaGe
spaces
Output should be like this
bold
quote
spaces
Is it possible to do it with `sed`?
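Assuming `sed` was meant, deleting every line that contains two capital letters does it (a sketch; `grep -v` with the same pattern is equivalent):

```shell
printf '%s\n' bold leTTer reMoveMe quote lanGuaGe spaces > dict.txt
# ([A-Z].*){2} = a capital letter, anything, then another capital letter.
sed -E '/([A-Z].*){2}/d' dict.txt
```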
Slobodan Vidovic
(185 rep)
May 26, 2019, 10:13 PM
3 votes, 1 answer, 8742 views
How to split files with numeric names?
I'm trying to split a text file into files of 1024 lines, so I ran `split` with the `-d` switch:
split -d -l 300 ./list.lst
I get some weird names: they start with `x` and the file names jump from `x89` to `x9000`. I want the files to be named like that:
1.lst
2.lst
3.lst
...
Thanks.
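The jump from `x89` to `x9000` is GNU `split` widening the numeric suffix when it runs out of two-digit names; a fixed suffix width (`-a`) avoids it, and `--additional-suffix` (GNU coreutils) adds the extension. A sketch, with a rename loop for the exact `1.lst` style:

```shell
seq 1 2500 > list.lst                                        # sample input
# Decimal suffixes of fixed width 3, plus a .lst extension.
split -d -a 3 -l 1024 --additional-suffix=.lst list.lst part
# Rename part000.lst, part001.lst, ... to 1.lst, 2.lst, ...
n=1
for f in part*.lst; do mv -- "$f" "$n.lst"; n=$((n+1)); done
ls [0-9]*.lst
```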
Adel M.
(358 rep)
Jan 6, 2019, 01:01 AM
• Last activity: Jan 6, 2019, 06:04 AM
0 votes, 4 answers, 4877 views
finding words that contain only 3 characters using sed in a file
I need to print only words that consist of 3 characters, however the word document is a numbered list.
Here's the exact question that I have to answer:
> Using the `sed` command with the `[[:lower:]]` character class on the `animals` file, find all the animal names that are only three characters long _(3 marks)_.
This is what I have tried:
cat animals | sed '/{[:lower:]].../d'
cat animals | sed '/{[:lower:]]/d'
sed '/[[:lower:]]{3}/d' animals
This is the file I am trying to find the words from (the `animals` file):
01. aardvark
02. badger
03. cow
04. dog
05. elephant
06. fox
07. goose
08. horse
09. iguana
10. jackal
11. koala
12. lamb
13. mongoose
14. narwhal
15. onyx
16. pig
17. quail
18. rat
19. snake
20. tiger
21. umbrellabird
22. vulture
23. walrus
24. xerus
25. yak
26. zebra
I have just found out the code cannot have `[[:lower:]]` in it more than once. Is there a way to do this?
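One way that uses `[[:lower:]]` only once is to anchor the numbered-list layout and repeat the class with `{3}` (a sketch against an excerpt of the file):

```shell
cat > animals <<'EOF'
01. aardvark
02. badger
03. cow
04. dog
06. fox
16. pig
18. rat
25. yak
EOF

# Number, dot, space, then exactly three lowercase letters to end of line;
# -n plus the p flag prints only the substituted (i.e. matching) lines.
sed -nE 's/^[0-9]+\. ([[:lower:]]{3})$/\1/p' animals
```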
ItCouldBeMe
(53 rep)
Oct 20, 2018, 07:57 PM
• Last activity: Oct 24, 2018, 02:36 AM
3 votes, 4 answers, 2014 views
sort letters in a single word - use it to find permutations (or anagrams)
I have some dictionary for myspell in `file.dic`. Let's say:
abc
aword
bword
cab
worda
wordzzz
and I'm looking for different words that are **permutations (or anagrams)** of each other.
If there was a command "letter-sort" I'd do it more or less like that:
cat file.dic | letter-sort | paste - file.dic | sort
That gives me:
abc abc
abc cab
adorw aword
adorw worda
bdorw bword
dorwzzz wordzzz
so now I clearly see the anagrams in the file. Is there such a `letter-sort` command, or how can I obtain such a result in some other way?
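There is no standard `letter-sort` utility, but it can be sketched as a tiny shell filter: split each word into characters with `fold -w1`, `sort` them, and glue them back together:

```shell
#!/bin/sh
# Read words on stdin; print each word's letters in sorted order.
lettersort() {
  while IFS= read -r w; do
    printf '%s\n' "$(printf '%s' "$w" | fold -w1 | sort | tr -d '\n')"
  done
}

printf '%s\n' abc aword bword cab worda wordzzz > file.dic
lettersort < file.dic | paste - file.dic | sort
```

This forks several processes per word, so it is slow on big dictionaries; an awk or perl one-liner would scale better.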
sZpak
(511 rep)
Nov 17, 2016, 11:27 PM
• Last activity: Aug 29, 2017, 12:53 PM
1 vote, 1 answer, 2009 views
How do I use this regex with grep?
I'm new to regex and found a command on a regex tutorial/test site that will allow me to search for 3 consecutive consonants. The only problem is I can't figure out how to use it with grep. Would someone help me out? I'm trying to search a word list text file using:
(?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}
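The `(?!...)` lookahead is Perl-compatible syntax, which grep's default BRE/ERE engines reject; GNU grep's `-P` switch evaluates the pattern with PCRE instead (availability depends on how grep was built). A sketch against a tiny sample list:

```shell
printf '%s\n' strength aaa banana > list.txt
# -P enables Perl-compatible regexes, so the lookahead works as-is.
grep -P '(?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}' list.txt
```

If `-P` is unavailable and the no-repeat condition isn't needed, plain `grep -E '[bcdfghjklmnpqrstvwxzy]{3}'` finds three consecutive consonants.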
Austin
(231 rep)
Oct 31, 2016, 05:53 AM
• Last activity: Oct 31, 2016, 06:03 AM
4 votes, 2 answers, 881 views
Does command substitution within arithmetic substitution get word split?
I seem to recall from comments on this site that the contents of arithmetic expansion **may** be word split, but I can't find the comment again.
Consider the following code:
printf '%d\n' "$(($(sed -n '/my regex/{=;q;}' myfile)-1))"
If the `sed` command outputs a multi-digit number and `$IFS` contains digits, will the command substitution get word split before the arithmetic occurs?
(I've already tested using extra double quotes:
printf '%d\n' "$(("$(sed -n '/my regex/{=;q;}' myfile)"-1))"
and this doesn't work.)
-----
Incidentally the example code above is a reduced-to-simplest-form alteration of [this function](https://stackoverflow.com/a/35882480/5419599) that I just posted on Stack Overflow.
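A quick experiment suggests the answer is no, at least in bash and dash: the command substitution's result is consumed by the arithmetic parser, not field-split, so this prints 24 even with a hostile `IFS` (a sketch; I'm not asserting what every POSIX shell does):

```shell
IFS=0123456789   # an IFS that would shred any number if splitting applied
printf '%d\n' "$(( $(printf 25) - 1 ))"
unset IFS
```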
Wildcard
(37446 rep)
Mar 9, 2016, 03:52 AM
• Last activity: Mar 11, 2016, 01:46 AM