
Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

32 votes
2 answers
37403 views
Where do the words in /usr/share/dict/words come from?
[/usr/share/dict/words](https://en.wikipedia.org/wiki/Words_(Unix)) contains lots of words. How is this list generated? Are its contents the same across different Unices? Is there any standard dictating what it must contain? All I've been able to turn up so far is that on Ubuntu/Debian the list comes from the [wordlist](https://packages.debian.org/sid/wordlist) packages, but their descriptions offer no clue on how the lists were actually generated.
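On Debian/Ubuntu, one quick way to check where the list on a given machine actually comes from is to ask the package manager which package ships the file behind the symlink; a small sketch (the package it reports, e.g. wamerican, varies with the wordlist selected on that system):
```
# /usr/share/dict/words is usually a chain of symlinks managed by
# the dictionaries-common alternatives; resolve it, then ask dpkg.
readlink -f /usr/share/dict/words
dpkg -S "$(readlink -f /usr/share/dict/words)"
```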
Mark Amery (3220 rep)
Jul 2, 2015, 07:23 PM • Last activity: Jul 29, 2025, 09:25 PM
0 votes
0 answers
40 views
How to convert a list of words separated by commas to a column?
How do I convert a list of words separated by commas to a column? I have this:
"word, other word, another one"
I want:
word
other word
another one
How can I do this?
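A minimal sketch of one way to do this with standard tools, assuming the list sits in a shell variable (the variable name is just for illustration):
```
list="word, other word, another one"
# Turn every comma into a newline, then strip the space left after each comma.
printf '%s\n' "$list" | tr ',' '\n' | sed 's/^ *//'
```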
elbarna (13690 rep)
Feb 25, 2025, 08:44 PM • Last activity: Feb 25, 2025, 08:48 PM
0 votes
1 answer
166 views
Does /usr/share/dict/words contain personal information?
I am considering including a copy of my /usr/share/dict/words file in a public GitHub repository for a project that requires dictionaries. Is this a bad idea, and if so, why? I'm particularly interested in the privacy/security (or even legal?) aspects. Are there common programs that add words to this dictionary, for example if I choose "Add to Dictionary" in a spell checker? Is the file likely to contain any sensitive information, such as my username? (I checked that, and it doesn't, but there could be similar things I didn't think to check.) It'd be impractical to look through all 104,334 words. Perhaps it's just the usr in the path making me unnecessarily concerned. I've read over these questions about where the words come from. However, is it probable that any words have since been added or removed? I suppose if nothing has changed, I could just get the source. But if some programs added helpful (non-personal) words, I'd want to keep those. In case it's important, I am running Ubuntu 23.10. But I'd prefer a slightly more general answer, if possible.
### Note
I am fully aware that
- it would be possible to point to the file path in code rather than "hard coding" it into the repo, and
- this may not be the best free English word list.
However, I'm not interested in using a different list *instead* of this one (in such a case, I'd rather just use both). And if I use a list, it's necessary that I can include the actual file.
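For the "has anything been added or removed since install?" part, on Debian/Ubuntu dpkg can verify the installed file against the package's recorded checksums; a sketch (wamerican is the usual default provider on Ubuntu, but confirm with the dpkg -S line first):
```
# Which package owns the real file behind the symlink?
dpkg -S "$(readlink -f /usr/share/dict/words)"
# Verify that package's files; no output for the word list means it is unmodified.
dpkg -V wamerican
```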
kviLL (103 rep)
Jun 3, 2024, 08:20 PM • Last activity: Jun 4, 2024, 09:14 AM
3 votes
2 answers
1119 views
I want to find the lines where a specific word appears in a file, along with the line numbers, and store the line numbers in an array. How do I do that in bash?
This returns line numbers but they are in a string:
grep -n -F -w $word $file | cut -d : -f 1
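A sketch of one way to capture those numbers into a bash array, using mapfile (bash 4+) and quoting the variables:
```
# Read each matching line number into the array "linenos".
mapfile -t linenos < <(grep -nFw -- "$word" "$file" | cut -d: -f1)

printf 'found on %d line(s)\n' "${#linenos[@]}"
printf '%s\n' "${linenos[@]}"
```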
Arpan Koley (31 rep)
Aug 24, 2023, 05:01 AM • Last activity: Aug 26, 2023, 04:11 AM
25 votes
17 answers
14011 views
Bash script: split word on each letter
How can I split a word's letters, with each letter on a separate line? For example, given "StackOver" I would like to see
S
t
a
c
k
O
v
e
r
I'm new to bash so I have no clue where to start.
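A minimal sketch using GNU grep's -o, which prints each match on its own line; fold -w1 from coreutils does the same job:
```
word="StackOver"
grep -o . <<< "$word"   # one character per line
# or, equivalently:
fold -w1 <<< "$word"
```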
Sijaan Hallak (529 rep)
Jan 4, 2016, 11:41 PM • Last activity: May 1, 2023, 12:43 PM
0 votes
1 answer
713 views
Creating word list from document
I'm trying to find an efficient way of learning vocabulary in new languages. I'd like to be able to create word lists using files that contain books. I'm new to Linux, any help much appreciated. I would like to:
* have a command that will take as input a text file (txt format for example) and output another file that contains a list of all individual words in the first file.
* The new file should be ordered alphabetically and contain no duplicates (each word should be included only once).
* Ideally, the command should also be able to check against a second file and avoid repeating any words contained in that file. (So that I can create a file of words I already know and that are not repeated.)
Is there a suitable command to do this?
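A sketch of one pipeline that covers all three points; book.txt and known.txt are placeholder names:
```
# 1. One word per line, lowercased, sorted, duplicates removed.
tr -cs '[:alpha:]' '\n' < book.txt | tr '[:upper:]' '[:lower:]' | sort -u > wordlist.txt

# 2. Drop any words already present in a "known words" file.
sort -u known.txt > known.sorted.txt
comm -23 wordlist.txt known.sorted.txt > new-words.txt
```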
Bronze (3 rep)
Oct 5, 2022, 10:39 AM • Last activity: Oct 5, 2022, 12:41 PM
4 votes
1 answer
866 views
Inverse grep does not find all matching lines
**EDIT:** I am making this too complicated. It's not about inverse grep. I get the same results using just grep -x -f stop.txt < in.txt. If who comes before whose in the stop word file, the result is just who. When the order is reversed in the stop word file, both lines in in.txt are found. I have the feeling that I fundamentally do not get grep. *** I cannot get inverse grep to work like I expect in order to remove lines containing a stop word from a file. The order in which the stop words are given affects the result. Suppose I have two files. An input file in.txt:
who
whose
And a file with a list of stop words stop.txt:
who
whose
If I "filter" in.txt with an inverse grep search on the stop words in `stop.txt`, I get:
$ grep -vx -f stop.txt < in.txt
whose
$
Only if I change stop.txt to
whose
who
do I get:
$ grep -vx -f stop.txt < in.txt
$
I do not understand why the order of words in the file with stop words is of importance.
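Behaviour like this is often caused by invisible characters (CRLF line endings, trailing spaces, a BOM) in one of the files rather than by grep itself; a quick check, assuming GNU cat:
```
# Show line endings ($), tabs (^I) and other non-printing characters.
cat -A stop.txt
cat -A in.txt
```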
Till A. Heilmann (141 rep)
Sep 9, 2022, 12:27 PM • Last activity: Sep 9, 2022, 09:25 PM
0 votes
0 answers
142 views
Terminal Emulator Blinking Words in VIM
I am starting to write LaTeX documents in VIM and here is my code:
\documentclass{article}
\begin{document}
\frac{2x}{2} *
\end{document}
For some reason, the following characters are blinking in the terminal:
- {article}
- {document}
- {}{}
- {document}
This forum seems to be related: https://superuser.com/questions/449335/vi-editor-text-is-flashing-and-unusable/450302 There was another post (EDIT: I found it: https://forums.fedoraforum.org/archive/index.php/t-291190.html) from 2013 saying to do:
export TERM=linux
and I don't want that because text gets bolded and in vifm the selected file blinks. I just want to turn blinking completely off - I can't see what I am typing when the screen is constantly blinking. It is very distracting.
- OS: Arch Linux
- Terminal emulator: xterm
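If the blinking comes from xterm rendering the blink attribute, one thing to try (an assumption, not a confirmed fix for this setup) is xterm's showBlinkAsBold resource, which draws blinking text as bold instead of actually blinking:
```
! In ~/.Xresources; reload with: xrdb -merge ~/.Xresources
XTerm*showBlinkAsBold: true
```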
wgm (1 rep)
Oct 27, 2021, 03:37 PM • Last activity: Oct 28, 2021, 04:04 AM
-5 votes
1 answer
300 views
Who is responsible for /usr/share/dict/words? (slurs found)
I am writing an application that makes use of the /usr/share/dict/words file to generate session keys. I was appalled to find the most egregious of ethnic slurs, the 'N-word' (in different spellings), in the file. Who is responsible for the maintenance of this file, and why are these words in the file?
LinGreenspan (1 rep)
Oct 16, 2021, 05:53 PM • Last activity: Oct 16, 2021, 06:12 PM
1 vote
2 answers
2191 views
crunch wordlist generation with all combinations
I'm trying to generate a wordlist in order to use it to bruteforce my own Truecrypt container. I do know parts of the password; it's built up using blocks of other known passwords to increase length, but I forgot in which order the blocks were used and whether some blocks weren't used at all. Example "blocks", separated with spaces: dog cat bird xyz cow1 lion8. What I would like to do is create a wordlist containing each possible combination of these blocks, e.g. dog cat dogcat catdog bird dogbird catbird birdcat birddog dogcatbird catdogbird xyz dogcatbirdxyz cow1 xyzcow1dogcat xyzcow1dogcatbird catdogbirdxyzcow8 lion8 catdogbirdxyzcow1lion8 lion8catdogbirdxyzcow1 dogcatbirdxyzcow1lion8 cow1birddogcatxyzlion8 cow1lion8birddogcatxyz ... So far I've tried to utilize a tool called crunch: http://www.irongeek.com/i.php?page=backtrack-r1-man-pages/crunch But the challenge seems to be how one should generate combinations of shorter combinations that don't include all known blocks; for example, dogcat only includes 2 blocks. Perhaps someone knows crunch better than me, or should I use another tool or a combination of tools?
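crunch by itself doesn't enumerate orderings of subsets, but a short recursive function can; a sketch in bash (block values taken from the question, wordlist.txt is an arbitrary output name):
```
#!/bin/bash
# Print every ordering of every non-empty subset of the blocks,
# concatenated without separators.
blocks=(dog cat bird xyz cow1 lion8)

perms() {
  local prefix=$1; shift
  [ -n "$prefix" ] && printf '%s\n' "$prefix"   # every partial ordering counts
  local i rest
  for ((i = 1; i <= $#; i++)); do
    rest=("${@:1:i-1}" "${@:i+1}")              # remaining blocks minus block i
    perms "$prefix${!i}" "${rest[@]}"
  done
}

perms "" "${blocks[@]}" > wordlist.txt
wc -l wordlist.txt   # 1956 candidates for 6 blocks
```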
Niklas J. MacDowall (163 rep)
Apr 19, 2018, 06:36 AM • Last activity: Dec 28, 2020, 02:24 PM
1 vote
2 answers
503 views
Most Frequent Word in a Text
# Task
The parameter here is a filename! The file contains a text. The task of the script is to decide which word is contained most frequently in other words.
***
# Example Input And Output
E.g. the text is: play ball football basketball snowball - therefore ball is the winner because it is part of three other words.
***
# My code so far
I did this code so far, but it doesn't work for every output
#!/bin/sh
awk '{for(i=2;i
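For comparison, a sketch of one brute-force approach in plain sh: for every distinct word in the file given as the first argument, count how many other words contain it as a substring, and report the best one:
```
#!/bin/sh
tmp=$(mktemp)
tr -s '[:space:]' '\n' < "$1" | sed '/^$/d' | sort -u > "$tmp"

best=''
bestcount=0
while IFS= read -r w; do
  count=$(( $(grep -c -F "$w" "$tmp") - 1 ))   # -1: don't count the word itself
  if [ "$count" -gt "$bestcount" ]; then
    best=$w
    bestcount=$count
  fi
done < "$tmp"

printf '%s (contained in %d other words)\n' "$best" "$bestcount"
rm -f "$tmp"
```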
user405815
Apr 12, 2020, 09:44 PM • Last activity: May 10, 2020, 10:09 AM
1 vote
2 answers
1597 views
How to get a random adjective or noun?
I did find a list of words in /usr/share/dict/words but I don't know if there's a way (an already existing way?) to split them up into their corresponding part of speech? Alternatively, I'm fine with any other suggestions, /usr/share/dict/words was only the first list of words I found.
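One hedged suggestion: the WordNet data files are already split by part of speech, so if the wordnet package is installed (the /usr/share/wordnet path below is where Debian/Ubuntu normally put the index files; adjust if needed), a random entry can be picked with shuf:
```
# Skip the indented licence header, pick a random line, keep the first field
# (the lemma; underscores stand for spaces).
grep -v '^ ' /usr/share/wordnet/index.noun | shuf -n 1 | cut -d ' ' -f 1
grep -v '^ ' /usr/share/wordnet/index.adj  | shuf -n 1 | cut -d ' ' -f 1
```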
finefoot (3554 rep)
Nov 23, 2019, 10:09 AM • Last activity: Nov 24, 2019, 06:25 PM
0 votes
3 answers
247 views
extract specific words and their data from an html/xml file
sample input is output should be tid="8390500116294391399" ts="N/A" ets="2019-02-22T00:21:41.228Z" trxn="smaple data with spaces 2 record" trxn="smaple data with spaces 3rd record" trxn="smaple data with spaces 5th record" tid="2345500116294391399" ts="NA" ets="2017-02-22T00:21:41.228Z" trxn="other data with spaces" trxn="another record data" trxn="smaple data with spaces record" trxn="data with spaces" tid="2345500116294391399" ts="NA" ets="2017-02-22T00:21:41.228Z"
I tried like below:
sed -e 's/trxn=/\ntrxn=/g' -e 's/tid=/\ntid=/g' -e 's/ts=/\nts=/g'
while IFS= read -r var
do
    if grep -Fxq "$trxn" temp2.txt
    then
        awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /trxn/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
    else
        awk -F"=" '/tid/{print VAL=$i} /ts/{print VAL=$i} /ets/{print VAL=$i} /tid/{print VAL=$i;next}' temp2.txt >> out.txt
    fi
done < "$input"
BNRINBOX (29 rep)
Apr 15, 2019, 04:45 AM • Last activity: Sep 19, 2019, 01:19 AM
0 votes
0 answers
93 views
Remove words from dictionary which have two or more capital letters at any place in the word
Basically I have a dictionary from which I would like to remove words which contain two or more capital letters at any place in the word, for example:
bold
leTTer
reMoveMe
quote
lanGuaGe
spaces
Output should be like this:
bold
quote
spaces
Is it possible to do it with sed?
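A sketch with sed (and the grep equivalent), deleting any word that contains two or more uppercase letters anywhere; dictionary.txt is a placeholder name:
```
sed '/[[:upper:]].*[[:upper:]]/d' dictionary.txt
# same logic with grep:
grep -v '[[:upper:]].*[[:upper:]]' dictionary.txt
```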
Slobodan Vidovic (185 rep)
May 26, 2019, 10:13 PM
3 votes
1 answer
8742 views
How to split files with numeric names?
I'm trying to split a text file into files of 1024 lines, so I ran split with the -d switch:
split -d -l 300 ./list.lst
I get some weird names: they start with x and the file names jump from x89 to x9000. I want the files to be named like this:
1.lst
2.lst
3.lst
...
Thanks.
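With GNU split the suffix behaviour can be controlled directly; a sketch using GNU-only options, plus an optional rename loop to get the bare 1.lst, 2.lst names:
```
# Fixed-width numeric suffixes starting at 1, plus a .lst extension:
# produces part001.lst, part002.lst, ...
split -l 1024 -a 3 --numeric-suffixes=1 --additional-suffix=.lst ./list.lst part

# Optional: strip the prefix and the leading zeros to get 1.lst, 2.lst, ...
n=1
for f in part*.lst; do
  mv -- "$f" "$n.lst"
  n=$((n + 1))
done
```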
Adel M. (358 rep)
Jan 6, 2019, 01:01 AM • Last activity: Jan 6, 2019, 06:04 AM
0 votes
4 answers
4877 views
finding words in a file that contain only 3 characters, using sed
I need to print only words that consist of 3 characters, however the word document is a numbered list. Here's the exact question that I have to answer:
> Using the sed command with [[:lower:]] character class on the animals file, find all the animal names that are only three characters long _(3 marks)_.
This is what I have tried:
cat animals | sed '/{[:lower:]].../d'
cat animals | sed '/{[:lower:]]/d'
sed '/[[:lower:]]{3}/d' animals
This is the file I am trying to find the words from (the animals file):
01. aardvark
02. badger
03. cow
04. dog
05. elephant
06. fox
07. goose
08. horse
09. iguana
10. jackal
11. koala
12. lamb
13. mongoose
14. narwhal
15. onyx
16. pig
17. quail
18. rat
19. snake
20. tiger
21. umbrellabird
22. vulture
23. walrus
24. xerus
25. yak
26. zebra
I have just found out the code cannot have [[:lower:]] in it more than once. Is there a way to do this?
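A sketch that uses [[:lower:]] only once by adding an interval; the leading space keeps longer names such as badger from matching, so only lines ending in exactly three lowercase letters are printed:
```
sed -n '/ [[:lower:]]\{3\}$/p' animals
```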
ItCouldBeMe (53 rep)
Oct 20, 2018, 07:57 PM • Last activity: Oct 24, 2018, 02:36 AM
3 votes
4 answers
2014 views
sort letters in a single word - use it to find permutations (or anagrams)
I have a dictionary for myspell in file.dic. Let's say:
abc
aword
bword
cab
worda
wordzzz
and I'm looking for different words that are **permutations (or anagrams)** of each other. If there was a command "letter-sort" I'd do it more or less like that:
cat file.dic | letter-sort | paste - file.dic | sort
That gives me:
abc     abc
abc     cab
adorw   aword
adorw   worda
bdorw   bword
dorwzzz wordzzz
so now I clearly see anagrams in the file. Is there such a letter-sort command, or how can I obtain such a result in some other way?
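There is no standard letter-sort command, but a small function built from fold, sort and tr can stand in for one; it prints the letter-sorted form, a tab, and the original word, so a final sort groups the anagrams:
```
lettersort() {
  while IFS= read -r word; do
    sorted=$(printf '%s\n' "$word" | fold -w1 | sort | tr -d '\n')
    printf '%s\t%s\n' "$sorted" "$word"
  done
}

lettersort < file.dic | sort
```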
sZpak (511 rep)
Nov 17, 2016, 11:27 PM • Last activity: Aug 29, 2017, 12:53 PM
1 vote
1 answer
2009 views
How do I use this regex with grep?
I'm new to regex and found a command on a regex tutorial/test site that will allow me to search for 3 consecutive consonants. The only problem is I can't figure out how to use it with grep. Would someone help me out? I'm trying to search a word list text file using: (?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}
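That pattern uses Perl-style lookaheads, so it needs grep's PCRE mode; a sketch assuming a GNU grep built with -P support (wordlist.txt is a placeholder filename):
```
# -P enables PCRE; the single quotes keep the shell away from the pattern.
grep -P '(?:([bcdfghjklmnpqrstvwxzy])(?!.{1,2}\1)){3}' wordlist.txt
```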
Austin (231 rep)
Oct 31, 2016, 05:53 AM • Last activity: Oct 31, 2016, 06:03 AM
4 votes
2 answers
881 views
Does command substitution within arithmetic substitution get word split?
I seem to recall from comments on this site that the contents of arithmetic expansion **may** be word split, but I can't find the comment again. Consider the following code: printf '%d\n' "$(($(sed -n '/my regex/{=;q;}' myfile)-1))" If the sed command outputs a multi-digit number and $IFS contains digits, will the command substitution get word split before the arithmetic occurs? (I've already tested using extra double quotes: printf '%d\n' "$(("$(sed -n '/my regex/{=;q;}' myfile)"-1))" and this doesn't work.) ----- Incidentally the example code above is a reduced-to-simplest-form alteration of [this function](https://stackoverflow.com/a/35882480/5419599) that I just posted on Stack Overflow.
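One quick way to probe this empirically (in a throwaway shell, since it changes IFS): put a digit in IFS and see whether the number coming out of the inner command substitution survives intact; echo stands in for the sed command:
```
IFS=5
printf '%d\n' "$(( $(echo 456) - 1 ))"
# 455 here means the inner substitution was not field-split;
# a split on the 5 in IFS would instead break the arithmetic.
```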
Wildcard (37446 rep)
Mar 9, 2016, 03:52 AM • Last activity: Mar 11, 2016, 01:46 AM