How to get unique occurrence of words from a very large file?
-1
votes
1
answer
1066
views
I have been asked to write a word frequency analysis program using
Unix shell scripts, with the following requirements:
- Input is a text file with one word per line
- Input words are drawn from the Compact Oxford English Dictionary New Edition
- Character encoding is UTF-8
- Input file is 1 Pebibyte (PiB) in length
- Output is of the format “Word occurred N times”
One way I know to begin is:
cat filename | xargs -n1 | sort | uniq -c > newfilename
What is the most efficient way to do this, given the size of the input?
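One note on the pipeline above: since the input already has one word per line, the `xargs -n1` stage is unnecessary (and `cat` can be dropped too). More importantly, `sort` on a 1 PiB file forces an enormous external merge sort on disk. Because the words come from a fixed dictionary, the number of *unique* words is small (a few hundred thousand at most), so a single streaming pass that counts into an in-memory hash avoids sorting the data entirely. A minimal sketch with `awk` (the filename is a placeholder; assumes any POSIX awk):

```shell
#!/bin/sh
# One pass over the input: count each line (= one word) in an awk
# associative array, then print counts in the required format.
# Memory use is proportional to the number of unique words, not the
# 1 PiB input size.
awk '
  { count[$0]++ }
  END { for (w in count) printf "%s occurred %d times\n", w, count[w] }
' "$1"
```

Setting `LC_ALL=C` can speed up text processing further by skipping locale-aware collation, though with UTF-8 input you should verify the words are only compared byte-for-byte (which is all exact counting needs). For parallelism, the file can be split into chunks, counted independently, and the per-chunk counts merged by summing.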
Asked by Pratik Barjatiya
(23 rep)
Dec 29, 2017, 06:27 AM
Last activity: Apr 15, 2025, 03:40 PM