
How to get unique occurrence of words from a very large file?

-1 votes
1 answer
1066 views
I have been asked to write a word-frequency analysis program using Unix shell scripts, with the following requirements:

- Input is a text file with one word per line
- Input words are drawn from the Compact Oxford English Dictionary, New Edition
- Character encoding is UTF-8
- Input file is 1 pebibyte (PiB) in length
- Output is of the format "Word occurred N times"

One way I know to begin is:

    cat filename | xargs -n1 | sort | uniq -c > newfilename

What is the most efficient way to do this, with performance in mind?
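For illustration, here is a minimal sketch of one possible single-pass alternative to the sort-based pipeline, assuming GNU awk is available and that the number of distinct dictionary words (not the 1 PiB of raw input) fits in memory; the names filename and newfilename are simply reused from the command above:

    # Sketch only: count each word in one pass, avoiding an external sort of the full 1 PiB stream.
    # Memory holds one counter per distinct word rather than the whole input.
    awk '{count[$0]++} END {for (w in count) print w, "occurred", count[w], "times"}' filename > newfilename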
Asked by Pratik Barjatiya (23 rep)
Dec 29, 2017, 06:27 AM
Last activity: Apr 15, 2025, 03:40 PM